stevenheritage / MBASR

MBASR -- MrBayes Ancestral States with R
Creative Commons Zero v1.0 Universal
6 stars 0 forks source link

Number of columns in output greater than number of states. #9

Closed padpadpadpad closed 2 years ago

padpadpadpad commented 2 years ago

Hi Steven

Firstly, this looks like a great tool! I was looking at using BayesTraits but it does not seem intuitive to get ancestral states from all nodes in the tree. Then I found MBASR!

I have only done a quick run through of my tree and traits but my first output contains one more column than I would expect. I have 7 states and somehow have 8 columns after the node column. Not completely sure what to do about this? I have checked the number of unique states in the trait data and it is only 7.

Any thoughts?

I have uploaded the files used (in the Archive.zip) and the output file in case they help.

MrBayes.ASR.results.txt Archive.zip

stevenheritage commented 2 years ago

Hi Daniel,

Thanks for your message. It's a simple fix. All trait data must start numbering from 0. So your data, which has seven states, should be coded 0-6 rather than 1-7. MrBayes' Markov model always assumes that the number of states is the highest coded number +1, which means in your first run (with states codes 1-7) the reconstruction was based on eight states. This extra state with no data surely skews the results. Check out the file named "read.me.pdf" in the MBASR zip that you downloaded (also on page 3 of the MBASR manuscript) for this tip regarding coding trait states.

I have used a find&replace feature to re-code your data. Please find it attached.

Good luck with your research and feel free to send more notes if you encounter hiccups.

All my best, Steven trait_data_recoded.zip

padpadpadpad commented 2 years ago

Hi Steven

This worked thank you! I am also trying some Mk methods in R such as ape::ace() and in their methods you can choose how to change the transition matrix (such as equal rates, symmetric rates, or all rates different). The output from MBASR is identical to the equal rates data, but differs to the all rates different model.

How do you choose which model is best and is it possible to change that in the MBASR approach?

Many thanks Dan

stevenheritage commented 2 years ago

Hi Dan,

Great to hear that it worked for you.

If you are comfortable getting trees and trait data in-and-out of R, and using functions like ace, then you are already more advanced than the target user for MBASR. I read that ace's ML discrete trait reconstructions can (?possibly) return an AIC value as a measure of fit, and (to answer your question) this is the type of diagnostic you would use to choose which rate matrix you might prefer. Though, the AIC itself is inappropriate so I would recommend switching to the R package called mvMORPH, which offers many additional settings that you might want to explore (including different regimes in different parts of the tree)... and most importantly, returns an AICc score, which IS the appropriate decision diagnostic. Lower AICc values derive from better fitting models and these values include a penalty for the number of parameters invoked and make necessary adjustments for sample sizes. If you are happy with a ML reconstruction, then I think that is your best option. That said, philosophically, adjusting a model by the frequency of different states in your trait dataset might be unjustified despite an improved AICc, given that you can strongly bias how frequent a state is observed simply by which taxa you elect to include in your tree. All of this is outside the scope of MBASR. Alternatively, BayesTraits has the local scaling option, which will almost certainly improve the fit of your trait-data to your tree... and would perhaps outperform all of the options in available R packages, but I think you stated that BayesTraits was more than you wanted to get into.

You could hack MBASR and set different distributions for the MrBayes statefreqpr, or even gamma distribute the Mk model... but the proper assessment of those results should be done by comparing stepping-stone derived marginal likelihood values and calculations of Bayes Factors. That would be a really cool approach, but it's not a simple process (the objective of MBASR).

Good luck going forward. Sounds like you have a cool, and well considered, project on your hands.

Cheers, Steven

padpadpadpad commented 2 years ago

Hi Steven

Many thanks for your detailed response. It has been super useful and will give me somewhere to start on with BayesTraits and other packages. I will close this Issue now as it has become more about comparative phylogenetics more generally!

I will end with a couple of other comments based on your response:

  1. ape::ace() does indeed return an AIC value and I will look into mvMORPH as I agree AICc is more appropriate.
  2. I do have big differences in the frequency of each trait state in my tree. Some only occur 20 times and I have a max of ~1000. The best approach to dealing with this and rate changes between states remains unclear to me at the moment. But it is all a learning experience from here!
  3. I will work on using BayesTraits and getting ancestral state reconstructions for all nodes.
  4. In the end I want to fit a MuHiSSE model to try look at rates of speciation within states and transition rates between states. I think this approach will also allow for ancestral state reconstruction.
  5. If you fancied trying to hack MBASR a bit at any point I would be happy to help be a guinea pig.

Finally just again want to say thank you for giving really lovely, detailed, thoughtful responses to my queries. Your scripts were really easy to use and I even watched your YouTube video to help me compile MrBayes for MacOS.

Cheers Dan