thej022214 / hisse

Hidden State Speciation and Extinction

GeoHiSSE Models? #22

Closed dylanHco closed 3 years ago

dylanHco commented 3 years ago

Hello,

I am having a hard time understanding how to code each GeoHiSSE model. I have read through each tutorial a few times, and both of the papers. Would it be possible to get a cheat sheet with each model under its respective category?

Many thanks, Dylan

dylanHco commented 3 years ago

My apologies, I forgot to check Figshare. Is there any reason I should not use those associated with the original paper?

Caetanods commented 3 years ago

Hello!

GeoHiSSE models are set up with the 'turnover' and 'eps' arguments of the 'hisse::GeoHiSSE' function, which control (speciation + extinction) and (extinction/speciation), respectively. 'turnover' takes a multiple of 3 values (two endemic ranges and one widespread) and 'eps' takes a multiple of 2 values (one for each endemic range; the widespread range does not have an associated extinction rate). If you have hidden states, you need to pass longer vectors. The help page explains how to do this, and the examples in the Supplementary Material of our Evolution paper can help. I saw you found it; I am linking it here for other people to see (https://doi.org/10.6084/m9.figshare.6146768.v1).
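
As a hedged aside, the two definitions above can be written out explicitly. The symbols are assumptions for illustration and do not come from the package: $\lambda$ is a speciation rate, $\mu$ an extinction rate, $\tau$ the turnover, $\varepsilon$ the extinction fraction, and $k$ the number of rate classes (with $k = 1$ meaning no hidden states). This is only the generic relation for a range that has both rates; the widespread range has no extinction rate of its own, so it does not follow this form directly.

$$
\tau = \lambda + \mu, \qquad \varepsilon = \frac{\mu}{\lambda}
\quad\Longrightarrow\quad
\lambda = \frac{\tau}{1+\varepsilon}, \qquad \mu = \frac{\tau\,\varepsilon}{1+\varepsilon}
$$

$$
\text{length}(\texttt{turnover}) = 3k, \qquad \text{length}(\texttt{eps}) = 2k
$$

For example, $k = 3$ gives a 'turnover' vector of length 9 and an 'eps' vector of length 6.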

About which models you should test: You should include models that are associated with your hypotheses, which usually are scenarios in which ranges influence the diversification of the group. For each of these models, it is important to have an area-independent model with a comparable number of hidden states (i.e., diversification parameters). This is the fundamental reason why hidden-state SSE models work.

"Is there any reason I should not use those associated with the original paper?" No. You can absolutely use them. Table 1 of our paper lists 18 models. You can include all of them in your analysis. You can also include other, more specific models, in your study because they might better capture your biological questions and might not be one of these 18 models. You can also exclude some models because, for one reason or another, you are sure that the particular scenario does not make sense for your study system (given that you ALWAYS include an alternative model with a comparable number of hidden states for each model in your set).

In summary, the hidden-state models (HiSSE, MuHiSSE, GeoHiSSE, and corHMM) are very flexible approaches that can be used to implement the models that best describe the hypothesis you are interested in. For all of them, however, it is fundamental to make sure the complexity of the "null" models matches the complexity of the "alternative (with an effect)" models.

dylanHco commented 3 years ago

Hi Daniel,

Thank you for your detailed explanation. I think I am starting to get a handle on it, although I am still a little confused by the terms.

I am confused about "free parameters" and the categories in Table 1 of the 2018 paper. Everything has free parameters, but the caption says they are only free when the model is "Full"? Here are some that I tried to code myself. Could you please check whether I did this correctly? The ones with * are the ones I am most confused about.

Cladogenetic models

Model 1: CID - Original GeoSSE (4 free parameters) would be: turnover (1,1,0) and eps (1,1)

Model 2: Original GeoSSE, full model, would it be: turnover (1,2,3) and eps (1,1)? but the table says it should have "7 free parameters"? What are the extra two parameters?

Model 3: CID - GeoHiSSE, three hidden rates , null model: turnover (1,1,0,2,2,0,3,3,0) and eps (1,1,1,1,1,1) 9 free parameters

Cladogenetic+extinction

*Model 7: CID-GeoSSE+extirpation (I am sort of lost right here too; how to add in extirpation) would it be: turnover (1,1,1) and eps (1,1)?

*Model 8: GeoSSE+extirpation full model How would this differ from Original GeoSSE full model ? would it be: turnover (1,2,3) and eps (1,2)?

*Model 9: CID-GeoHiSSE+extirpation three hidden rates, null model: turnover (1,1,0,2,2,0,3,3,0) and eps (1,1,2,2,3,3)?

Anagenetic (are all of these just MuSSE-like models?)

*Model 13: Anagenetic GeoSSE (I am sort of lost right here too; how to make anagenetic) would this be: turnover (1,2,0) and eps (1,1)? but also trans.rate.mod (2,3)?

*Model 14: Anagenetic GeoSSE, full model: not sure here either. turnover (1,2,0) and eps (1,2)? but also trans.rate.mod (2,3)?

*Model 15 CID anagenetic GeoHiSSE, three hidden rates, null model: turnover (1,1,0,2,2,0,3,3,0) eps (1,1,2,2,3,3) and trans.rate.mod (2,3)?

Thank you for your time and feedback. Best, Dylan

Caetanods commented 3 years ago

Hello Dylan,

All your questions are about understanding the models. These questions are not about issues with the R package or issues with the method. Just want to make it clear for everyone else visiting this post.

About Table 1: The caption just states that "Full model" means that all parameters are free. Other models have a varying number of free parameters. I included this statement as a generalization to help read the table. Sorry that it is confusing. The "Full model" lets all parameters vary freely; from that, we add restrictions to create the other models. The number-of-free-parameters column is meant as a guide to evaluate model complexity: more free parameters mean higher model complexity. Note that the parameter count includes transitions (i.e., dispersal rates) and not only diversification parameters.

About Model 2 (GeoSSE): We have 2 endemic areas and one widespread area. Lineages can speciate, go extinct, and also disperse. So we have 3 speciation rates (2 for the endemics and one for the cladogenetic event when widespread), two extinction rates (one for each endemic), and two dispersal rates (the movement from each endemic range to the widespread range). Your count for Model 2 did not include the dispersal rates. Try to draw a figure representing the GeoSSE model. Then expand this figure to create a GeoSSE model with one hidden state (our Figure 3 can help). This exercise will help you find the parameters that you are missing and also understand how to implement (and interpret) your biological hypothesis in GeoHiSSE.
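
The count described above can be written as a short sum. The notation is assumed for illustration (subscripts 0 and 1 for the two endemic ranges, 01 for the widespread range; $s$ for speciation, $x$ for extinction, $d$ for movement from an endemic range to the widespread range) and may not match the labels used in the paper or the package.

$$
\underbrace{3}_{s_0,\ s_1,\ s_{01}} \;+\; \underbrace{2}_{x_0,\ x_1} \;+\; \underbrace{2}_{d_0,\ d_1} \;=\; 7 \ \text{free parameters}
$$

This agrees with the 7 free parameters quoted from Table 1 in the question; the 'turnover' and 'eps' vectors only account for the first five.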

About model configurations: Check this RMarkdown document (also part of the paper's online Supplement): https://figshare.com/articles/dataset/Simulation_study_datasets_code_and_results/6146645?file=11099504 . Starting on line 49, I implement each of the models (from 1 to 18, in sequence). Compare your models with these and check the differences against the help pages for the functions. It seems to me your difficulty comes from not taking into account that every GeoHiSSE model has a configuration of diversification parameters (which you control using the GeoHiSSE function) AND a configuration of transitions (including dispersal, which you control using the TransMatMakerGeoHiSSE function). In other words, you can only describe a model in the GeoHiSSE framework if you use TransMatMakerGeoHiSSE and pass the transition matrix to the GeoHiSSE function. Please read our Vignette once more with this in mind. Check how every model first defines the transition matrix and then calls the GeoHiSSE function.

About extirpation and anagenetic models: Please refer to the "Further Model Expansions" section of our paper where we describe what these models are. You can get a copy of the article here: https://caetanods.weebly.com/publications.html

I am closing this issue. However, you can contact me via email: https://caetanods.weebly.com/contact.html