Closed BenjaminBlanchard closed 4 years ago
I'll look into this. The MEDUSA code is in between versions; updating should (hopefully) fix things.
Great - thanks! Any estimate on when the new version will be available?
@BenjaminBlanchard I apologize for the difficulty; entirely my fault. Unfortunately I am travelling at the moment. I definitely need to get the updates implemented. I'll update as I am able.
@josephwb Just wondering, any rough guess on when the update will be completed? I have the most recent version, so if the update has already been implemented, then I'm still having the same problem!
@BenjaminBlanchard: apologies for the delay. I am currently overhauling the code.
Note that the NA
s that you report are because a yule model was fit to that partition (yule does not have extinction, so epsilon is not estimated). You can force the model to be birth-death, if you like.
In terms of the crashing, it has something to do with the configuration of branch lengths and the diversities such that the optimizer gets stuck. It seems that the estimates of extinction (epsilon) are so small that trying to calculate confidence intervals runs into numerical precision problems. This may be solved by putting a reasonable lower bound on epsilon.
In the meantime, if it is of use I can skip the confidence interval calculations (if they are not of interest to you) and get you just the raw results. Is this something you'd want?
@josephwb I very much appreciate the response, and the description of what's going on - very helpful!
Actually, yes, having the raw results without confidence intervals would still be useful if it's not too much trouble! Thank you very much!
@BenjaminBlanchard I've been pulling my hair out trying to figure out what is going on with your data. This morning I by accident opened your richness file, and find that 38 tips have richnesses of zero! MEDUSA does not have a check for this, so these zero-diversities go through to analysis and break everything.
To fix things, do:
richness <- richness[-which(richness[,2] == 0),];
and then run your analysis as normal. I've confirmed that this works for your data with bd
, yule
, and mixed
models.
Apologies that the richnesses
instructions are not clear. I will update the doc, and implement a check so that no one has to go through what you've gone through.
@josephwb Wow, first, I very much appreciate you spending time on this, and second, I apologize for such a simple mistake! I had thought that the richness file was supposed to have the number of additional taxa not included, but I assume, then, that it's actually supposed to be the total diversity, including the tip on the tree?
Again, many, many thanks for working through this, and sorry again for the mistake!
@josephwb Actually I see now that it's really clear that "if a tip does not represent a higher level taxon, its richness should be 1". I don't know why I missed that...
Actually...... I had sent you the wrong richness file before, and have emailed you the correct one (with no 0's in the file), because I am still getting the same problem even with no 0's, unfortunately.
@BenjaminBlanchard Yes, the richnesses in the richness
file should represent total diversity.
I've run the richness file you sent me, and everything seems to work fine. Maybe there is junk in your workspace? Maybe try quitting and redoing things.
Nevermind, I can recreate the error.
@BenjaminBlanchard Please try this:
install.packages("devtools");
library(devtools);
install_github("josephwb/turboMEDUSA/MEDUSA", dependencies=TRUE);
This will install a different version of MEDUSA (the version that will replace the code in the geiger
repo). Then run as:
require(MEDUSA);
richness <- read.csv("Richness_Table.csv", header=TRUE);
tree <- read.nexus("Tree_WorkersOnly.nex");
medusa_ants.default <- MEDUSA(tree, richness=richness);
Does that work?
@BenjaminBlanchard This is what I get:
> medusa_ants.default <- MEDUSA(tree, richness=richness);
Using AIC-threshold as analysis-terminating criterion.
Appropriate threshold for tree of 270 tips is: 7.002076.
Preparing data for analysis:
Gathering descendant node information... done.
Gathering tip richness information... done.
done.
Optimizing parameters for pendant edges... done.
Pre-calculating parameters for internal nodes... done.
Step 1: lnLik=-2290.987; AICc=4585.997; model=bd
Step 2: lnLik=-2263.917; AICc=4537.947; shift at node 328; model=bd; cut=stem; # shifts=1
Step 3: lnLik=-2246.463; AICc=4509.197; shift at node 379; model=bd; cut=stem; # shifts=2
Step 4: lnLik=-2224.974; AICc=4472.45; shift at node 478; model=bd; cut=stem; # shifts=3
Step 5: lnLik=-2211.969; AICc=4452.74; shift at node 441; model=bd; cut=node; # shifts=4
Step 6: lnLik=-2201.492; AICc=4436.027; shift at node 425; model=yule; cut=node; # shifts=5
Step 7: lnLik=-2191.156; AICc=4419.627; shift at node 321; model=yule; cut=node; # shifts=6
Step 8: lnLik=-2181.267; AICc=4404.157; shift at node 401; model=yule; cut=node; # shifts=7
Step 9: lnLik=-2171.455; AICc=4388.872; shift at node 49; model=yule; cut=stem; # shifts=8
Step 10: lnLik=-2162.385; AICc=4375.105; shift at node 485; model=yule; cut=node; # shifts=9
Step 11: lnLik=-2153.526; AICc=4361.795; shift at node 324; model=yule; cut=node; # shifts=10
Step 12: lnLik=-2144.933; AICc=4349.05; shift at node 291; model=yule; cut=stem; # shifts=11
Step 13: lnLik=-2137.135; AICc=4337.931; shift at node 66; model=yule; cut=stem; # shifts=12
Step 14: lnLik=-2129.239; AICc=4326.651; shift at node 155; model=yule; cut=stem; # shifts=13
Step 15: lnLik=-2121.715; AICc=4316.152; shift at node 384; model=yule; cut=node; # shifts=14
Step 16: lnLik=-2114.24; AICc=4305.787; shift at node 427; model=yule; cut=stem; # shifts=15
Step 17: lnLik=-2106.773; AICc=4295.474; shift at node 3; model=yule; cut=stem; # shifts=16
Step 18: lnLik=-2099.498; AICc=4285.582; shift at node 353; model=yule; cut=node; # shifts=17
Step 19: lnLik=-2092.787; AICc=4276.857; shift at node 512; model=yule; cut=stem; # shifts=18
Step 20: lnLik=-2086.579; AICc=4269.174; shift at node 373; model=yule; cut=stem; # shifts=19
No significant increase in aicc score. Disregarding subsequent piecewise models.
Summary of optimal MEDUSA model:
Model.ID Shift.Node Cut.At Model Ln.Lik.part r epsilon
1 1 271 node bd -501.183 0.02828970 0.937018
2 2 328 stem bd -936.8519 0.05992600 0.746654
3 3 379 stem bd -92.70311 0.10316900 0.890933
4 4 478 stem bd -48.76239 0.06117870 0.978703
5 5 441 node bd -22.60155 0.00174674 0.999623
6 6 425 node yule -13.31896 0.16729600 NA
7 7 321 node yule -10.99961 0.22944800 NA
8 8 401 node yule -5.960211 0.00701145 NA
9 9 49 stem yule -6.725217 0.16092200 NA
10 10 485 node yule -10.06846 0.27070100 NA
11 11 324 node yule -6.137248 0.00587383 NA
12 12 291 stem yule -228.4913 0.07673940 NA
13 13 66 stem yule -6.91215 0.16631500 NA
14 14 155 stem yule -7.327043 0.14682600 NA
15 15 384 node yule 0 0.00000000 NA
16 16 427 stem yule -99.52476 0.06852910 NA
17 17 3 stem yule 0 0.00000000 NA
18 18 353 node yule -8.198065 0.01724200 NA
19 19 512 stem yule -62.88513 0.07235690 NA
20 20 373 stem yule -17.92887 0.21354500 NA
Huzzah!!! It works! Thank you so much, I really appreciate it a lot!
@josephwb Oh no.... now I'm having issues with plotting...
plot(medusa_ants.default)
Yields:
Error in plot.window(...) : need finite 'xlim' values In addition: Warning messages: 1: In min(x) : no non-missing arguments to min; returning Inf 2: In max(x) : no non-missing arguments to max; returning -Inf 3: In min(x) : no non-missing arguments to min; returning Inf 4: In max(x) : no non-missing arguments to max; returning -Inf
@BenjaminBlanchard Unfortunately all of the commands are off just enough to be thoroughly annoying. :scream:
Try this:
summ <- medusaSummary(medusa_ants.default);
While medusaSummary
plots the results, you may want to pass the results stored in "summ" above to the function plotPrettyTree
for more flexible plotting. To see the options of this function, do:
?plotPrettyTree
For more commands, you can do:
?MEDUSA
Here is what the straight-up medusaSummary
gives:
@josephwb Yet again, thank you very much! I'm glad it was such a quick fix for this one, ha.
Hi,
Hoping to get an answer from either of you, so I pasted my question here too. :P
I'm preparing a table for species richness, but I have a little difficulty in understanding "All taxa missing from the tree have to be assigned to one of the tips in the richness matrix." in the manual.
For example, one clade on my tree is A family, which including 10 genera, but I only sampled species from 8 genera, but how could I prepare a table reflecting two missing genera?
I know I need to record: Family A genera1 5 species Family A ... ... Family A genra8 6 species
So I want to know how to record richness of genera9, genera 10 in the richness list, and I dont have any of samples from genera9 and genera 10 in the tree. I dont know the relationship of genera 9 or genera 10 between the tips on the tree, how could I "assign genera9 or genera 10 to one of the tips in the richness matrix"?
Thanks!
One simple way is to assign richness of the entire family to a single tip. Unfortunately you lose a lot of information that way (i.e. edges), but it is an honest presentation of the data.
Have these genera never been studied before? If they have, and you can make assumptions of monophyly, you can record their richness in with the closest sampled genus.
Failing that, you are kinda stuck, I think. Ideally if one wants to run this type of analysis then sampling should be performed accordingly. If you want to really make assumptions you can add the missing genera as tips in a polytomy with the other genera (MEDUSA will randomly resolve the polytomy).
Finally, if you are willing to assume that the missing taxa are randomly missing, you can run the analysis with only the sampled taxa you have. I'm not sure if I would believe the results that come out of this, but some people do it (because they have little other choice).
I hope this helps, and am sorry I cannot be more constructive! :bowtie:
@josephwb Thanks for your quick responding! I have tried so hard that expanding my sampling to include more species, so it's hard choice to shrink the whole phylogeny into a family tree in order to save the richness information for those missing samples genera. Those genera not sampled for several reasons: 1, no molecular data (no data for my sampled genes); 2, maybe never studied; 3, only exist in literature or plant list. I don't think I have a solid reference to add any of those miss genera to the phylogeny tree resolved from actually molecular data, even in a polytomy matter (the branch length for those missing genera could be biased, or misleading, right?).
Thanks for those options, I think I may run it with all the sampled I have (pretend it's complete sampling on genus level), see how the results look like.
@josephwb I think the MEDUSA used in Tank et al., 2105 (Nested radiations and the pulse of angiosperm diversification: increased diversification rates often follow whole genome duplications) is the same version here, right? I think the genera under each family in the richness list are not complete all of genera recorded? What do you think?
Yup, same version.
@josephwb One more question :) So if I compress my phylogeny (species level) into genus level, each genus has to be monophyletic? What's your suggestion if some genera are not monophyletic? How to prepare the species richness table?
Thank you very much!
You will need top focus on monophyletic lineages rather than genera (if the latter are not monophyletic). This will mean some of your tips will represent genera, some parts of genera, and some multiple genera. This can be quite tedious if your tree/taxonomy is large.
If you send me your tree & taxonomy I can try to lump things myself to see how differently we do things. I might even be able to come up with a general function that can do this for you (no promises!).
@josephwb Ok, great! Thanks! I'm trying to finalize the topology, however, I have a phylogeny with around 20,000 species, so as you mentioned above, the perfect monophyly is hard to reach for all the lineages when you had a little bit denser sampling. I'll bother you again when I have my tree and richness table ready. Thank you so much Joseph!!!
I'm attempting to run a MEDUSA analysis in geiger - the analysis initially runs along fine, and estimates the number of rate shifts. But then when it comes to the end (following no more significant increases in the aicc score), it produces this error:
Error in uniroot(thresholdR, lower = low.bound, upper = par1) : f.lower = f(lower) is NA
After looking at the package code, it seems that there's something strange going on with arriving at a confidence interval for parameter "r", but I have no idea what. Also, the issue goes away when I reduce the species number in the richness file, although there are still several NA's in the parameter estimation output. Any idea what might be causing this issue?