xavierdidelot / BactDating

Bayesian inference of ancestral dates on bacterial phylogenetic trees
https://xavierdidelot.github.io/BactDating
MIT License

Rooted and unrooted trees in BactDating analysis! #23

nbawe closed this issue 4 years ago

nbawe commented 4 years ago

Hello,

I have a question regarding rooted and unrooted trees in BactDating analysis.

1) I extracted the core genome and then built a tree with RAxML.

Question 1: Do I need to root the RAxML best tree (e.g. midpoint or best-fitting root) before running ClonalFrameML?

2) I then use the ClonalFrameML output as input to the BactDating analysis.

Question 2: Should BactDating take a rooted tree (e.g. the one created during the root-to-tip analysis) or an unrooted tree (e.g. the original RAxML tree) as input?

Question 3: When rooting a tree that comes from CFML output, should I use the useRec=T argument?

3) I know the sampling years of my isolates, but not their exact sampling dates.

Question 4: Am I correct that the root-to-tip analysis is performed on single year values (e.g. 2012, 2013), whereas BactDating is performed with interval values (year, year+1)?

4) I am running two models: one with interval dates, a fixed mu (updateMu=F) and useRec=T, and a second one that is identical except that all dates are forced to be equal.

Question 5: When comparing the DIC values of the two models (one with interval dates, the other with forced equal dates), if the DIC of the equal-dates model is better (smaller), even though the effective sample size was better for the first model and the root-to-tip test was significant, does that mean there is no significant temporal signal, and therefore that a temporal analysis is not meaningful?

Thank you

xavierdidelot commented 4 years ago
  1. No, ClonalFrameML does not require the tree to be rooted. If the tree is rooted, the output will be the same as if it were unrooted, or rooted anywhere else.

  2. The output tree from ClonalFrameML will not be rooted, and BactDating does not need the input tree to be rooted. The output from BactDating will be rooted so as to best capture the temporal signal.

  3. Yes, this is correct: you can use intervals.

  4. Yes, if you get a smaller DIC when the dates are all equal to each other, then the temporal signal is not significant, at least for the rate value that you fixed (you could let the rate be estimated freely, to test whether the signal is significant for any rate).
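Since intervals can be used, here is a minimal sketch of preparing both kinds of date input in base R. It assumes, and this should be checked against the BactDating documentation, that uncertain dates are given as a two-column matrix of lower and upper bounds in the same order as the tree tips, and that the root-to-tip step wants a single value per isolate. The years and the names years, date_interval and date_point are hypothetical.

```r
# Hypothetical sampling years, assumed to be in the same order as the tree tips
years <- c(2012, 2013, 2013, 2015, 2016)

# Two-column (lower, upper) matrix: each isolate was sampled at some point
# between the start of its year and the start of the following year
date_interval <- cbind(years, years + 1)
colnames(date_interval) <- c("lower", "upper")

# One value per isolate for a root-to-tip regression; using the interval
# midpoint is an assumption (the plain year would also be a valid choice)
date_point <- rowMeans(date_interval)

print(date_interval)
print(date_point)
```

Under these assumptions, date_interval would be the dates given to bactdate() and date_point the dates given to roottotip(). Choosing the midpoint rather than the plain year is a judgment call, not a documented BactDating rule.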

nbawe commented 4 years ago

Thank you! I let mu be estimated and reran the analysis, but the DIC is still smaller with the forced dates. 1) Could longer runs (10e8 iterations) help? 2) Do you think a smaller subset could help?

xavierdidelot commented 4 years ago

No, a longer run probably will not help. What is the inferred value of mu when you don't fix it? Is it plausible? And is the dating plausible?

nbawe commented 4 years ago

The inferred mu is about 15, while the forced value was 32.6 (from Wilson et al. for C. jejuni). I'm not sure what you mean by the dating being plausible: for the root it is 1870-1974, so I really don't know, but the cluster nodes near the tips seem plausible.

When rooting the tree for roottotip, how should I use the dates? Are intervals not allowed? At least, I get an error when using intervals.

nbawe commented 4 years ago

I think I will take a smaller subset of isolates and rerun the core genome extraction and CFML+BactDating.

xavierdidelot commented 4 years ago

mu=15 sounds about right for C. jejuni, I think, which suggests that BactDating has correctly detected the temporal signal. If the temporal signal were weak, you would usually get much smaller values of mu than that. So I think everything is working, well done!

I'm not sure why the DIC is smaller when all dates are forced to be equal. Maybe it's because of the uncertainty in some of your dates; I have not tested this DIC comparison under these conditions.
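As background for the DIC comparison discussed here: DIC is computed from the MCMC samples of the log-likelihood, and the model with the lower DIC is preferred. Below is a minimal sketch using the deviance D = -2 log L and the variance-based penalty pD = var(D)/2, which is one standard definition. Whether BactDating computes DIC with exactly this formula is an assumption, and the log-likelihood traces and function name are made up for illustration.

```r
# DIC from MCMC samples of the log-likelihood:
# deviance D = -2 log L, DIC = mean(D) + var(D) / 2
# (one standard definition; assumed, not taken from the BactDating source)
dic_from_loglik <- function(loglik) {
  deviance <- -2 * loglik
  mean(deviance) + var(deviance) / 2
}

# Made-up log-likelihood traces for two models fitted to the same tree
loglik_real_dates  <- c(-1050, -1052, -1049, -1051)
loglik_equal_dates <- c(-1046, -1047, -1045, -1046)

dic_real  <- dic_from_loglik(loglik_real_dates)
dic_equal <- dic_from_loglik(loglik_equal_dates)

# The lower DIC is preferred; in this made-up example the equal-dates model wins,
# which is the situation where the temporal signal would be judged not significant
print(c(real_dates = dic_real, equal_dates = dic_equal))
```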

An alternative test would be to permute the dates among all isolates, rerun the analysis, and check whether the inferred mu is always smaller than the one you get with the correct dates. If so, the signal is definitely significant.
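The permutation test described here can be sketched as follows. estimate_mu is a hypothetical user-supplied function: in a real analysis it would rerun BactDating on the permuted dates and return a summary of the inferred rate, which is slow for each replicate. To keep the example runnable, the usage part swaps in a toy estimator (the slope of distance against date on made-up data), which is not what BactDating computes.

```r
# Sketch of a date-permutation test for temporal signal.
# estimate_mu: hypothetical function mapping a vector of tip dates to an inferred rate
perm_test_mu <- function(dates, estimate_mu, n_perm = 100) {
  mu_obs <- estimate_mu(dates)
  # Each replicate reassigns the observed dates to the tips, without replacement
  mu_perm <- vapply(seq_len(n_perm),
                    function(i) estimate_mu(sample(dates, replace = FALSE)),
                    numeric(1))
  list(mu_obs = mu_obs,
       mu_perm = mu_perm,
       # Criterion from the reply: every permuted rate is below the observed one
       all_smaller = all(mu_perm < mu_obs),
       # Empirical proportion of permutations reaching the observed rate
       p_value = mean(mu_perm >= mu_obs))
}

# Toy usage with made-up data: the "rate" is the slope of distance on date,
# a cheap stand-in for rerunning BactDating on every permutation
dates <- 2010:2017
dist <- 0.5 * (dates - 2000) + c(0.1, -0.2, 0.05, 0, 0.15, -0.1, 0.2, -0.05)
toy_mu <- function(d) cov(d, dist) / var(d)

set.seed(42)
res <- perm_test_mu(dates, toy_mu, n_perm = 50)
print(res$mu_obs)
print(res$all_smaller)
print(res$p_value)
```

In practice the number of permutations is limited by the run time of each dating analysis, so a modest n_perm is the realistic choice.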

nbawe commented 4 years ago

By permutation test, do you mean randomly reassigning the dates (from the dataset) to the tips, rerunning BactDating, and repeating this multiple times?

xavierdidelot commented 4 years ago

Yes, that's exactly it: randomly and without replacement.

nbawe commented 4 years ago

Without replacement?

nbawe commented 4 years ago

Should the dates come from the same dataset, or be completely random?

xavierdidelot commented 4 years ago

The dates are the same as in the dataset, but you permute them (without replacement) across the individuals. You can do this using the command sample(dates,replace=F)
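To illustrate what "without replacement" means here, the sketch below permutes a small hypothetical vector of dates with sample(): every date from the dataset is used exactly once and is only reassigned to a different isolate, whereas sampling with replacement could repeat some dates and drop others. As an assumption going beyond the reply, it also shows how interval dates stored as a two-column matrix could be permuted by shuffling whole rows.

```r
set.seed(1)  # only to make the example reproducible
dates <- c(2012, 2013, 2013, 2015, 2016)  # hypothetical sampling years, in tip order

# Permutation without replacement: the same five dates, reassigned to different tips
perm <- sample(dates, replace = FALSE)
stopifnot(identical(sort(perm), sort(dates)))

# With replacement (not wanted here): some dates may repeat and others disappear
boot <- sample(dates, replace = TRUE)
print(perm)
print(boot)

# Interval dates as a (lower, upper) matrix: shuffle whole rows so that each
# lower bound stays paired with its own upper bound
date_interval <- cbind(lower = dates, upper = dates + 1)
perm_interval <- date_interval[sample(nrow(date_interval)), , drop = FALSE]
print(perm_interval)
```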