xavierdidelot / BactDating

Bayesian inference of ancestral dates on bacterial phylogenetic trees
https://xavierdidelot.github.io/BactDating
MIT License

Question about random permutation #15

Closed flashton2003 closed 5 years ago

flashton2003 commented 5 years ago

Dear Xavier,

I have a question about the random permutation test done by roottotip. It seems that in Ramsden et al (the Hantavirus paper you cite in the BactDating manuscript), they carried out a full BEAST analysis with the randomised tip dates, and this seems to be the standard in the field. I'm guessing that roottotip only does the permutation for the root-to-tip analysis, rather than the full bactdating analysis. Is that correct?

In your opinion, if roottotip gives a significant p-value, but the signal is a bit iffy (excuse the technical language, see the results below) would there be any value in randomising and carrying out the full bactdating analysis?

Many thanks,

Phil

[Image: Screenshot 2019-09-03 at 16 32 22, showing the roottotip results]

xavierdidelot commented 5 years ago

Hi Phil,

Yes, you are correct that this test is based on a permutation of the root-to-tip analysis. I thought this was what was done in Ramsden et al., but you're right: they did a permutation on the BEAST analysis. This is of course also feasible in BactDating, as mentioned in the discussion of the paper (right at the end of page 9).

The p-value obtained from permuting the root-to-tip analysis is useful to get a quick indication of significance, but it can't be fully trusted since it is based on a regression on points that are not independent (the root-to-tip distances of different tips share branches of the tree).
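
To make the idea concrete, here is a minimal sketch of a date-permutation test on a root-to-tip regression. It is a generic illustration in Python, not the code behind roottotip: the function names, the choice of the regression slope as a one-sided test statistic, and the toy data are all assumptions made for the example.

```python
import random


def regression_slope(dates, dists):
    """Least-squares slope of root-to-tip distance against sampling date."""
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(dists) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, dists))
    return sxy / sxx


def permutation_pvalue(dates, dists, n_perm=1000, seed=0):
    """One-sided p-value: how often shuffled dates give a slope at least
    as large as the slope obtained with the true dates."""
    rng = random.Random(seed)
    observed = regression_slope(dates, dists)
    shuffled = list(dates)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if regression_slope(shuffled, dists) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (hits + 1) / (n_perm + 1)


# Made-up toy data: 20 tips sampled yearly, distances growing with date.
dates = list(range(2000, 2020))
dists = [0.001 * (d - 1990) + 0.0002 * ((i * 7) % 5 - 2)
         for i, d in enumerate(dates)]
p = permutation_pvalue(dates, dists)
```

Note that this shuffles dates across tips while treating every tip as an independent point, which is exactly the simplification that makes the resulting p-value only a rough guide.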

Another option is to compare two runs of BactDating, one with the correct dates and one with all dates set equal. This is implemented in BactDating using DIC model comparison, as described in the paper. It has the advantage of only requiring one more run of BactDating, but DIC is known to have limitations.
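
For readers unfamiliar with DIC, the sketch below shows one common way to compute it from posterior samples of the log-likelihood, using the variance-based estimate of the effective number of parameters (half the variance of the deviance). This is an assumption chosen for illustration and may differ from what BactDating does internally; the function name and the numbers are made up.

```python
import statistics


def dic_from_loglik(loglik_samples):
    """DIC = mean deviance + effective number of parameters, with the
    latter estimated as half the sample variance of the deviance."""
    deviance = [-2.0 * ll for ll in loglik_samples]
    mean_deviance = statistics.mean(deviance)
    p_effective = 0.5 * statistics.variance(deviance)
    return mean_deviance + p_effective


# Made-up posterior log-likelihood samples for two hypothetical runs.
loglik_true_dates = [-100.0, -101.0, -99.5, -100.5]
loglik_equal_dates = [-110.0, -111.5, -109.0, -110.5]

dic_true = dic_from_loglik(loglik_true_dates)
dic_equal = dic_from_loglik(loglik_equal_dates)
# The model with the lower DIC is the preferred one.
prefers_true_dates = dic_true < dic_equal
```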

So if you really want to do the best you can in terms of testing the strength of the temporal signal in your dataset, it would indeed be a good idea to do a permutation test of the sampling dates on the BactDating analysis. In your case 100 permutations would probably be quick and enough, and I would expect you to find that the estimated clock rate is higher with the correct dataset than with any of the permutations.
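
The overall procedure can be sketched as a small harness that reruns an analysis on shuffled dates. This is a language-neutral outline written in Python under stated assumptions: `estimate_rate` is a hypothetical callback standing in for a full dating run that returns an estimated clock rate, and the toy estimator and data below are made up so that the example runs on its own.

```python
import random


def date_permutation_test(dates, estimate_rate, n_perm=100, seed=1):
    """Estimate the rate with the true dates, then again for n_perm
    random reassignments of the same dates to the tips."""
    rng = random.Random(seed)
    observed = estimate_rate(list(dates))
    permuted = []
    for _ in range(n_perm):
        shuffled = list(dates)
        rng.shuffle(shuffled)
        permuted.append(estimate_rate(shuffled))
    # Number of permutations whose rate reaches the observed one.
    n_at_least = sum(1 for rate in permuted if rate >= observed)
    return observed, permuted, n_at_least


# Made-up toy data: distances grow by 0.001 per year since 2000.
true_dates = [2000, 2002, 2003, 2006, 2007, 2010, 2011, 2014, 2015, 2018]
tip_dists = [0.010 + 0.001 * (d - 2000) for d in true_dates]


def toy_rate(dates):
    """Hypothetical stand-in for a full analysis: a regression slope."""
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(tip_dists) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, tip_dists))
    return sxy / sxx


observed, permuted, n_at_least = date_permutation_test(true_dates, toy_rate)
```

In a real analysis the callback would launch one full run per permutation, so the run time grows linearly with `n_perm`. A strong temporal signal would show up as `n_at_least` being zero or close to it.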

Best wishes, Xavier

flashton2003 commented 5 years ago

Thanks Xavier!