xavierdidelot / TransPhylo

Reconstruction of transmission trees using genomic data
http://xavierdidelot.github.io/TransPhylo/
GNU General Public License v2.0

Consensus and medoid trees on mult_ttree #15

Closed by GaloGS 3 years ago

GaloGS commented 3 years ago

Hi,

I am using TransPhylo to infer transmission trees starting from transmission clusters of M. tuberculosis defined by a given number of SNPs. TransPhylo is an awesome piece of software and so far I have had no trouble going through the tutorials using my own data. I have, however, a conceptual question, as I am not sure whether I am using TransPhylo correctly.

I would like to take into account as much phylogenetic uncertainty as possible, so I am using the new function infer_multittree_share_param to infer transmission trees for a given cluster, starting from many trees subsampled from the BEAST posterior as in (Xu et al. 2020). For each input phylogeny I obtain as many transmission trees as MCMC iterations, and following that paper, I could just take the MAP transmission tree for downstream analysis. My question is whether, instead of taking the MAP, it would be correct to merge all transmission trees (given that they come from different posterior phylogenies of the same transmission cluster) and calculate the consensus/medoid transmission tree. Before merging, I "manually" apply the burn-in to the transmission trees for each phylogeny. This is what I have done so far and at least it works!

Also, I could not find in the documentation what the difference is between the consensus transmission tree obtained with consTTree and the medoid obtained with medTTree, so I cannot figure out which one is more likely to represent the true transmission links.

Any help or suggestions on this topic are very welcome,

Thank you very much,

Galo A. Goig

xavierdidelot commented 3 years ago

Hi Galo,

Thanks for your message, I'm glad TransPhylo is proving useful for your research.

I'm afraid the way you are using infer_multittree_share_param is not correct. It is not intended to account for uncertainty in the phylogenetic trees; instead, it is useful for analysing separate phylogenetic trees (e.g. from separate outbreaks) that are believed to share the same epidemiological parameters. If you apply it to a sample from BEAST as you described, you will get misleadingly precise estimates of the parameters, since TransPhylo will treat each input tree as a separate, independent source of information. Instead, you should use the inferTTree command on each phylogenetic tree in the BEAST sample separately, and you can then merge the outputs.
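
To make the "run separately, then merge" idea concrete, here is a minimal, language-neutral sketch in Python. It is not TransPhylo code: the function name merge_after_burnin and the representation of each chain as a plain list of sampled items are assumptions made purely for illustration. It only shows the bookkeeping of discarding the burn-in of each per-tree chain before pooling the remaining samples.

```python
def merge_after_burnin(chains, burnin=0.5):
    """Pool samples from several independent MCMC chains.

    Hypothetical helper, not part of TransPhylo. `chains` is a list of
    chains, each a list of samples in the order they were drawn (one chain
    per input phylogeny). The first `burnin` fraction of every chain is
    dropped before the remaining samples are concatenated.
    """
    if not 0 <= burnin < 1:
        raise ValueError("burnin must be in [0, 1)")
    merged = []
    for chain in chains:
        start = int(len(chain) * burnin)  # index of first retained sample
        merged.extend(chain[start:])
    return merged


# Two toy chains of four samples each; the first half of each is discarded.
pooled = merge_after_burnin([["a1", "a2", "a3", "a4"],
                             ["b1", "b2", "b3", "b4"]], burnin=0.5)
print(pooled)
```

If the pooled samples are then passed to a summary function that applies its own burn-in, that second burn-in would presumably need to be switched off, since the early samples have already been removed chain by chain.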

consTTree is an older method for building a consensus tree, and medTTree is newer and should be preferred unless you have good reasons to use the older method. The difference is that consTTree tries to build a transmission tree that reflects the transmission events found in as many of the sampled transmission trees as possible, but in the process it can end up with a transmission tree that is actually impossible. This is similar, for example, to the way that BEAST consensus trees can end up with impossible negative branch lengths. By contrast, the medTTree function returns a transmission tree that was actually sampled and is the most representative of all sampled transmission trees.
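
A toy illustration of that difference, in Python rather than with TransPhylo itself. Everything here is a simplifying assumption: each transmission tree is reduced to a tuple giving the infector of each host (with -1 for the index case), the distance between two trees is just the number of hosts whose infector differs, and the "consensus" is a per-host majority vote. The real consTTree and medTTree work on richer objects and need not use these definitions. The sketch only shows how a consensus assembled from popular pieces can be impossible, while a medoid is always one of the sampled trees.

```python
from collections import Counter

ROOT = -1  # marks the index case (a host with no infector)

def distance(a, b):
    """Number of hosts whose infector differs between two trees (toy metric)."""
    return sum(x != y for x, y in zip(a, b))

def medoid(trees):
    """Sampled tree with the smallest total distance to all sampled trees."""
    return min(trees, key=lambda t: sum(distance(t, u) for u in trees))

def majority_consensus(trees):
    """For each host, pick the infector seen most often (toy consensus)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*trees))

def is_valid(tree):
    """A valid tree has exactly one index case and no infection cycles."""
    if tree.count(ROOT) != 1:
        return False
    for host in range(len(tree)):
        seen = set()
        while host != ROOT:  # walk up the chain of infectors
            if host in seen:
                return False
            seen.add(host)
            host = tree[host]
    return True

# Three hosts (0, 1, 2); entry i is the infector of host i.
A = (ROOT, 0, 1)   # 0 -> 1 -> 2
B = (2, ROOT, 1)   # 1 -> 2 -> 0
C = (2, 0, ROOT)   # 2 -> 0 -> 1
samples = [A, A, B, B, B, C, C]

cons = majority_consensus(samples)
med = medoid(samples)
print(cons, is_valid(cons))  # (2, 0, 1) False: every host has an infector, so there is a cycle
print(med, is_valid(med))    # (2, -1, 1) True: this is tree B, one of the samples
```

In this example each individual transmission link in the majority-vote tree is supported by most samples, yet the three links together form a loop with no index case, which is why a summary restricted to sampled trees is the safer default.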

Best wishes, Xavier

GaloGS commented 3 years ago

Dear Xavier,

Thanks a lot for your quick response. This is very helpful; that was exactly the problem I was concerned about. I will try merging the outputs as you suggested.

Best wishes, Galo