nextstrain / augur

Pipeline components for real-time phylodynamic analysis
https://docs.nextstrain.org/projects/augur/
GNU Affero General Public License v3.0
268 stars 129 forks source link

Compute branch lengths from a (BEAST) tree #964

Closed babarlelephant closed 2 years ago

babarlelephant commented 2 years ago

Hi, long question on how would you compute the divergence tree/branch lengths from a (BEAST) timetree?

I found a way that works for MonkeyPox but I am not sure if it will stay correct with other datasets.

The input is a BEAST maximum clade credibility timetree

called fromBeast.tree.mcc, and the alignment used to make it.

Timetrees are hard to read alone, so I also want to have the divergence tree and the mutations appearing on the branches, as in any nextstrain tree.

Solution with iqtree2 -te

Which seems to work reasonably well, at least on this particular case.

augur import beast --mcc fromBeast.tree.mcc --output-tree beastTree.nwk --output-node-data nodes.json --most-recent-tip-date 2022.3958904109588
iqtree2 -s alignment.fasta -te beastTree.nwk -m HKY -pre treeWithBranchLengths -o MPXV-UK_P2_United_Kingdom_2018-09-18 --redo
augur ancestral --alignment alignment.fasta --tree treeWithBranchLengths.treefile --output-node-data muts.json
augur export v2 --tree beastTree.nwk --node-data nodes.json muts.json --output MonkeypoxBeast.json 
python writeDivToAuspiceJson.py MonkeypoxBeast.json 

beastTree.nwk is a labeled newick, it has some NODE_0000001 attributes in internal nodes, iqtree2 -te optimizes the branch lengths and its output still contains these node names, though a few nodes are removed, augur ancestral assigns ancestral sequences and mutations to these labeled internal nodes, so they can be added to the auspice json by augur export, Finally writeDivToAuspiceJson.py counts the mutations in the branches of the auspice json and writes it to ["node_attrs"]["div"]

image

Calling TreeTime directly

Otherwise I tried to call treetime directly, feeding the dates parameter with the internal nodes' dates exported by augur import beast

tt = TreeTime(tree=tree, aln=aln, dates=dates)
tt.infer_ancestral_sequences()
tt.optimize_branch_len()

but the result is surprising image

Calling augur ancestral directly on the timetree exported by augur import beast

The first thing I tried, but it failed (the obtained divergence tree starts with two long branches, which are wrong, but when clicking on the subtrees it looks as expected)

image

Thanks.

jameshadfield commented 2 years ago

Calling augur ancestral on the BEAST tree should work with the following caveats. Firstly, the internal nodes will have to be named on the input newick (either via script, or running augur refine without asking it to compute a time tree). Secondly, the tree will be constrained to the BEAST tree, so your not inferring a divergence tree per se; you will need to write a small script to convert the output from augur ancestral into a count of mutations which augur export v2 will assign to div if you want to see it in auspice (see the node-data JSON output of a typical augur refine command to check the expected format). Finally, given a tree as above with long branches from the root, the way mutations are assigned to these two long branches is a bit strange; using an outgroup to root the tree and then pruning it out should help here.

I thought BEAST had a model which could export nucleotide mutations on each branch, but it's been a while. You may want to try asking on a BEAST forum about this?

If you want to persue this approach using augur, could you transfer this to discussion.nextstrain.org & hopefully others can comment as well 😄

jameshadfield commented 2 years ago

Closing this issue, but please feel free to continue on discussion.nextstrain.org!