Open ammaraziz opened 2 years ago
@huddlej Is there anything I can help with triaging this bug? I can dig deeper into the code to see where it fails if it helps.
@ammaraziz Could you share a example input tree, i.e. what you're passing for the --tree
option to augur refine
or augur ancestral
?
Coming back to this .... 2 years later. My apologies for dropping the ball on replying. This issue still exists, and can cause very strange nextstrain visualisation:
I'll generate a reproducible example.
Hi @tsibley here is a repex:
augur align --sequence sequences.fasta --output aligned.fasta --reference-name "NC_007362.1" --nthreads 10
augur tree --alignment aligned.fasta --output tree.ufboot.nwk --nthreads 10 --override-default-args --tree-builder-args="-B 1000 -alrt 1000 -bnni" --nthreads 10
augur tree --alignment aligned.fasta --output tree.boot.nwk --nthreads 10 --override-default-args --tree-builder-args="-b 100" --nthreads 10
augur refine --tree tree.boot.nwk --alignment aligned.fasta --metadata metadata.csv --output-tree tree.boot.refined.nwk --output-node-data nodedata.boot.json --keep-root
augur refine --tree tree.ufboot.nwk --alignment aligned.fasta --metadata metadata.csv --output-tree tree.ufboot.refined.nwk --output-node-data nodedata.ufboot.json --keep-root
Inputs:
sequences.fasta
Outputs of interest:
Comparing the nodedata outputs. With traditional bootstrapping (correct parsing):
"nodes": {
"NC_007362.1": {
"branch_length": 0.04489627
},
"NODE_0000000": {
"branch_length": 0.001
},
...
With ufboot (incorrect parsing, note the node name is the bootstrap value):
"nodes": {
"0/15": {
"branch_length": 1e-06
},
"0/29": {
"branch_length": 1e-06
},
...
Digging into the code of refine.py
is this chunk of code which reads the attributes into a dict:
https://github.com/nextstrain/augur/blob/da1c89d4b3232aa977b69fb8df33d666532c9a56/augur/refine.py#L88-L96
The T
object is created by read_tree
function in utils.py
which it's a wrapper around biopython:
https://github.com/nextstrain/augur/blob/master/augur/utils.py#L68
Printing the Tree
object from the biopython function identifies the issue - biopython is not correctly parsing nhx trees:
Tree(rooted=False, weight=1.0)
Clade(branch_length=0.0)
Clade(branch_length=0.04490861, name='NC_007362.1')
Clade(branch_length=0.05721486, name='100/100')
Clade(branch_length=0.02063323, name='100/100')
...
My conclusion is this is a biopython. Checking biopython github issues, I find this from 2017: https://github.com/biopython/biopython/issues/1464
I think the best thing to add documentation that ufboot
is not supported in augur refine
.
I'll try to get biopython to fix this issue.
Current Behavior
Augur version 14 introduced
--tree-builder-args
which allows the user to create a tree with bootstrap values. IQTree has two methods bootstrapping trees, traditional bootstrap and UltraFast Bootstrap (UFBoot). UFBoot is recommended to be run with the SH-aLRT test, in practice these two are nearly always run together. IQTree will annotate the tree withUFBoot
/SH-aLRT
:branch length
, eg:75.6/72:0.00056165
.This causes two issues:
refine
and not annotated in thebranch_lengths.json
Number 1 is bad because of the silent nature of the issue, it interferes with
ancestral
resulting in repeating branch annotations that can be hard to notice.As an example, refine output from a tree with UFBoot:
And the same for ancestral:
Note that the node names are replaced with UFBoot values.
Expected behavior
UFBoot values should be parsed like standard bootstrapping values and not interfere with node names.
How to reproduce
Augur tree was run with
tree_builder_args = "-B 1000 -alrt 1000 -bnni"
to produce the UFBoot annotated newick tree that is used in refine/ancestral steps.Possible solution
I think the issue is that UFBoot values are treated as strings while standard bootstrap values are integers. Correctly parsing the UFBoot values would partially solve the issue.
Your environment: if running Nextstrain locally