nextstrain / mpox

Nextstrain build for mpox virus
https://nextstrain.org/mpox

`fix_tree.py` can create invalid tree #232

Open joverlee521 opened 5 months ago

joverlee521 commented 5 months ago

The hmpxv1_big build failed yesterday with a validation error from `augur export v2`:

[batch] [2024-01-21T16:43:57-08:00] Validating schema of 'results/hmpxv1_big/nt_muts.json'...
[batch] [2024-01-21T16:43:57-08:00] Validating schema of 'results/hmpxv1_big/aa_muts.json'...
[batch] [2024-01-21T16:43:57-08:00] Validating config file config/hmpxv1_big/auspice_config.json against the JSON schema
[batch] [2024-01-21T16:43:57-08:00] Validating schema of 'config/hmpxv1_big/auspice_config.json'...
[batch] [2024-01-21T16:43:57-08:00] Validating produced JSON
[batch] [2024-01-21T16:43:57-08:00] Validating schema of 'results/hmpxv1_big/raw_tree.json'...
[batch] [2024-01-21T16:43:57-08:00] Validating that the JSON is internally consistent...
[batch] [2024-01-21T16:43:57-08:00] Node OP615261 appears multiple times in the tree.
[batch] [2024-01-21T16:43:57-08:00] ------------------------
[batch] [2024-01-21T16:43:57-08:00] Validation of results/hmpxv1_big/raw_tree.json failed. Please check this in a local instance of `auspice`, as it is not expected to display correctly. 

I searched for OP615261 in the results files: it appears only once in tree_raw.nwk (produced by `augur tree`) but twice in tree_fixed.nwk (produced by scripts/fix_tree.py). Somehow scripts/fix_tree.py is duplicating the node.
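
The manual search above could be automated with a check along these lines. This is only a minimal sketch, not part of the workflow: the function name `duplicate_tip_names` is hypothetical, and the regex assumes plain, unquoted tip labels in the Newick file.

```python
import re
from collections import Counter


def duplicate_tip_names(newick: str) -> dict[str, int]:
    """Return tip labels that occur more than once in a Newick string.

    Assumption: labels are unquoted and contain no whitespace or
    Newick punctuation.
    """
    # A tip label starts right after '(' or ',' and runs until the next
    # Newick delimiter. Internal node labels follow ')' and are skipped.
    tips = re.findall(r"[(,]\s*([^\s():,;]+)", newick)
    counts = Counter(tips)
    return {name: n for name, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Toy tree with one duplicated tip, standing in for tree_fixed.nwk.
    toy = "((OP615261:1,B:1):1,(OP615261:2,C:1):1);"
    print(duplicate_tip_names(toy))
```

Running it on both tree_raw.nwk and tree_fixed.nwk would show whether the duplicate is introduced between the two steps.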

joverlee521 commented 3 months ago

The hmpxv1_big build failed today, possibly related to fix_tree.py: `augur refine` is running into a leaf named `None`:

[batch] [2024-03-27T18:00:42+00:00] ERROR: TreeAnc.optimal_branch_length: terminal node alignments required; sequence is missing for leaf: 'None'. Missing terminal sequences can be inferred from sister nodes by rerunning with `reconstruct_tip_states=True` or `--reconstruct-tip-states`
[batch] [2024-03-27T18:00:42+00:00] ERROR from TreeTime: This error is most likely due to a problem with your input data.
[batch] [2024-03-27T18:00:42+00:00] Please check your input data and try again. If you continue to have problems, please open a new issue including
[batch] [2024-03-27T18:00:42+00:00] the original command and the error above:  <https://github.com/nextstrain/augur/issues/new/choose>
[batch] [2024-03-27T18:00:42+00:00] augur refine is using TreeTime version 0.11.3
[batch] [2024-03-27T18:00:42+00:00] 367.47  ***WARNING: TreeAnc._check_alignment_tree_gtr_consistency: NO SEQUENCE
[batch] [2024-03-27T18:00:42+00:00]         FOR LEAF: 'None'
[batch] [2024-03-27T18:00:42+00:00] 367.50  ***WARNING: TreeAnc: 1 nodes don't have a matching sequence in the
[batch] [2024-03-27T18:00:42+00:00]         alignment. POSSIBLE ERROR.
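
One way to narrow this down would be to look for unlabeled tips in the tree that is passed to `augur refine`. The sketch below assumes that the `'None'` leaf corresponds to a tip written without a name in the Newick file, which nothing in this thread confirms; the function name `count_unnamed_tips` is hypothetical.

```python
import re


def count_unnamed_tips(newick: str) -> int:
    """Count tips with an empty label in a Newick string.

    Assumption: an unnamed tip shows up as '(' or ',' followed directly
    by ':', ',' or ')', i.e. there is no label before the branch length
    or the next delimiter.
    """
    return len(re.findall(r"[(,]\s*(?=[:,)])", newick))


if __name__ == "__main__":
    # Toy tree where the second tip has a branch length but no name.
    toy = "((A:1,:1):1,B:1);"
    print(count_unnamed_tips(toy))
```

A non-zero count for tree_fixed.nwk but zero for tree_raw.nwk would point at scripts/fix_tree.py as the step that drops the name.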