nextstrain / augur

Pipeline components for real-time phylodynamic analysis
https://docs.nextstrain.org/projects/augur/
GNU Affero General Public License v3.0
refine: Support rooting with time data without producing a time tree #1479

Open huddlej opened 4 months ago

Context

As we develop more Nextclade datasets, we encounter a common pattern where we want to build a divergence tree for Nextclade instead of a time tree, but we want to root the divergence tree in the augur refine step using time information.

augur refine does not currently support this pattern directly, so we rely on different workarounds. In the seasonal flu repo, we root the divergence tree by calling treetime clock, and then we run augur refine with the --keep-root flag and without the --timetree argument.
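As a rough illustration of the seasonal flu workaround, the two steps could be assembled as shown below. This is a sketch, not the actual rule from that repo. The helper build_rooting_commands and all file names are hypothetical, and it assumes that treetime clock writes its rerooted tree as rerooted.newick inside the directory given by --outdir.

```python
def build_rooting_commands(tree, alignment, metadata, outdir="clock_results"):
    """Return argv lists for (1) time-based rooting and (2) augur refine.

    Hypothetical helper: file names and the output location of the
    rerooted tree are assumptions, not taken from the seasonal flu repo.
    """
    # Assumption: treetime clock saves the rerooted tree under this name.
    rerooted = f"{outdir}/rerooted.newick"

    # Step 1: let TreeTime pick the root from the sampling dates.
    treetime_cmd = [
        "treetime", "clock",
        "--tree", tree,
        "--aln", alignment,
        "--dates", metadata,
        "--outdir", outdir,
    ]

    # Step 2: refine the rerooted tree, keeping that root and
    # deliberately omitting --timetree so the output is a divergence tree.
    refine_cmd = [
        "augur", "refine",
        "--tree", rerooted,
        "--alignment", alignment,
        "--output-tree", "tree.nwk",
        "--output-node-data", "branch_lengths.json",
        "--keep-root",
    ]
    return treetime_cmd, refine_cmd
```

Each list can be passed to subprocess.run in order; the second command only makes sense once the first has produced the rerooted tree.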

In the measles workflow, we run augur refine with the --timetree argument and its default rooting behavior, in which TreeTime finds the best root using time information. Then, we produce a new branch lengths JSON with the time branch lengths removed before passing that file to augur export.
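The post-processing step in the measles workaround might look roughly like the sketch below. It is not the workflow's actual script. The function name strip_time_branch_lengths is hypothetical, and the set of time-related keys is an assumption about what augur refine writes to the node data JSON when --timetree is used; check it against a real output file before relying on it.

```python
import json

# Assumption: these per-node keys carry time information in the node
# data JSON produced by augur refine --timetree.
TIME_KEYS = {"numdate", "num_date_confidence", "date", "raw_date", "clock_length"}


def strip_time_branch_lengths(node_data):
    """Return a copy of the node data without time-related entries."""
    # Drop the top-level clock model summary (assumed key: "clock").
    stripped = {key: value for key, value in node_data.items() if key != "clock"}

    # Drop the assumed time-related keys from every node.
    stripped["nodes"] = {
        name: {key: value for key, value in attrs.items() if key not in TIME_KEYS}
        for name, attrs in node_data.get("nodes", {}).items()
    }
    return stripped


def rewrite_node_data(in_path, out_path):
    """Read a branch lengths JSON, strip time entries, and write the result."""
    with open(in_path) as handle:
        node_data = json.load(handle)
    with open(out_path, "w") as handle:
        json.dump(strip_time_branch_lengths(node_data), handle, indent=2)
```

The rewritten file would then be passed to augur export in place of the original branch lengths JSON.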

This improvement is related to the issue where we often need to root with a specific reference sequence and then prune the root before building a time tree.

Description

Ideally, we could skip the custom rules/scripts described above by allowing users to root in augur refine with time information but without building a time tree. Currently, when users request --root best without providing the --timetree flag, augur refine prints the following message:

Warning: To root without inferring a timetree, you must specify an explicit outgroup.

I propose that we allow users to request the "best" root when they provide metadata with time information even if they don't request a time tree. Internally, I think augur refine would still need to build a time tree and use this tree to determine the best root. Then, if the user did not provide --timetree, the tree that gets exported would be the divergence tree.
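The proposed behavior can be summarized as a small decision function. This is only a sketch of the idea under stated assumptions, not augur's implementation: RefinePlan and plan_refine are hypothetical names, and only the "best" rooting option discussed here is modeled.

```python
from dataclasses import dataclass


@dataclass
class RefinePlan:
    infer_timetree_internally: bool  # does refine need to run time inference?
    exported_tree: str               # "time" or "divergence"


def plan_refine(root, timetree, has_dates):
    """Decide what refine would compute and export under the proposal.

    Hypothetical helper: `root` is the value of --root, `timetree` is
    whether --timetree was given, and `has_dates` is whether the
    metadata provides time information.
    """
    needs_time = timetree or root == "best"
    if needs_time and not has_dates:
        raise ValueError("time-based rooting or a time tree requires dates in the metadata")

    if timetree:
        # Existing behavior: infer and export the time tree.
        return RefinePlan(infer_timetree_internally=True, exported_tree="time")
    if root == "best":
        # Proposed behavior: infer a time tree only to find the root,
        # then export the divergence tree.
        return RefinePlan(infer_timetree_internally=True, exported_tree="divergence")
    # Explicit outgroup or kept root: no time inference needed.
    return RefinePlan(infer_timetree_internally=False, exported_tree="divergence")
```

The middle branch is the new case: time inference runs internally, but its branch lengths never reach the exported tree.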

Relatedly, we may also want to let users use other time-based functionality, such as --clock-filter-iqd, even when they don't request a time tree.