Currently, clades are defined independently of one another in the provided TSV, so the mutations of a parent clade are often duplicated in its descendants. For example, 21L is a descendant of 21M, so we have the following:
21M (Omicron) nuc 23525 T
21M (Omicron) nuc 23599 G
21L (Omicron) nuc 23525 T ## mutation actually defines 21M
21L (Omicron) nuc 23599 G ## mutation actually defines 21M
21L (Omicron) nuc 24424 T
We should allow clades to be inherited, e.g.:
21M (Omicron) nuc 23525 T
21M (Omicron) nuc 23599 G
21L (Omicron) clade 21M (Omicron)
21L (Omicron) nuc 24424 T
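As a rough illustration of the proposed format, the rows could be split into each clade's own mutations and its parent link. This is only a sketch, not augur's parser: `parse_clade_rows` is a hypothetical helper, and it assumes tab-separated rows with no header, in the column order clade, gene, site, alt, where a `clade` value in the gene column means the third column names the parent.

```python
import csv
import io


def parse_clade_rows(tsv_text):
    """Split clade rows into own mutations and parent links (hypothetical helper).

    Assumes tab-separated columns: clade, gene, site, alt. A row whose gene
    column is "clade" is read as "this clade inherits from the clade named
    in the third column".
    """
    mutations = {}  # clade name -> list of (gene, site, alt)
    parents = {}    # clade name -> parent clade name
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        clade, gene = row[0], row[1]
        mutations.setdefault(clade, [])
        if gene == "clade":
            if clade in parents:
                # Single-parent restriction: a second parent row is an error.
                raise ValueError(f"clade {clade!r} lists more than one parent")
            parents[clade] = row[2]
        else:
            mutations[clade].append((gene, int(row[2]), row[3]))
    return mutations, parents


example = (
    "21M (Omicron)\tnuc\t23525\tT\n"
    "21M (Omicron)\tnuc\t23599\tG\n"
    "21L (Omicron)\tclade\t21M (Omicron)\n"
    "21L (Omicron)\tnuc\t24424\tT\n"
)
mutations, parents = parse_clade_rows(example)
```

Here `parents` maps 21L (Omicron) to 21M (Omicron), and 21L's own list holds only the 24424 T mutation.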
There are a few considerations here:
This introduces the potential for circular dependencies (A descended from B, which is descended from A), which should be fatal errors.
When both a parent and descendant clade are annotated on the same branch, the branch label should represent the descendant clade.
Should multiple parent clades be allowed? It is probably easiest to limit the current implementation to a single parent lineage.
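The considerations above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the augur implementation: `resolve_clades` and `most_derived` are hypothetical names, each clade has at most one parent, `mutations` maps a clade name to its own list of (gene, site, alt) tuples, and `parents` maps a clade name to its parent's name. Inherited mutations are simply prepended to the clade's own; how a child should override a parent's mutation at the same site is left open.

```python
def resolve_clades(mutations, parents):
    """Expand each clade's mutations with those inherited from its ancestors.

    Raises ValueError on circular inheritance or an undefined parent,
    treating both as fatal errors.
    """
    resolved = {}

    def resolve(clade, chain):
        if clade in resolved:
            return resolved[clade]
        if clade in chain:
            cycle = " -> ".join(chain[chain.index(clade):] + [clade])
            raise ValueError(f"circular clade inheritance: {cycle}")
        if clade not in mutations:
            raise ValueError(f"unknown parent clade: {clade}")
        parent = parents.get(clade)
        inherited = resolve(parent, chain + [clade]) if parent is not None else []
        # Parent mutations first, then the clade's own (exact duplicates dropped).
        own = [m for m in mutations[clade] if m not in inherited]
        resolved[clade] = inherited + own
        return resolved[clade]

    for clade in mutations:
        resolve(clade, [])
    return resolved


def most_derived(matched, parents):
    """Drop every matched clade that is an ancestor of another matched clade.

    Assumes `parents` has already passed resolve_clades, so it has no cycles.
    """
    shadowed = set()
    for clade in matched:
        while clade in parents:
            clade = parents[clade]
            shadowed.add(clade)
    return [c for c in matched if c not in shadowed]


mutations = {
    "21M (Omicron)": [("nuc", 23525, "T"), ("nuc", 23599, "G")],
    "21L (Omicron)": [("nuc", 24424, "T")],
}
parents = {"21L (Omicron)": "21M (Omicron)"}
resolved = resolve_clades(mutations, parents)
label = most_derived(["21M (Omicron)", "21L (Omicron)"], parents)
```

With these inputs, 21L resolves to all three mutations, and a branch matching both clades keeps only 21L (Omicron) as its label.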
Related
augur clade oddities, including some bugs related to parsing the TSV file, which should be addressed before / as part of this issue
Possible solution
There seem to be two implementations available:
I prefer solution 2, but I don't think the results will be different.