nextstrain / augur

Pipeline components for real-time phylodynamic analysis
https://docs.nextstrain.org/projects/augur/
GNU Affero General Public License v3.0

augur curate format-host #1586

Open joverlee521 opened 2 months ago

joverlee521 commented 2 months ago

Inspired by @kimandrews's work in https://github.com/nextstrain/rabies/pull/8

Copying my comment as an official proposal for a new augur curate format-host command.

The host rules can be formatted like <group>/<family>/<genus>/<species>\t<new_host_label>

Picking a couple examples from your script:

host_hierarchy                     new_host_label
odd-toed ungulates/*/*/*           Other Ungulate
*/Mephitidae/*/*                   Skunk
*/Canidae/Vulpes/*                 Fox (Vulpes sp.)
*/Procyonidae/Procyon/*            Raccoon
*/*/*/Canis lupus familiaris       Domestic Dog

Then the generalized script would match starting from group down to species.
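
To make the matching idea concrete, here is a rough sketch in Python. It is not an existing Augur API: the function names `parse_host_rules` and `format_host` are hypothetical, and so are several behaviors the proposal leaves open. It assumes each record already carries a (group, family, genus, species) tuple, that `*` matches any value at a level, that comparison is an exact string match, and that the first matching rule in file order wins. Group names such as "carnivores" in the usage lines are illustrative only.

```python
LEVELS = ("group", "family", "genus", "species")


def parse_host_rules(lines):
    """Parse '<group>/<family>/<genus>/<species>\\t<new_host_label>' lines.

    Hypothetical helper: returns a list of (pattern, label) pairs, where
    pattern is a 4-tuple of level values or '*' wildcards.
    """
    rules = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        hierarchy, label = line.split("\t")
        pattern = tuple(part.strip() for part in hierarchy.split("/"))
        if len(pattern) != len(LEVELS):
            raise ValueError(f"Expected {len(LEVELS)} levels in rule: {line!r}")
        rules.append((pattern, label.strip()))
    return rules


def format_host(host, rules, default=None):
    """Return the label of the first rule matching the host tuple.

    Levels are compared in order from group down to species.
    Assumption: '*' matches anything; the first matching rule wins.
    """
    for pattern, label in rules:
        if all(p == "*" or p == h for p, h in zip(pattern, host)):
            return label
    return default


# The example rules from this proposal, with explicit tab separators.
RULES = parse_host_rules([
    "odd-toed ungulates/*/*/*\tOther Ungulate",
    "*/Mephitidae/*/*\tSkunk",
    "*/Canidae/Vulpes/*\tFox (Vulpes sp.)",
    "*/Procyonidae/Procyon/*\tRaccoon",
    "*/*/*/Canis lupus familiaris\tDomestic Dog",
])

fox = format_host(("carnivores", "Canidae", "Vulpes", "Vulpes vulpes"), RULES)
dog = format_host(("carnivores", "Canidae", "Canis", "Canis lupus familiaris"), RULES)
print(fox)  # Fox (Vulpes sp.)
print(dog)  # Domestic Dog
```

With first-match-wins, rule order in the file decides ties, so more specific rules would need to be listed before broader ones. A "most specific rule wins" policy would be another reasonable choice.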

I don't think Augur would have any default host rules since the useful groupings will vary widely by pathogen.

kimandrews commented 2 months ago

This is great! It's possible that for other pathogens we may also want to use criteria based on higher Linnaean taxonomic categories (e.g. Class or Order). But I also see downsides to adding more /*/* levels, because the rules can get confusing.