nextstrain / augur

Pipeline components for real-time phylodynamic analysis
https://docs.nextstrain.org/projects/augur/
GNU Affero General Public License v3.0

export v2: Add option to choose sample name from metadata #1264

Open joverlee521 opened 11 months ago

joverlee521 commented 11 months ago

Context

Since https://github.com/nextstrain/augur/pull/1240, we support using arbitrary columns as the metadata ID column. This column is then used as the sample name in the final output Auspice JSON.

Description

It would be nice to be able to choose a different column from the metadata to use as the sample name. The column used as the metadata ID may not be descriptive enough to use as the sample name for display in Auspice.

Examples

The monkeypox and rsv workflows already do this with essentially identical scripts.
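For illustration, that kind of swap could look roughly like the sketch below. This is not the actual monkeypox or rsv script: the function name `relabel_tips`, the sample IDs, and the `display_names` mapping are hypothetical, and the only thing assumed about the Auspice JSON is that the tree is a nested dict with `name` and `children` keys.

```python
def relabel_tips(node, display_names):
    """Recursively rename the tips of an Auspice-style tree in place.

    `display_names` maps the current node name (the metadata ID) to the
    value of the metadata column we would rather show in Auspice.
    Tips missing from the mapping keep their original name.
    """
    children = node.get("children", [])
    if not children:
        # Terminal node: swap in the display name if we have one.
        node["name"] = display_names.get(node["name"], node["name"])
    for child in children:
        relabel_tips(child, display_names)
    return node


# Hypothetical toy tree and metadata lookup (ID column -> display column).
tree = {
    "name": "NODE_0000000",
    "children": [
        {"name": "ACC001", "node_attrs": {}},
        {"name": "ACC002", "node_attrs": {}},
    ],
}
display_names = {"ACC001": "sample_A", "ACC002": "sample_B"}

relabel_tips(tree, display_names)
print([tip["name"] for tip in tree["children"]])  # ['sample_A', 'sample_B']
```

Note that the sketch does nothing to stop two tips from ending up with the same name, so the chosen column would need to be unique or be checked separately.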

jameshadfield commented 11 months ago

See https://github.com/nextstrain/auspice/pull/1668 for an alternative solution

joverlee521 commented 11 months ago

@jameshadfield Ah nice! I totally missed that PR!

Then we would need to add tip_label as an allowed property for display_defaults in our auspice config schema.

I also think export v2 should verify that the tip_label field is an available node_attr, or automatically add it as a node_attr if the field exists in the metadata file.
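A rough sketch of the check I have in mind, just to make the idea concrete. Nothing here is existing augur code: `ensure_tip_label` and `collect_node_attr_keys` are hypothetical names, `metadata` is assumed to be a plain dict of node name to row, and `display_defaults.tip_label` is the proposed setting from the Auspice PR, not something the schema allows yet.

```python
def collect_node_attr_keys(node, keys=None):
    """Gather every node_attrs key used anywhere in the tree."""
    keys = set() if keys is None else keys
    keys.update(node.get("node_attrs", {}).keys())
    for child in node.get("children", []):
        collect_node_attr_keys(child, keys)
    return keys


def add_node_attr(node, field, metadata):
    """Copy `field` from the metadata rows onto matching nodes."""
    row = metadata.get(node["name"], {})
    if field in row:
        node.setdefault("node_attrs", {})[field] = {"value": row[field]}
    for child in node.get("children", []):
        add_node_attr(child, field, metadata)


def ensure_tip_label(dataset, metadata):
    """Verify the tip_label field is a node_attr, adding it if possible.

    Returns "unset", "present", or "added"; raises ValueError when the
    field is neither a node_attr nor a metadata column.
    """
    # Assumed location of the proposed setting in the dataset JSON.
    tip_label = dataset.get("meta", {}).get("display_defaults", {}).get("tip_label")
    if tip_label is None:
        return "unset"
    if tip_label in collect_node_attr_keys(dataset["tree"]):
        return "present"
    if not any(tip_label in row for row in metadata.values()):
        raise ValueError(
            f"tip_label {tip_label!r} is not a node_attr or a metadata column"
        )
    add_node_attr(dataset["tree"], tip_label, metadata)
    return "added"


# Hypothetical toy dataset and metadata.
dataset = {
    "meta": {"display_defaults": {"tip_label": "display_name"}},
    "tree": {
        "name": "NODE_0000000",
        "node_attrs": {},
        "children": [
            {"name": "ACC001", "node_attrs": {"country": {"value": "X"}}},
        ],
    },
}
metadata = {"ACC001": {"display_name": "sample_A"}}

status = ensure_tip_label(dataset, metadata)
```

Whether a missing field should be a hard error or only a warning is an open question; the sketch raises just to keep the behavior explicit.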

jameshadfield commented 11 months ago

Great suggestions. That Auspice PR needs more testing & feedback before we use it, but I think it's ultimately a better direction than swapping back in the strain name (as monkeypox + rsv do) because of how duplicate node names are handled in Auspice.