nextstrain / augur

Pipeline components for real-time phylodynamic analysis
https://docs.nextstrain.org/projects/augur/
GNU Affero General Public License v3.0

Allow custom date column name to be specified in `refine` - similar to `metadata-id-column` #1443

Open corneliusroemer opened 5 months ago

corneliusroemer commented 5 months ago

Context

When working with data straight from NCBI, the collection date column is called `Isolate Collection date` rather than `date`. We also often call it `collection_date` to distinguish it from `release_date` or `update_date`.

It would be nice if one could configure the date column via an argument, similar to `--metadata-id-column`, e.g. `--collection-date-column="Isolate Collection date"`

victorlin commented 4 months ago

Short-term: I would call this --metadata-date-columns, accepting multiple values and using the first that's available. That would maintain consistency with --metadata-id-columns/--metadata-delimiters. The new option should be available wherever those are available.
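To make the "accept multiple values and use the first that's available" behavior concrete, here is a minimal sketch. The function name `resolve_date_column` is hypothetical and not part of Augur; it only illustrates the lookup order one might expect from the proposed `--metadata-date-columns` option.

```python
from typing import Iterable, Sequence


def resolve_date_column(available: Iterable[str], candidates: Sequence[str]) -> str:
    """Return the first candidate column name that exists in the metadata.

    Hypothetical helper for illustration only; not an Augur function.
    `available` is the set of column names found in the metadata file and
    `candidates` is the ordered list the user passed on the command line.
    """
    available_set = set(available)
    for name in candidates:
        if name in available_set:
            return name
    raise ValueError(
        f"None of the candidate date columns {list(candidates)!r} "
        f"were found in the metadata columns {sorted(available_set)!r}."
    )


# Example: the NCBI-style name is absent, so the next candidate is used.
columns = ["accession", "collection_date", "release_date"]
chosen = resolve_date_column(columns, ["Isolate Collection date", "collection_date", "date"])
```

Failing loudly when no candidate matches avoids silently treating every record as undated.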

Long-term: The need to specify metadata parameters for every augur subcommand is a bit tedious (example: https://github.com/nextstrain/mpox/commit/927ad6cdf0f7e96384ab8a53f87aee7b5c4e658b) and prone to human error when updating. Under expected usage of Augur, it's likely the case that the same metadata parameters will be used across all commands in a given project/workflow. Configuration through environment variables can reduce duplication. Something like:

export AUGUR_METADATA_DELIMITER=';'
export AUGUR_METADATA_ID_COLUMN=accession
export AUGUR_METADATA_DATE_COLUMN=collection_date

# Now these don't need to specify --metadata-delimiters, --metadata-id-columns, --metadata-date-columns
augur filter …
augur traits …
augur refine …
augur export v2 …
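One way the environment-variable idea could be wired into a subcommand is to use the variables as argparse defaults, so an explicit command-line flag still wins. This is only a sketch under assumptions: the variable names are taken from the example above, `--metadata-date-columns` is the proposed (not yet existing) option, and the fallback values and the `build_parser` helper are placeholders rather than Augur's actual implementation.

```python
import argparse
import os


def env_default(env, name, fallback):
    """Use the environment variable as a one-item list if set, else the fallback."""
    value = env.get(name)
    return [value] if value else list(fallback)


def build_parser(env=None):
    """Build a parser whose metadata defaults come from the environment.

    Hypothetical sketch; the fallback values below are placeholders.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="example-subcommand")
    parser.add_argument(
        "--metadata-delimiters", nargs="+",
        default=env_default(env, "AUGUR_METADATA_DELIMITER", [",", "\t"]),
    )
    parser.add_argument(
        "--metadata-id-columns", nargs="+",
        default=env_default(env, "AUGUR_METADATA_ID_COLUMN", ["strain", "name"]),
    )
    parser.add_argument(
        "--metadata-date-columns", nargs="+",
        default=env_default(env, "AUGUR_METADATA_DATE_COLUMN", ["date"]),
    )
    return parser


# Simulate the exported variables with a plain dict instead of os.environ.
fake_env = {
    "AUGUR_METADATA_DELIMITER": ";",
    "AUGUR_METADATA_ID_COLUMN": "accession",
    "AUGUR_METADATA_DATE_COLUMN": "collection_date",
}
from_env = build_parser(fake_env).parse_args([])
overridden = build_parser(fake_env).parse_args(["--metadata-date-columns", "date"])
```

With this precedence (flag, then environment variable, then built-in fallback), a workflow can set the variables once and override them for a single command where needed.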