Open huddlej opened 3 months ago
Even simpler: check for presence of that column and make the filter dependent on whether it's present or not.
I'd prefer automatic stuff over configuration.
Also, people should in general just fork things and make the changes they want themselves.
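For reference, the column check suggested above (make the filter depend on whether the column is present) could look roughly like the following. This is only a sketch: the helper `build_query`, the column names `qc_status` and `is_reference`, and the clauses themselves are hypothetical placeholders, not the columns or query the workflow actually uses.

```python
import csv
import io


def build_query(metadata_file, clauses):
    """Return a query string using only clauses whose column is present.

    `clauses` maps a column name to the query fragment that depends on it.
    Returns None when no clause applies, so the caller can omit the query.
    """
    reader = csv.reader(metadata_file, delimiter="\t")
    header = set(next(reader, []))  # first row holds the column names
    usable = [fragment for column, fragment in clauses.items() if column in header]
    return " & ".join(f"({fragment})" for fragment in usable) or None


# Hypothetical columns and clauses, for illustration only.
clauses = {
    "qc_status": "qc_status == 'good'",
    "is_reference": "is_reference != 'true'",
}

# Metadata that has only one of the two columns.
metadata = io.StringIO("strain\tdate\tqc_status\nA\t2022-01-01\tgood\n")
print(build_query(metadata, clauses))
```

With metadata that has neither column, `build_query` returns `None`, and the rule would then skip the query argument entirely.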
Description
The following hardcoded filter parameter appears at the start of the phylogenetic workflow:
https://github.com/nextstrain/mpox/blob/2ce0d9284ccc8cf9b06e8094c7fa28c8f9d85771/phylogenetic/rules/prepare_sequences.smk#L85
When the user's metadata does not have the two columns referenced in that query (as happens when analyzing data from GISAID, for example), `augur filter` produces the following output:
Although that output comes across as an `augur` bug (that a warning is also an error), the proximal issue is that the workflow hardcodes parameters that the user cannot override without changing the workflow itself.
Proposed solution
I suggest moving the hardcoded query string out of the workflow and into the top-level `filter` section of each workflow's config file (e.g., `defaults/mpxv/config.yaml`). Then users who want to analyze data without the fields referenced in that query can create their own config file.
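As a rough sketch of the config-driven approach, the rule could build its `augur filter` call from the loaded config rather than from a hardcoded string. The function `filter_args`, the `query` key under `filter`, the file paths, and the example query are all assumptions made for illustration; the config is shown as a plain dict, as it would look once the YAML has been loaded.

```python
def filter_args(config, metadata_path, output_path):
    """Build an `augur filter` argument list, taking the query from config.

    The `--query` option is added only when the config supplies one, so a
    user config without that key runs the filter with no query at all.
    """
    args = [
        "augur", "filter",
        "--metadata", metadata_path,
        "--output-metadata", output_path,
    ]
    # Assumed layout: a `query` key inside the top-level `filter` section.
    query = config.get("filter", {}).get("query")
    if query:
        args += ["--query", query]
    return args


# Default config carrying a (hypothetical) query.
default_config = {"filter": {"query": "some_column == 'good'"}}
# User config for metadata that lacks that column: no query key.
user_config = {"filter": {}}

print(filter_args(default_config, "metadata.tsv", "filtered.tsv"))
print(filter_args(user_config, "metadata.tsv", "filtered.tsv"))
```

The point of this layout is that overriding or dropping the query becomes a config change, not a workflow change.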