nextstrain / augur

Pipeline components for real-time phylodynamic analysis
https://docs.nextstrain.org/projects/augur/
GNU Affero General Public License v3.0

Missing trait literals are converted to lower case in augur export stderr output #1584

Open corneliusroemer opened 4 weeks ago

corneliusroemer commented 4 weeks ago

Current Behavior

augur export converts literals to lower case in its stderr output, which is probably not what we want.

Expected behavior

Literals are output as they are, with their case unchanged.

How to reproduce

Steps to reproduce the current behavior:

  1. Include some extra literals in a colors file, with upper-case letters in a non-initial position, e.g. 21L
  2. Run augur export with --colors
  3. Observe log message
$ augur export v2 \
    --tree builds/wuhan/tree.nwk \
    --metadata builds/wuhan/metadata_with_bloom_scores.tsv \
    --node-data builds/wuhan/branch_lengths.json builds/wuhan/muts.json builds/wuhan/clades_display.json builds/wuhan/clades.json builds/wuhan/clades_nextstrain.json builds/wuhan/clades_who.json builds/wuhan/internal_pango.json \
    --colors builds/wuhan/colors.tsv \
    --auspice-config profiles/clades/wuhan/auspice_config.json \
    --title 'SARS-CoV-2 phylogeny' \
    --description profiles/clades/description.md \
    --include-root-sequence-inline \
    --minify-json \
    --output auspice/wuhan/auspice_raw.json

Validating schema of 'builds/wuhan/muts.json'...
Validating config file profiles/clades/wuhan/auspice_config.json against the JSON schema
Validating schema of 'profiles/clades/wuhan/auspice_config.json'...
WARNING: Requested color-by field 'placement_priors' does not exist and will not be used as a coloring or exported.

WARNING: These values for trait clade_membership were not specified in the colors file you provided:
        21k, 21f, 20f, 20h, 20b, 21m, 19a, 20i, 20c, recombinant, 21e, 21g, 20g, 20a, 21j, 20e, 20j, 21d, 21i, 21a, 20d, 19b, 21b, 21c, 21h.
        Auspice will create colors for them.
WARNING: These values for trait clade_who were not specified in the colors file you provided:
        recombinant.
        Auspice will create colors for them.

WARNING: These values for trait clade_nextstrain were not specified in the colors file you provided:
        21k, 21f, 20f, 20h, 20b, 21m, 19a, 20i, 20c, recombinant, 21e, 21g, 20g, 20a, 21j, 20e, 20j, 21d, 21i, 21a, 20d, 19b, 21b, 21c, 21h.
        Auspice will create colors for them.

Validating produced JSON
Validating schema of 'auspice/wuhan/auspice_raw.json'...
Validating that the JSON is internally consistent...
        WARNING:  The filter "new_node" does not appear as a property on any tree nodes.
Validation of 'auspice/wuhan/auspice_raw.json' succeeded, but there were warnings you may want to resolve.

Note this line:

 21k, 21f, 20f, 20h, 20b, 21m, 19a, 20i, 20c, recombinant, 21e, 21g, 20g, 20a, 21j, 20e, 20j, 21d, 21i, 21a, 20d, 19b, 21b, 21c, 21h.

The input colors were 21K, not 21k.

jameshadfield commented 3 weeks ago

Here's the code behind this. The erroneous console output is a side effect of the matching being done in lower case:

https://github.com/nextstrain/augur/blob/988380c0c65efcabcc6c87a7967b0e2bcc41a0fc/augur/export_v2.py#L330-L344
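One possible direction, shown only as a minimal sketch and not as the code at the link above: keep the comparison case-insensitive, but collect the unmatched values in their original case so the warning prints them unchanged. The function name `find_missing_values` and the sample values and hex codes are hypothetical and exist only for illustration.

```python
def find_missing_values(values_in_tree, provided_colors):
    """Return tree values that have no color assigned, in their original case.

    Hypothetical helper for illustration; not part of augur.
    """
    # Lower-case only the lookup keys. The original strings are never modified.
    provided_lower = {name.lower() for name, _hex in provided_colors}
    # Compare in lower case, but keep the value exactly as it appears in the tree.
    return sorted(v for v in set(values_in_tree) if v.lower() not in provided_lower)


# Made-up sample data: trait values found on tree nodes, and (value, hex) color pairs.
values_in_tree = ["21K", "21L", "19A", "recombinant"]
provided_colors = [("19a", "#4C90C0"), ("21l", "#E67030")]

missing = find_missing_values(values_in_tree, provided_colors)
print(", ".join(missing))
```

With this approach the matching stays tolerant of case differences between the colors file and the tree, while the message shows 21K rather than 21k.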