Closed jameshadfield closed 5 years ago
@jameshadfield With regard to the error in augur export
V2 legend order:
The new unified JSON schema changes how colouring is specified. In V1 JSONs color_map
is a list of lists:
[ ["<1y", "#d7191c"], ["1-5y", "#fdae61"] .... ]
In V2 JSONs, scale
is a dict:
{ "1-5y": "#fdae61", "18-64y": "#abd9e9", ... }
So this is where the ordering problem comes from. As we've found before, though dicts technically do not have an order, it looks like through augur export
V2 the order is actually maintained - until the JSON is written out.
This is because in utils.py
write_json
, json.dump
has the flag sort_keys
set to True. (And it seems to do this alphabetically.) (This doesn't affect V1 legends because it's a list, I guess.)
Setting this to False
fixes the legend ordering problem, but messes up the overall JSON order, so tree
is above all the colourings, authors, etc (which personally I find much less useful - currently it's easy to check on your metadata stuff without scrolling past a huge tree). However, we could probably fix this in export by just adding things to unified
in the order we want, rather than as we process them.
So one solution would be to add an argument to utils.py
write_json
so that sort_keys
can be set False and use this in augur export
V2. But, we should think about what else this will impact.
What think you?
@emmahodcroft It's generally unreliable to rely on the order of keys in a dictionary when crossing languages, regardless of what order they appear in when serialized to JSON. Different languages make different guarantees about preserving key ordering. I think the simplest thing would be to keep the scale as a list of lists, which is a standard and reasonable approach to cross-language ordered key-value pairs. This also allows arbitrary user-defined ordering rather than forcing an ASCII-betical sort. @jameshadfield What motivated the change from a list of lists to a dict? Could we keep it as a list of lists?
I don't see a problem with returning to the v1 schema here (list of lists). Can't remember anything important about why I proposed turning it into a dict.
FYI @joverlee521 and I just finished going through the augur export
process for the unified JSONs. Here is what my process looked like.
@jameshadfield asked me to note any 'WTF' moments, which are tagged with WTF below:
augur export
processStarted with the zika-colombia
repo. Ran nextstrain build .
When running augur export
and seeing the required parameters in the help message, I wanted to run
augur export -t auspice/zika-colombia_tree.json --metadata auspice/zika-colombia_meta.json
When that didn't work, looked at the Snakefile
.
Tried
augur export \
--tree results/tree.nwk \
--metadata results/metadata.tsv \
--node-data results/branch_lengths.json \
results/traits.json \
results/nt_muts.json \
results/aa_muts.json
and received the message
ERROR: config file None not found.
as well as a TypeError: expected str, bytes or os.PathLike object, not NoneType
Switched to zika-tutorial
. Ran nextstrain build .
there, too.
Copied and ran the following code from the tutorial:
augur export \
--tree results/tree.nwk \
--metadata data/metadata.tsv \
--node-data results/branch_lengths.json \
results/traits.json \
results/nt_muts.json \
results/aa_muts.json
Got the same errors as above. WTF
Ran the code from step 4 again in zika-colombia
but with the flag --new-schema
(thanks to Jover's investigation).
Got the following errors:
ERROR: file with states does not exist
TypeError: expected str, bytes or os.PathLike object, not NoneType
Ran the code from step 6 again in zika-tutorial
but with the flag --new-schema
and got this warning plus error:
WARNING: Contradictory author information: {'authors': 'Ho et al', 'title': 'Outbreak of Zika virus infection in Singapore: an epidemiological, entomological, virological, and clinical analysis', 'journal': 'Lancet Infect Dis (2017) In press', 'url': 'https://www.ncbi.nlm.nih.gov/nuccore/KY241688'} vs {'authors': 'Ho et al', 'title': 'Outbreak of Zika virus infection in Singapore: an epidemiological, entomological, virological, and clinical analysis', 'journal': 'Lancet Infect Dis (2017) In press', 'url': 'https://www.ncbi.nlm.nih.gov/nuccore/KY241726'}
...
TypeError: expected str, bytes or os.PathLike object, not NoneType
WTF
Back to zika-colombia
: Jover says I need an output file. Ran the following code:
augur export --tree results/tree.nwk --metadata data/metadata.tsv --node-data results/branch_lengths.json results/traits.json results/nt_muts.json results/aa_muts.json --new-schema --output-main out.json
and received the error:
ERROR: file with states does not exist
but out.json
was still generated.
WTF
Jover says it looks good but I'm missing metadata.
Ran augur export -h
.
Tried to match the augur
options with what was listed in the config/auspice_config.json
.
Note: maintainers
and maintainer-urls
is split in augur
but not in the auspice
config.
WTF
geography-traits
? WTF Matched with geo
in auspice
config.
extra-traits
: skipped WTF
panels
: gave nothing b/c of default
tree-name
: skipped WTF
Ended up running the code in step 8 (in zika-tutorial
) plus:
--title 'Real-time tracking of Zika virus \ evolution'
--maintainers "Me" \
--maintainer-urls 'google.com' \
--geography-traits 'country' 'region'
Moved onto the augur validate
process.
First ran augur validate out.json
.
The JSON file is not a positional argument WTF.
Then, ran augur validate -h
.
WTF is the nextflu schema
?
Ran:
augur validate --json out.json --new-schema
Received:
Validating out.json using the main schema (version new)... SUCCESS
Validating that out.json is internally consistent...
SUCCESS
Now working in the v6
branch -- with bug fixes etc to be expected
Currently
augur export
works well for v1 JSONs (i.e. meta + tree files) but is generally broken for v2 (henceforth referred to as "unified") JSON production. This issue asks to get this functionality working for unified JSONs as currently consumed by auspice on thev2
branch: https://github.com/nextstrain/auspice/tree/v2. This functionality is urgently needed in order to better test auspice v2. Note that some unified JSON functionality inaugur export
is present, however it doesn't generate JSONs which validate against the schema (see below).If auspice from the v2 branch is installed (e.g.
npm install auspice@alpha
-- see https://github.com/nextstrain/auspice/pull/720 for more deatils & background) thenauspice convert
is available which converts meta+tree JSONs to the unified JSON. This may be helpful to compare augur vs auspice approaches.schema & config modifications
While there are multiple issues proposing slight revisions to the schema for the unified JSON, this issue asks it to be implemented as per the JSON schema defined here: https://github.com/nextstrain/augur/pull/277/files#diff-c021e1723b2195f3dada2595f3d70ebb. After it is working for this it'll be simple to add slight modifications as needed before the we release a new version.
As far as this issue is concerned, it's simplest to use the same (or similar) auspice config JSON. There are proposals to change this, however this discussion is best held somewhere else, with subsequent modifications added in after this issue has been closed.
User interface for
augur export
@emmahodcroft and I talked a lot about this a few weeks ago and decided that it's desirable to have command line arguments available to replace a config file for simple use cases The command line arguments are currently implemented in augur master. For complicated cases it's necessary to use a JSON config file (see above), however the ability to use a config file is not yet implemented in augur.