nextstrain / mpox

Nextstrain build for mpox virus
https://nextstrain.org/mpox
MIT License

Allow addition of unreleased INRB data #255

Closed by jameshadfield 1 week ago

jameshadfield commented 6 months ago

These changes allow private metadata and sequences to be spiked into the build (after filtering but prior to subsampling). They are intended for use by INRB, who maintain the data as an Excel file plus a FASTA file. As such, these changes probably shouldn't be merged into our canonical mpox repo. Perhaps we could instead fork the repo into the INRB organisation and maintain these (small) changes there?

If you have access to the test xlsx and FASTA files (see this internal Slack thread), you can run the build from within the phylogenetic directory via:

  1. python scripts/curate_private_data.py --sequences <fasta file> --xlsx <xlsx file> --remap-columns 'accession:accession' 'accession:strain' 'collection date:date' 'province:division' 'health zone:location' --fasta-header-idx 1
  2. nextstrain build . --configfile defaults/clade-i/config.yaml --config private_data=true auspice_config=defaults/clade-i/auspice_config_inrb.json -f results/clade-i/good_metadata_combined.tsv auspice/mpox_clade-I.json
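For reviewers without access to the test files, here is a rough sketch of how the `--remap-columns` arguments in step 1 can be read: each `source:target` pair copies a spreadsheet column into a metadata column, and a source may be listed more than once (`accession` feeds both `accession` and `strain`). This is only an illustration under that assumption, not the code in `scripts/curate_private_data.py`; the helper names `parse_remap` and `remap_row` and the sample row are hypothetical.

```python
def parse_remap(specs):
    """Split each 'source:target' spec into a (source, target) pair.

    A list of pairs is used rather than a dict because one source
    column may map to several target columns.
    """
    pairs = []
    for spec in specs:
        source, sep, target = spec.partition(":")
        if not sep or not source or not target:
            raise ValueError(f"expected 'source:target', got {spec!r}")
        pairs.append((source, target))
    return pairs


def remap_row(row, pairs):
    """Build a new record holding only the remapped target columns."""
    return {target: row[source] for source, target in pairs}


# The same specs as passed on the command line in step 1.
specs = [
    "accession:accession",
    "accession:strain",
    "collection date:date",
    "province:division",
    "health zone:location",
]

# Hypothetical spreadsheet row, made up for illustration only.
row = {
    "accession": "SAMPLE_001",
    "collection date": "2024-01-15",
    "province": "Province A",
    "health zone": "Zone 1",
}

remapped = remap_row(row, parse_remap(specs))
print(remapped)
```

Columns not named in any pair are dropped in this sketch; whether the real script keeps or drops them is not something this example claims to know.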

to do

jameshadfield commented 1 week ago

Superseded by #287

Do you want to allow users to configure those names in the workflow config file?

Done in #287 - thanks for the suggestion