nextstrain / mpox

Nextstrain build for mpox virus
https://nextstrain.org/mpox
MIT License

Allow addition of unreleased INRB data #255

Open jameshadfield opened 4 months ago

jameshadfield commented 4 months ago

These changes allow private metadata and sequences to be spiked into the build (after filtering but before subsampling). They are intended for use by INRB, who maintain an Excel file and a FASTA file for this purpose. As such, these changes probably shouldn't be merged into our canonical mpox repo. Perhaps we could instead fork the repo into the INRB organisation and maintain these (small) changes there?

If you have access to the test xlsx and FASTA files (see this internal Slack thread), you can run the following from within the phylogenetic directory:

  1. python scripts/curate_private_data.py --sequences <fasta file> --xlsx <xlsx file> --remap-columns 'accession:accession' 'accession:strain' 'collection date:date' 'province:division' 'health zone:location' --fasta-header-idx 1
  2. nextstrain build . --configfile defaults/clade-i/config.yaml --config private_data=true auspice_config=defaults/clade-i/auspice_config_inrb.json -f results/clade-i/good_metadata_combined.tsv auspice/mpox_clade-I.json
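For readers unfamiliar with the `--remap-columns` arguments in step 1, here is a minimal sketch of how `source:destination` pairs like these could be applied to one spreadsheet row. It is an illustration only, not the code in `scripts/curate_private_data.py`: the helper names `parse_remap` and `remap_row` are hypothetical, the sample row is made-up placeholder data, and dropping unmapped columns is an assumption.

```python
def parse_remap(pairs):
    """Turn ['source:destination', ...] into a list of (source, destination) tuples.

    A list (not a dict) is used so one source column can feed several
    destinations, as with 'accession:accession' and 'accession:strain'.
    """
    remap = []
    for pair in pairs:
        src, sep, dst = pair.partition(":")
        if not sep or not src or not dst:
            raise ValueError(f"expected 'source:destination', got {pair!r}")
        remap.append((src, dst))
    return remap


def remap_row(row, remap):
    """Build a new row holding only the remapped columns (assumed behaviour)."""
    return {dst: row.get(src, "") for src, dst in remap}


# The pairs below are copied from the command in step 1.
pairs = [
    "accession:accession",
    "accession:strain",
    "collection date:date",
    "province:division",
    "health zone:location",
]

# Placeholder row, not real INRB data.
row = {
    "accession": "SAMPLE_001",
    "collection date": "2024-01-15",
    "province": "Example Province",
    "health zone": "Example Zone",
    "notes": "not remapped, so dropped in this sketch",
}

example = remap_row(row, parse_remap(pairs))
print(example)
```

In this sketch the `accession` column fills both `accession` and `strain`, which mirrors the two `accession:` pairs passed on the command line.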

to do