These changes allow private metadata and sequences to be spiked into the build (after filtering but prior to subsampling). They are intended for use by INRB, using an Excel (xlsx) file and a FASTA file which they maintain. As such, these changes probably shouldn't be merged into our canonical mpox repo; perhaps we should instead fork the repo into the INRB organisation and maintain these (small) changes there?
If you have access to the test xlsx + FASTA files (see this internal Slack thread), you can run the following from within the phylogenetic directory:
python scripts/curate_private_data.py --sequences <fasta file> --xlsx <xlsx file> --remap-columns 'accession:accession' 'accession:strain' 'collection date:date' 'province:division' 'health zone:location' --fasta-header-idx 1

nextstrain build . --configfile defaults/clade-i/config.yaml --config private_data=true auspice_config=defaults/clade-i/auspice_config_inrb.json -f results/clade-i/good_metadata_combined.tsv auspice/mpox_clade-I.json
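For reviewers without the test files, here is a rough sketch of how the `--remap-columns` arguments in the first command can be read: each value is a `source:target` pair, and one spreadsheet column (here `accession`) may feed more than one output column. This is only an illustration under that assumption, not the code in `scripts/curate_private_data.py`; the helper names `parse_remap` and `remap_row` and the example row are made up.

```python
# Illustrative sketch only: NOT the implementation in scripts/curate_private_data.py.
# Assumes each --remap-columns value has the form "source:target".

def parse_remap(pairs):
    """Turn ['source:target', ...] into a list of (source, target) tuples."""
    remap = []
    for pair in pairs:
        source, sep, target = pair.partition(":")
        if not sep or not source or not target:
            raise ValueError(f"expected 'source:target', got {pair!r}")
        remap.append((source, target))
    return remap


def remap_row(row, remap):
    """Build an output row keyed by target names; missing sources become ''."""
    return {target: row.get(source, "") for source, target in remap}


# The same pairs as in the command above.
REMAP = parse_remap([
    "accession:accession",
    "accession:strain",
    "collection date:date",
    "province:division",
    "health zone:location",
])

# Hypothetical spreadsheet row (placeholder values, not real data).
example_row = {
    "accession": "SAMPLE_1",
    "collection date": "2024-01-15",
    "province": "ProvinceA",
    "health zone": "ZoneB",
}

curated = remap_row(example_row, REMAP)
```

With this reading, `accession` ends up in both the `accession` and `strain` columns of the curated metadata, which is why it appears twice in the command.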
to do