Open joverlee521 opened 4 weeks ago
I'm starting with the community/moncla-lab/iav-h5/ha/all-clades Nextclade dataset since that should work across fauna and NCBI sequences. Tested manually on the ingest-with-nextclade branch.
nextstrain build \
ingest \
joined-ncbi/results/nextclade.tsv \
--configfile build-configs/ncbi/defaults/config.yaml
Almost everything gets assigned to the expected 2.3.4.4b clade, except 3 sequences that were assigned to the 0 clade:
nextstrain build \
--envdir ../env.d/seasonal-flu/ \
ingest \
fauna/results/nextclade.tsv \
--configfile build-configs/ncbi/defaults/config.yaml
Since this is all avian flu and not just H5, ~30% of the sequences are not assigned to any clade.
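For anyone who wants to reproduce these tallies, here is a minimal sketch of counting clade assignments in a Nextclade TSV. It assumes the output has a `clade` column and that sequences without an assignment have an empty value there. The function names `tally_clades` and `unassigned_fraction` are hypothetical helpers, not part of the ingest workflow.

```python
import csv
from collections import Counter


def tally_clades(nextclade_tsv, clade_column="clade"):
    """Count sequences per clade in a Nextclade TSV.

    Assumption: rows without a clade assignment have an empty value in
    `clade_column`; these are grouped under the label "unassigned".
    """
    counts = Counter()
    with open(nextclade_tsv, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            clade = (row.get(clade_column) or "").strip()
            counts[clade or "unassigned"] += 1
    return counts


def unassigned_fraction(counts):
    """Return the share of sequences with no clade assignment."""
    total = sum(counts.values())
    return counts["unassigned"] / total if total else 0.0
```

Calling `tally_clades("fauna/results/nextclade.tsv")` and passing the result to `unassigned_fraction` would give the share of records left without a clade.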
I'm going to join with metadata ~tomorrow~ Thursday and cross check the clades with the existing clades from fauna.
Latest push to the ingest-with-nextclade branch now joins the metadata with the Nextclade output.
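As a rough illustration of the join and cross-check (not the code on the branch), the sketch below left-joins metadata rows to Nextclade rows and collects records where the two clade calls differ. The column names `strain`, `seqName`, and `clade` are assumptions, as are the helper names `read_tsv` and `join_and_compare`.

```python
import csv


def read_tsv(path):
    """Read a TSV file into a list of dicts keyed by column name."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def join_and_compare(metadata_rows, nextclade_rows,
                     metadata_id="strain", nextclade_id="seqName",
                     existing_clade="clade", nextclade_clade="clade"):
    """Left-join metadata to Nextclade output and flag clade disagreements.

    All column names are assumptions for illustration. Metadata records
    with no Nextclade row get an empty `nextclade_clade` value.
    """
    nextclade_by_id = {
        row[nextclade_id]: row.get(nextclade_clade, "")
        for row in nextclade_rows
    }
    joined, mismatches = [], []
    for row in metadata_rows:
        merged = dict(row)
        merged["nextclade_clade"] = nextclade_by_id.get(row[metadata_id], "")
        joined.append(merged)
        # A record counts as a mismatch when the two clade labels differ,
        # including the case where only one side has a label.
        if merged["nextclade_clade"] != row.get(existing_clade, ""):
            mismatches.append(merged)
    return joined, mismatches
```

A left join keeps every metadata record, so sequences that Nextclade did not process or could not assign still show up in the comparison instead of silently dropping out.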
I took a brief look at the fauna side to compare the Nextclade clades with the existing clades.
Of the 43,642 records
Follow up to #40
With the recent addition of the community H5 Nextclade datasets in https://github.com/nextstrain/nextclade_data/pull/196, it should now be possible to run Nextclade as part of ingest to assign clades to the H5 sequences.
Maybe this can replace the current manual clade labeling process that uses the clade-labeling scripts?