Closed by bsweger 2 months ago
This worked nicely!
# get the dataset .zip
nextclade dataset get -n sars-cov-2 --tag "2024-07-03--08-29-55Z" --output-zip zippy.zip
# run nextclade to assign clades to the GenBank sequences downloaded from NCBI, passing in the zip file created above as the reference sequence and tree
nextclade run data/ncbi_dataset/data/genomic.fna --input-dataset=zippy.zip --output-csv tabby.csv
The commands above output roughly what we get from the existing version of the pipeline.
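To make the "roughly what we get" comparison a bit more concrete, one option is to tally the clade assignments in the output file and compare the counts with the existing pipeline's results. The sketch below is only an illustration: `count_clades` is a hypothetical helper, not part of nextclade or of our pipeline, and it assumes the CSV is semicolon-delimited with an unquoted header row that includes a `clade` column.

```shell
# Hypothetical helper: count sequences per clade in a Nextclade CSV.
# Assumption: semicolon-delimited file whose header row has a "clade" column.
count_clades() {
  awk -F';' '
    NR == 1 {
      # find the "clade" column by its header name
      for (i = 1; i <= NF; i++) if ($i == "clade") col = i
      if (!col) { print "no clade column found" > "/dev/stderr"; exit 1 }
      next
    }
    { counts[$col]++ }                                # tally each assignment
    END { for (c in counts) print counts[c], c }      # "<count> <clade>"
  ' "$1" | sort -k1,1nr -k2,2                         # most common clade first
}

# Usage with the file produced above:
# count_clades tabby.csv
```

Looking the column up by name rather than by position keeps the helper working if the column order differs between nextclade versions.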
Closing this, since the goal was to get enough information to ask good follow-up questions on the e-mail thread with the nextstrain folks.
Currently, we call a nextstrain API to retrieve a reference tree for a specified date:
However, this may not be correct. Before updating the pipeline code, let's try the alternate method.
^ or --output-zip instead of --output-dir
nextclade run command that assigns clades to the GenBank sequences we got from NCBI. See here for more context: https://github.com/nextstrain/ncov-ingest/blob/c0d63f8f959705eda1bf0d3414127fc861919fe1/workflow/snakemake_rules/nextclade.smk#L175-L217