Closed by mvdbeek 1 year ago
What is currently the best way to obtain genomes/proteins released before a specific day without the --released-before option? Simply download more than you need and then filter?
Hi snddns,
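If you're interested in downloading virus genomes released before a specific date, you could build a table of genomes with their release dates, filter it by date, and then download the remaining genomes by accession. Here's an example: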
# Generate a table of monkeypox genomes released before 2005
datasets summary virus genome taxon monkeypox --as-json-lines | \
jq -r 'select(.release_date < "2005") | [.accession,.virus.organism_name,.release_date] | @tsv' > monkeypox-genomes.tsv
# Get the accession list from the table
cut -f1 monkeypox-genomes.tsv > accession.list
# Download the genomes
datasets download virus genome accession --inputfile accession.list --filename monkeypox.zip
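If you need a cutoff on an exact day rather than a year, and a stable ordering for tests, the same table can be filtered after the fact. The following is only a rough sketch: `filter_before` is a hypothetical helper (not part of the datasets CLI), and it assumes the third column of the table holds an ISO 8601 release date such as 2004-06-01 or 2004-06-01T00:00:00Z.

```shell
# Hypothetical helper, not part of the datasets CLI.
# Usage: filter_before CUTOFF TABLE
#   CUTOFF  ISO 8601 date (or prefix), e.g. 2005-01-01
#   TABLE   TSV with columns: accession, organism name, release date (assumed layout)
# Prints the accessions released strictly before CUTOFF, sorted for reproducibility.
filter_before() {
  cutoff="$1"
  table="$2"
  # Compare only the YYYY-MM-DD part; appending "" forces a string comparison,
  # which orders ISO 8601 dates chronologically.
  awk -F'\t' -v cutoff="$cutoff" \
    '(substr($3, 1, 10) "") < (cutoff "") { print $1 }' "$table" |
    LC_ALL=C sort
}
```

For example, `filter_before 2005-01-01 monkeypox-genomes.tsv > accession.list` would replace the `cut` step above, provided the table was generated without the year filter in jq.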
I hope that helps. Please let me know if you have any questions.
Best, Eric
Eric Cox, PhD [Contractor] (he/him/his) NCBI Datasets Sequence Enhancements, Tools and Delivery (SeqPlus) NIH/NLM/NCBI eric.cox@nih.gov
We need this so that we can get the datasets as of a specific day. Otherwise it's tricky to write a deterministic test whose runtime doesn't grow as new datasets are added.