NCBI datasets CLI (dataformat) does not recognize data downloaded with that same CLI in the previous rule
localrule extract_ncbi_dataset_sequences:
    input: results/ncbi_dataset.zip
    output: results/sequences.fasta
    jobid: 8
    reason: Missing output files: results/sequences.fasta; Input files updated by another job: results/ncbi_dataset.zip
    resources: tmpdir=/tmp
unzip -jp results/ncbi_dataset.zip ncbi_dataset/data/genomic.fna | seqkit seq -w0 -i > results/sequences.fasta
dataformat doesn't recognize this input
For best results
1. Make sure to use --as-json-lines with the datasets command
2. Make sure that you're using the latest version of the datasets command line tool
Use --force to remove this warning.
Download the latest version of the datasets command line tool: https://www.ncbi.nlm.nih.gov/datasets/docs/v2/download-and-install
Error: unknown field "usaState"
Usage
dataformat tsv virus-genome [flags]
Examples
dataformat tsv virus-genome --inputfile sars2_package/ncbi_dataset/data/data_report.jsonl
dataformat tsv virus-genome --package virus-sars2-refseq.zip
Flags
--fields strings Comma-separated list of fields
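
One possible workaround, sketched here under assumptions and not taken from the pipeline itself: since dataformat aborts on a field it does not know ("usaState"), the report could be read straight from the downloaded package and flattened to TSV, ignoring anything unexpected. The member path ncbi_dataset/data/data_report.jsonl mirrors the layout in the usage example above. The names report_to_tsv and get_path are hypothetical helpers, and the field names in the usage note are placeholders that have not been checked against the real report schema.

```python
import csv
import io
import json
import zipfile

# Assumed location of the report inside the package
# (mirrors the path shown in the dataformat usage example).
REPORT_MEMBER = "ncbi_dataset/data/data_report.jsonl"


def get_path(record, dotted):
    """Follow a dotted key path through nested dicts; return "" if a key is missing."""
    value = record
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return ""
        value = value[key]
    return value


def report_to_tsv(zip_source, fields, out, member=REPORT_MEMBER):
    """Write one TSV row per JSON Lines record, keeping only the requested fields.

    Fields that are not requested (for example a newly added one) are ignored,
    so an unknown key in the report does not abort the conversion.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member) as handle:
            for raw in io.TextIOWrapper(handle, encoding="utf-8"):
                if raw.strip():  # skip blank lines
                    record = json.loads(raw)
                    writer.writerow([get_path(record, f) for f in fields])
```

Hypothetical usage, with placeholder field names: `report_to_tsv("results/ncbi_dataset.zip", ["accession", "length"], sys.stdout)` after `import sys`. This only sidesteps the schema mismatch; updating the datasets and dataformat tools to matching versions, as the warning suggests, remains the first thing to try.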