loculus-project / loculus

An open-source software package to power microbial genomic databases
https://loculus.org
GNU Affero General Public License v3.0

NCBI datasets error causing ingest to fail for West Nile and CCHF #2828

Closed anna-parker closed 1 day ago

anna-parker commented 1 day ago

The NCBI datasets CLI does not recognize the data we downloaded with that same CLI in the previous rule; dataformat fails with: unknown field "usaState".
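
A possible stopgap, sketched here as an assumption and not as the fix applied for this issue: remove the field that dataformat rejects from data_report.jsonl before converting it. The helper names drop_key and strip_field_from_jsonl are hypothetical, and the sample record in the usage note is made up for illustration. Where "usaState" actually sits in a real report is not shown in the log below, which is why the sketch removes the key at any depth.

```python
import json
from typing import Any


def drop_key(value: Any, key: str) -> Any:
    """Return a copy of `value` with every occurrence of `key` removed, at any depth."""
    if isinstance(value, dict):
        return {k: drop_key(v, key) for k, v in value.items() if k != key}
    if isinstance(value, list):
        return [drop_key(item, key) for item in value]
    return value


def strip_field_from_jsonl(text: str, key: str) -> str:
    """Apply drop_key to each non-empty line of a JSON Lines document."""
    cleaned = [
        json.dumps(drop_key(json.loads(line), key))
        for line in text.splitlines()
        if line.strip()
    ]
    # Keep the trailing newline that JSON Lines files conventionally end with.
    return "\n".join(cleaned) + ("\n" if cleaned else "")
```

For example, strip_field_from_jsonl('{"accession": "EXAMPLE1", "location": {"usaState": "CA"}}', "usaState") yields a line whose "location" object is empty. The cleaned file could then be passed to dataformat tsv virus-genome --inputfile. This discards the state information, so updating the command line tools as the warning suggests is likely the better long-term route.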

localrule extract_ncbi_dataset_sequences:
    input: results/ncbi_dataset.zip
    output: results/sequences.fasta
    jobid: 8
    reason: Missing output files: results/sequences.fasta; Input files updated by another job: results/ncbi_dataset.zip
    resources: tmpdir=/tmp
        unzip -jp results/ncbi_dataset.zip ncbi_dataset/data/genomic.fna | seqkit seq -w0 -i > results/sequences.fasta
dataformat doesn't recognize this input
For best results
1. Make sure to use --as-json-lines with the datasets command
2. Make sure that you're using the latest version of the datasets command line tool
Use --force to remove this warning.
Download the latest version of the datasets command line tool: https://www.ncbi.nlm.nih.gov/datasets/docs/v2/download-and-install
Error:  unknown field "usaState"
Usage
  dataformat tsv virus-genome [flags]
Examples
  dataformat tsv virus-genome --inputfile sars2_package/ncbi_dataset/data/data_report.jsonl
  dataformat tsv virus-genome --package virus-sars2-refseq.zip
Flags
      --fields strings     Comma-separated list of fields