ncbi / sra-tools

SRA Tools

"fasterq-dump.3.0.7 err: the input data is missing the QUALITY-column" BUT sra QUALITY column is present. #851

Open GabeAl opened 10 months ago

GabeAl commented 10 months ago

This misbehavior happens with fasterq-dump 3.0.7 on the following accession and a few others, at a rate of about 7% of biosample-based accessions. SAMN08049698 and some other biosample-based accessions fail, but most work fine including within the same project. Interestingly, the underlying run in this case (SRR6468610) still works fine.

Why this is a bug:

  1. The SRA file does appear to contain quality scores
  2. There is only one underlying (single) RUN accession for these biosamples (which is, in turn, the only fastq data associated with them)
  3. The underlying run accession, when supplied directly, works as expected for these cases, including the example.
  4. Other biosample accessions from the same study submitted at the same time from the same modality (even from the same subject!) work perfectly. In this case that would be SAMN08049628 (another biosample that dumps properly directly to fastq).

A workaround could be to use Entrez Direct (or something similar) to translate the biosample IDs into run IDs, dump the run IDs, and then rename (or concatenate) the output back to biosample IDs. But that's quite the workaround for the couple of files this fails on, so I thought I'd report it here.
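The workaround described above could be sketched roughly as follows. This is only an illustration, not something from the sra-tools project: the function names `biosample_to_runs` and `dump_biosample` are hypothetical, and it assumes Entrez Direct (`esearch`, `efetch`) and `fasterq-dump` are on the PATH and that the first column of the runinfo CSV is the run accession.

```shell
# Hypothetical helper: print the run accessions behind a BioSample, one per line.
# Assumes Entrez Direct is installed; runinfo is CSV with the run in column 1.
biosample_to_runs() {
    esearch -db sra -query "$1" \
        | efetch -format runinfo \
        | awk -F',' 'NR > 1 && $1 != "" && $1 != "Run" { print $1 }'
}

# Hypothetical helper: dump every run of a BioSample, then append each
# per-run FASTQ file to a file named after the BioSample instead.
# Note: output is appended, so start from an empty output directory.
dump_biosample() {
    sample="$1"
    outdir="${2:-.}"
    mkdir -p "$outdir" || return 1
    for run in $(biosample_to_runs "$sample"); do
        fasterq-dump "$run" --split-3 -O "$outdir" || return 1
        for f in "$outdir/$run"*.fastq; do
            [ -e "$f" ] || continue
            # Keep whatever follows the run ID (e.g. _1.fastq, _2.fastq, .fastq)
            suffix=${f#"$outdir/$run"}
            cat "$f" >> "$outdir/$sample$suffix" && rm -- "$f"
        done
    done
}
```

With these definitions loaded, `dump_biosample SAMN08049698 fastq_out` would be the call for the failing example. Because the per-run files are concatenated, this also covers biosamples with more than one run.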

mortunco commented 5 months ago

I am having the same problem. I am running fasterq-dump in a Snakemake workflow. Oddly, when I run without specifying a temp directory (so the default is used), or when I point it at a directory under my home (~/temp-fasterq-dump), I get no errors. I suspect it might be a permission or directory-lock problem.

Important notes: 1) I download the SRA file from the AWS S3 bucket and then run fasterq-dump on the downloaded SRA file. 2) Despite the error, FASTQ files 1 and 2 are generated.


$ code/sratoolkit.3.0.7-centos_linux64/bin/fasterq-dump raw-data/testproject-001/GSM3148577_BC10_TUMOR1/SRR7191904.sra --threads 8 --split-3 -O raw-data/testproject-001/GSM3148577_BC10_TUMOR1 -t raw-data/testproject-001/GSM3148577_BC10_TUMOR1/temp-fasterq-dump
spots read      : 41,812,717
reads read      : 83,625,434
reads written   : 83,625,434
2024-01-09T18:32:54 fasterq-dump.3.0.7 err: the input data is missing the QUALITY-column
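
Since the FASTQ files are written despite the error, one way to check whether they are complete is to compare their record counts against the "spots read" figure in the log. The sketch below is only an illustration: `fastq_records` and `check_fastq_counts` are hypothetical names, and it assumes plain four-line FASTQ records and that each paired file should hold one read per spot (here 83,625,434 reads / 2 = 41,812,717).

```shell
# Hypothetical helper: number of records in a FASTQ file,
# assuming exactly four lines per record.
fastq_records() {
    lines=$(wc -l < "$1") || return 1
    echo $((lines / 4))
}

# Hypothetical helper: succeed only if every listed FASTQ file
# holds the expected number of records.
check_fastq_counts() {
    expected="$1"
    shift
    status=0
    for f in "$@"; do
        n=$(fastq_records "$f") || return 1
        if [ "$n" -ne "$expected" ]; then
            echo "$f: $n records, expected $expected" >&2
            status=1
        fi
    done
    return "$status"
}
```

For the run above, and assuming the outputs are named SRR7191904_1.fastq and SRR7191904_2.fastq, the call would be `check_fastq_counts 41812717 SRR7191904_1.fastq SRR7191904_2.fastq` from inside the output directory. A matching count does not rule out missing or corrupted quality strings; it only shows that no records were dropped.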