"fasterq-dump.3.0.7 err: the input data is missing the QUALITY-column" BUT sra QUALITY column is present.

ncbi / sra-tools

SRA Tools


This misbehavior happens with fasterq-dump 3.0.7 on SAMN08049698 and a few other accessions, about 7% of biosample-based accessions. Most accessions work fine, including others within the same project. Interestingly, the underlying run in this case (SRR6468610) still works fine.

Why this is a bug:

The sra file does appear to contain quality scores.
Each of these biosamples has only a single underlying run accession, which is in turn the only fastq data associated with it.
The underlying run accession, when supplied directly, works as expected for these cases, including the example.
Other biosample accessions from the same study, submitted at the same time and from the same modality (even from the same subject!), work perfectly. In this case that would be SAMN08049628 (another biosample, which dumps directly to fastq without problems).

A workaround could be to manually translate the biosample IDs into run IDs (with Entrez Direct or something similar), dump the run IDs, and then rename (or concatenate) the output back to the biosample IDs. But that's quite the workaround for the couple of files this fails on, so I thought I'd report it here.
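For anyone hitting the same thing, that workaround might look roughly like the sketch below. It is untested against the failing accessions and rests on a few assumptions: Entrez Direct (esearch, efetch) and sra-tools are on PATH, an SRA search on the biosample accession returns its run(s), and the runinfo CSV carries the run accession in its first column ("Run"). The function names runs_from_runinfo, biosample_to_runs and dump_biosample are made up for illustration.

```shell
# Print the run accessions from a runinfo CSV read on stdin.
# Assumption: the first column is "Run"; blank lines and repeated headers are skipped.
runs_from_runinfo() {
  awk -F, 'NR > 1 && $1 != "" && $1 != "Run" { print $1 }'
}

# Translate a biosample accession (e.g. SAMN...) into run accessions via Entrez Direct.
biosample_to_runs() {
  esearch -db sra -query "$1" | efetch -format runinfo | runs_from_runinfo
}

# Dump every run of a biosample, then rename (or concatenate) the fastq files
# so they carry the biosample ID instead of the run ID.
dump_biosample() {
  biosample="$1"
  outdir="${2:-.}"
  for run in $(biosample_to_runs "$biosample"); do
    fasterq-dump "$run" --split-3 -O "$outdir" || return 1
    for f in "$outdir/$run"*.fastq; do
      [ -e "$f" ] || continue
      # Keep the suffix (e.g. _1.fastq, _2.fastq) and swap the run ID for the biosample ID.
      target="$outdir/$biosample${f#"$outdir/$run"}"
      # Appending covers both cases: a single run (plain rename) and several runs (concatenation).
      cat "$f" >> "$target" && rm "$f"
    done
  done
}
```

After sourcing this, something like `dump_biosample SAMN08049698 raw-data` would be the call for the failing example. Because the output is appended, rerunning into a directory that already holds the biosample's fastq files would duplicate reads, so start from a clean output directory.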

$ code/sratoolkit.3.0.7-centos_linux64/bin/fasterq-dump raw-data/testproject-001/GSM3148577_BC10_TUMOR1/SRR7191904.sra --threads 8 --split-3 -O raw-data/testproject-001/GSM3148577_BC10_TUMOR1 -t raw-data/testproject-001/GSM3148577_BC10_TUMOR1/temp-fasterq-dump
spots read : 41,812,717
reads read : 83,625,434
reads written : 83,625,434
2024-01-09T18:32:54 fasterq-dump.3.0.7 err: the input data is missing the QUALITY-column

"fasterq-dump.3.0.7 err: the input data is missing the QUALITY-column" BUT sra QUALITY column is present. #851