GabeAl opened 10 months ago
I am having the same problem. I am running fasterq-dump in a Snakemake workflow. Weirdly, when I run without specifying a temp directory (which uses the default), OR when I point it at my home directory (~/temp-fasterq-dump), I get no errors. I suspect it might be a permission or directory-lock problem.
Important notes: 1) I download the SRA file from an AWS S3 bucket, then run fasterq-dump to extract it. 2) Despite the error, fastq 1 and 2 are generated.
$ code/sratoolkit.3.0.7-centos_linux64/bin/fasterq-dump raw-data/testproject-001/GSM3148577_BC10_TUMOR1/SRR7191904.sra --threads 8 --split-3 -O raw-data/testproject-001/GSM3148577_BC10_TUMOR1 -t raw-data/testproject-001/GSM3148577_BC10_TUMOR1/temp-fasterq-dump
spots read : 41,812,717
reads read : 83,625,434
reads written : 83,625,434
2024-01-09T18:32:54 fasterq-dump.3.0.7 err: the input data is missing the QUALITY-column
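For anyone wiring this into a workflow, a minimal sketch of how I build the invocation above with an explicit temp directory, so the argv can be inspected (or logged) before a Snakemake rule runs it. The helper name `build_fasterq_cmd` and the shortened paths are illustrative, not part of sra-tools; the flags mirror the command shown above.

```python
# Hypothetical helper: assemble the fasterq-dump argv with an explicit
# temp directory (-t) and output directory (-O), matching the flags used
# in the command above. Inspecting the list before subprocess.run() makes
# it easier to spot temp-dir/permission mistakes in a workflow.
def build_fasterq_cmd(sra_path, out_dir, tmp_dir, threads=8):
    """Return the argv list for a --split-3 fasterq-dump invocation."""
    return [
        "fasterq-dump", str(sra_path),
        "--threads", str(threads),
        "--split-3",
        "-O", str(out_dir),   # where the .fastq files land
        "-t", str(tmp_dir),   # scratch dir; must be writable by the job
    ]

# Illustrative, shortened paths (not the full paths from the report):
cmd = build_fasterq_cmd(
    "raw-data/SRR7191904.sra",
    "raw-data/out",
    "raw-data/out/temp-fasterq-dump",
)
print(" ".join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would then surface a nonzero exit status instead of silently continuing past the QUALITY-column error.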
This misbehavior happens with fasterq-dump 3.0.7 on roughly 7% of biosample-based accessions. SAMN08049698 and a few others fail, while most work fine, including others within the same project. Interestingly, the underlying run in this case (SRR6468610) still dumps fine.
Why this is a bug: the failure is specific to the biosample-level accession, since the underlying run dumps fine, so the tool rather than the data appears to be at fault. A workaround could be manually using Entrez Direct or similar to translate biosample IDs into Run IDs, dumping the Run IDs, then renaming (or concatenating) the output back to biosample IDs. But that's quite a workaround for the couple of files this fails on, so I thought I'd report it here.
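The translation step of that workaround can be sketched as follows. Assuming you have fetched a runinfo CSV for the samples (e.g. via Entrez Direct's `efetch -format runinfo`, whose output includes `Run` and `BioSample` columns), this hypothetical helper builds the biosample-to-runs mapping; the function name and the inline sample data are illustrative only.

```python
# Hypothetical sketch: map BioSample accessions to their Run IDs from a
# runinfo CSV, so each run can be dumped individually and the resulting
# fastq files renamed or concatenated back under the biosample ID.
import csv
import io
from collections import defaultdict

def biosample_to_runs(runinfo_csv_text):
    """Return {biosample: [run, ...]} parsed from runinfo CSV text."""
    mapping = defaultdict(list)
    for row in csv.DictReader(io.StringIO(runinfo_csv_text)):
        mapping[row["BioSample"]].append(row["Run"])
    return dict(mapping)

# Illustrative two-column excerpt (real runinfo CSVs have many columns,
# but DictReader only needs the two we look up):
sample = "Run,BioSample\nSRR6468610,SAMN08049698\n"
runs = biosample_to_runs(sample)
print(runs)  # {'SAMN08049698': ['SRR6468610']}
```

Each run in the mapping could then be passed to fasterq-dump directly, sidestepping the biosample-level failure.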