dmalzl opened this issue 11 months ago
The executed command was:
fasterq-dump \
--split-files --include-technical \
--threads 6 \
--outfile SRX10737613_SRR14385311 \
\
SRR14385311
It looks like the tool is confused about the output file. Try this command: 'fasterq-dump --split-files --include-technical SRR14385311'. The --threads 6 option is not necessary; 6 is the default. The --outfile option is not necessary either; the tool will create the output filename from the accession. I think it is confused because you included the experiment accession in the output filename, although it should not be confused by that. I will have to investigate why this happens. In the meantime, try the shortened command.
Thanks for the swift response and the workaround. I'll try to modify the code of the pipeline I am using. However, it looks to me like the path gets split at the . character at some point in the process where the _1, _2 suffix is inserted, and is then concatenated again. So it might be the . that is confusing it, but I'll try it and report back.
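The suspected behavior can be illustrated with a small POSIX shell sketch. `naive_insert_suffix` is a hypothetical function, not code from sra-tools; it only mimics the hypothesis above, namely that the `_1`/`_2` suffix is inserted before the first `.` found anywhere in the full output path, directory names included. The real 3.0.8 code may differ in detail.

```sh
# Hypothetical re-creation of the suspected bug (not the sra-tools code):
# the suffix is inserted before the FIRST "." anywhere in the full path.
naive_insert_suffix() {
    case $1 in
        *.*)
            # ${1%%.*}: everything before the first "."
            # ${1#*.}:  everything after the first "."
            printf '%s%s.%s\n' "${1%%.*}" "$2" "${1#*.}" ;;
        *)
            # no "." at all: just append the suffix
            printf '%s%s\n' "$1" "$2" ;;
    esac
}

naive_insert_suffix /scratch/daniel.malzl/work/aa/7ab6e5d29db7a0352a1f1cd4af2af3/SRX10737613_SRR14385311 _1
# prints /scratch/daniel_1.malzl/work/aa/7ab6e5d29db7a0352a1f1cd4af2af3/SRX10737613_SRR14385311
```

Because the user's path contains a dot in the directory name daniel.malzl, a rule like this would put the suffix into a directory that does not exist, which would make the output file impossible to create.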
By the way, what version of fasterq-dump are you using?
The version is 3.0.8.
Just to let you know, this does not occur in version 2.11.0.
Thanks for reporting @dmalzl ! And thanks for investigating @wraetz 🙏🏽
I have managed to reproduce the issue, and the problem is indeed that a . exists in the path where the output files will be written.
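A pipeline pinned to sra-tools 3.0.8 could check for this condition before calling fasterq-dump and fail with a clear message instead of an unwritable path. The helper `dir_has_no_dot` below is hypothetical; it only detects the condition described above and does not work around the bug itself.

```sh
# Hypothetical pre-flight check: return non-zero (with a warning on stderr)
# if the directory fasterq-dump will write into contains a ".".
dir_has_no_dot() {
    case $1 in
        *.*)
            echo "warning: '$1' contains a '.'; fasterq-dump 3.0.8 may mangle the output path" >&2
            return 1 ;;
    esac
    return 0
}

# Example usage with the command from the reproduction (pass an absolute path):
# dir_has_no_dot "$PWD" && fasterq-dump --split-files --include-technical \
#     --outfile SRX9315476_SRR12848126 SRR12848126
```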
Create an env.yml file with the dependencies below (you can exclude pigz if you like):
name: sra-tools-3.0.8
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- conda-forge::pigz=2.6
- bioconda::sra-tools=3.0.8
conda env create -f env.yml
Run the following in a directory without a . in its name:
mkdir testwithoutdot
cd testwithoutdot
prefetch SRR12848126
fasterq-dump \
--split-files --include-technical \
--outfile SRX9315476_SRR12848126 \
SRR12848126
2024-01-05T11:41:39 prefetch.3.0.8: Current preference is set to retrieve SRA Normalized Format files with full base quality scores.
2024-01-05T11:41:39 prefetch.3.0.8: 1) Downloading 'SRR12848126'...
2024-01-05T11:41:39 prefetch.3.0.8: SRA Normalized Format file is being retrieved, if this is different from your preference, it may be due to current file availability.
2024-01-05T11:41:39 prefetch.3.0.8: Downloading via HTTPS...
2024-01-05T11:41:40 prefetch.3.0.8: HTTPS download succeed
2024-01-05T11:41:40 prefetch.3.0.8: 'SRR12848126' is valid
2024-01-05T11:41:40 prefetch.3.0.8: 1) 'SRR12848126' was downloaded successfully
2024-01-05T11:41:41 prefetch.3.0.8: 'SRR12848126' has 1 unresolved dependency
2024-01-05T11:41:41 prefetch.3.0.8: 2) Downloading 'ncbi-acc:NC_000069.6?vdb-ctx=refseq'...
2024-01-05T11:41:41 prefetch.3.0.8: Downloading via HTTPS...
2024-01-05T11:41:43 prefetch.3.0.8: HTTPS download succeed
2024-01-05T11:41:43 prefetch.3.0.8: 2) 'ncbi-acc:NC_000069.6?vdb-ctx=refseq' was downloaded successfully
spots read : 1,517
reads read : 3,034
reads written : 2,982
Now run the same commands in a directory with a . in its name:
mkdir test.withdot
cd test.withdot
prefetch SRR12848126
fasterq-dump \
--split-files --include-technical \
--outfile SRX9315476_SRR12848126 \
SRR12848126
2024-01-05T11:37:35 prefetch.3.0.8: Current preference is set to retrieve SRA Normalized Format files with full base quality scores.
2024-01-05T11:37:35 prefetch.3.0.8: 1) Downloading 'SRR12848126'...
2024-01-05T11:37:35 prefetch.3.0.8: SRA Normalized Format file is being retrieved, if this is different from your preference, it may be due to current file availability.
2024-01-05T11:37:35 prefetch.3.0.8: Downloading via HTTPS...
2024-01-05T11:37:36 prefetch.3.0.8: HTTPS download succeed
2024-01-05T11:37:36 prefetch.3.0.8: 'SRR12848126' is valid
2024-01-05T11:37:36 prefetch.3.0.8: 1) 'SRR12848126' was downloaded successfully
2024-01-05T11:37:37 prefetch.3.0.8: 'SRR12848126' has 1 unresolved dependency
2024-01-05T11:37:37 prefetch.3.0.8: 2) Downloading 'ncbi-acc:NC_000069.6?vdb-ctx=refseq'...
2024-01-05T11:37:37 prefetch.3.0.8: Downloading via HTTPS...
2024-01-05T11:37:55 prefetch.3.0.8: HTTPS download succeed
2024-01-05T11:37:55 prefetch.3.0.8: 2) 'ncbi-acc:NC_000069.6?vdb-ctx=refseq' was downloaded successfully
Error: fasterq-dump cannot create this file: '/home/harshil/test_2.withdot/SRX9315476_SRR12848126'
Error: fasterq-dump cannot create this file: '/home/harshil/test_1.withdot/SRX9315476_SRR12848126'
spots read : 1,517
reads read : 3,034
=============================================================
An error occurred during processing.
A report was generated into the file '/home/harshil/ncbi_error_report.txt'.
If the problem persists, you may consider sending the file
to 'sra-tools@ncbi.nlm.nih.gov' for assistance.
=============================================================
fasterq-dump quit with error code 3
The problem is this function, which splits on any period it finds in the path (here the one in the directory name test.withdot) and builds a new filename from the pieces. It should split on the final period only, or better still, use proper path handling (I'm not 100% familiar with the code).
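A minimal POSIX shell sketch of the suggested behavior follows. The helper `insert_suffix` is hypothetical, not the actual sra-tools fix: it separates the directory from the file name first, then inserts the suffix before the last `.` of the file name only, leaving dots in directory names alone. It does not try to reproduce fasterq-dump's exact output naming (for example any extension the tool itself appends).

```sh
# Hypothetical path-aware suffix insertion (illustrative, not sra-tools code).
insert_suffix() {
    local path="$1" suffix="$2"
    local dir="" base="$path"
    case $path in
        */*)
            dir="${path%/*}/"    # directory part, trailing "/" kept
            base="${path##*/}"   # file name only
            ;;
    esac
    case $base in
        ?*.*) base="${base%.*}${suffix}.${base##*.}" ;;  # before the LAST "."
        *)    base="${base}${suffix}" ;;                 # no extension: append
    esac
    printf '%s%s\n' "$dir" "$base"
}

insert_suffix /home/harshil/test.withdot/SRX9315476_SRR12848126 _1
# prints /home/harshil/test.withdot/SRX9315476_SRR12848126_1
insert_suffix /tmp/run.1/sample.fastq _2
# prints /tmp/run.1/sample_2.fastq
```

Splitting at the last `/` first is what keeps a directory such as test.withdot intact; the last-period rule then only ever touches a real file extension.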
I am currently trying to download a couple of raw sequencing data files using sra-tools prefetch and fasterq-dump. Prefetch works fine, but I get a strange error when trying to convert the generated *.sra file to FASTQ with fasterq-dump. The data is paired-end, and the actual path should be
/scratch/daniel.malzl/work/aa/7ab6e5d29db7a0352a1f1cd4af2af3/SRX10737613_SRR14385311
but judging by the error message, there seems to be a bug in the renaming code, because it says the following:
So it seems to insert the read1/read2 suffixes into the path, which makes the path invalid.
The version I am using is 3.0.8.