Closed nrclaudio closed 3 years ago
hello,
I can't reproduce this error, so I'm guessing it's something with your sra-tools configuration.
Check vdb-config -i
and make sure it's configured correctly, then
try running sra-stat --meta --quick SRR6337208
and see if the error persists.
sra-tools help:
https://github.com/ncbi/sra-tools/blob/master/README-vdb-config
https://github.com/ncbi/sra-tools/wiki/05.-Toolkit-Configuration
Hi,
Thanks for the prompt answer. I've checked, and I doubt it has anything to do with the config. When I run the command, the output looks like this:
SRR6337208|TTTCATGA|8118133:1071593556:1006648492|:|:|:
SRR6337208|GTTCATGA|22024:2907168:2730976|:|:|:
SRR6337208|TTTCATGC|42988:5674416:5330512|:|:|:
... (continued)
I've noticed, however, that if I run the commands individually it works fine. I'm using Slurm (one job submission per SRR ID) and a conda environment with a clean install of parallel-fastq-dump, with sra-tools v2.10.9. Any idea why this is?
That output looks OK.
If you're using Slurm, the process is running on a different machine, so you have to make sure the config is present on the processing node as well.
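A quick way to check this on the compute node itself. This sketch assumes the default config location, ~/.ncbi/user-settings.mkfg, which is where current sra-tools versions keep their settings:

```shell
#!/bin/bash
# Run on a compute node (e.g. via srun) to confirm the sra-tools config
# is visible there, not just on the login node. The path is the default
# location; adjust if your site configures sra-tools differently.
cfg="$HOME/.ncbi/user-settings.mkfg"
if [ -f "$cfg" ]; then
    echo "config found: $cfg"
else
    echo "config missing on $(hostname); run 'vdb-config -i' here first"
fi
```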
Solved it by changing the output directories, somehow. I made sure that each call of parallel-fastq-dump within a project had a dedicated directory for its output. I guess it had something to do with my directories containing other folders also named 'raw'.
Note, in case someone comes to this thread with a similar problem: if you're using a shared cluster, make sure to point the temporary directory at a scratch file system if you have one; otherwise your home directory will probably run out of space.
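A minimal sketch of that advice. $SLURM_TMPDIR is the node-local scratch that Slurm provides on many clusters; the mktemp fallback and the guarded dump call are illustrative assumptions, not the poster's setup:

```shell
#!/bin/bash
# Pick a per-job scratch directory for --tmpdir so temp files never land
# in $HOME. Prefer Slurm's node-local scratch when it exists.
tmpdir="${SLURM_TMPDIR:-}"
if [ -z "$tmpdir" ]; then
    tmpdir="$(mktemp -d "${TMPDIR:-/tmp}/pfd.XXXXXX")"
    trap 'rm -rf "$tmpdir"' EXIT   # clean up only dirs we created ourselves
fi
echo "using tmpdir: $tmpdir"

# Guarded so the sketch stays runnable where sra-tools isn't installed:
if command -v parallel-fastq-dump >/dev/null 2>&1; then
    parallel-fastq-dump --sra-id SRR6337208 --threads 4 \
        --outdir raw/ --split-files --gzip --tmpdir "$tmpdir"
fi
```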
I keep getting the same error as @nrclaudio using this tool with Slurm and Snakemake. Strangely, the tool worked fine for the first batch of ~300 SRA files; now, on the second batch run of my pipeline, no matter what I do I keep getting this error. @nrclaudio, how did you get around this problem? How did you configure your directories? Many thanks
I just got the same error; here is the sra-stat output. In my case the problem seems to have been an incomplete .sra file: parallel-fastq-dump finished successfully after I re-downloaded it.
sra-stat --meta --quick /home/vagrant/gfe_data/transcriptome_assembly/tmp/1_SRA_ID/getfastq/Monotropa_uniflora.txt/SRR11994224.sra
2022-01-09T12:52:12 sra-stat.2.11.0 warn: zombie file detected: '/home/vagrant/gfe_data/transcriptome_assembly/tmp/1_SRA_ID/getfastq/Monotropa_uniflora.txt/SRR11994224.sra/tbl/SEQUENCE/col/READ/data'
2022-01-09T12:52:12 sra-stat.2.11.0 int: type unexpected while visiting directory - data: during KDirectoryVisit
2022-01-09T12:52:12 sra-stat.2.11.0 int: type unexpected while visiting directory - READ: while calling KDirectoryVisit
2022-01-09T12:52:12 sra-stat.2.11.0 int: type unexpected while visiting directory - col: while calling KDirectoryVisit
2022-01-09T12:52:12 sra-stat.2.11.0 int: type unexpected while visiting directory - SEQUENCE: while calling KDirectoryVisit
2022-01-09T12:52:12 sra-stat.2.11.0 int: type unexpected while visiting directory - tbl: while calling KDirectoryVisit
2022-01-09T12:52:12 sra-stat.2.11.0 int: type unexpected while visiting directory - while calling KDirectoryVisit
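Based on that, a defensive sketch: validate the .sra file before dumping, and re-fetch it once if it looks corrupt. vdb-validate and prefetch ship with sra-tools; the path below is a shortened example, not the poster's full one:

```shell
#!/bin/bash
# Validate a downloaded .sra file and re-download it if validation fails.
# Guarded with command -v so the sketch runs even without sra-tools.
check_and_redownload() {
    local f="$1" acc
    acc="$(basename "$f" .sra)"
    if vdb-validate "$f" 2>/dev/null; then
        echo "ok: $f"
    else
        echo "corrupt or incomplete: $f -- re-fetching $acc"
        rm -rf "$f"
        prefetch "$acc" -O "$(dirname "$f")"
    fi
}

if command -v vdb-validate >/dev/null 2>&1; then
    check_and_redownload "tmp/SRR11994224.sra"   # example path
fi
```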
This will depend heavily on your specifics, but this is what I did for one of my samples (the directories will, of course, be different):
parallel-fastq-dump-MySample.sh
#!/bin/bash
while read -r srr
do
    sbatch parallel-fastq-dump.slurm "$srr" MySample
done < acc_list.txt
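An alternative to the sbatch-per-line loop above is a single Slurm job array, where each task indexes into acc_list.txt. The --array syntax is standard sbatch; the toy accession list and everything else here are assumptions so the sketch runs anywhere:

```shell
#!/bin/bash
# Build a toy accession list, then show the array submission that would
# replace the while/sbatch loop. The real sbatch line stays commented out.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1
printf 'SRR0000001\nSRR0000002\n' > acc_list.txt
n=$(wc -l < acc_list.txt)
echo "would run: sbatch --array=1-$n parallel-fastq-dump.slurm MySample"
# Inside the job script, each task would pick its own accession:
#   srr=$(sed -n "${SLURM_ARRAY_TASK_ID}p" acc_list.txt)
```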
parallel-fastq-dump.slurm
#!/bin/bash
#SBATCH -J parallel-fastq-dump
# Clear the environment from any previously loaded modules
module purge > /dev/null 2>&1
# Load the module environment suitable for the job
module load tools/miniconda/python3.8/4.9.2
conda activate parallel-fastq-dump
module load bioinformatics/tools/ncbi/sra/2.10.9
cd "data/raw/${2}" || exit 1 ## This will be your sample directory, where the FASTQs will be downloaded to; in this case 'MySample'
echo $PWD
parallel-fastq-dump -s "$1" --split-files --gzip -O . --tmpdir /exports/tmp/
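The "dedicated directory per call" fix described earlier can be sketched like this; the data/&lt;sample&gt;/&lt;accession&gt; layout is an assumption, not the author's exact tree:

```shell
#!/bin/bash
# Give every accession its own output and tmp directory so concurrent
# Slurm jobs never collide on a shared 'raw' or tmp folder.
base="$(mktemp -d)"                  # stand-in for the project root
srr="SRR6337208"; sample="MySample"  # would normally arrive as "$1" / "$2"
outdir="${base}/data/${sample}/${srr}"
tmpdir="${outdir}/tmp"
mkdir -p "$tmpdir"
echo "outdir=$outdir"
# parallel-fastq-dump -s "$srr" --split-files --gzip -O "$outdir" --tmpdir "$tmpdir"
```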
Hi,
I'm trying to run parallel-fastq-dump, but I get the error provided below. I can find similar issues here, but none of them solves my problem. The specific call is as follows:
parallel-fastq-dump --sra-id $1 --threads 4 --outdir raw/ --split-files --gzip
Where $1 is the result of parsing an SRR ID list.
The log from one of the SRA IDs (SRR6337208):