rvalieris / parallel-fastq-dump

parallel fastq-dump wrapper
MIT License

fastq-dump uses network even though I prefetched #32

Closed nh13 closed 3 years ago

nh13 commented 3 years ago

I have ERR3240205 and ERR3240205.vdbcache files retrieved through SRA Cloud Data Delivery. I then ran:

parallel-fastq-dump -s ERR3240193 -t 2 -O out --tmpdir tmp --split-files --gzip

I found the two fastq-dump processes and ran strace -f -e trace=network -p <pid> on each. The fastq-dump that starts from the beginning of the SRA file does not use network IO, while the one that starts mid-range does. I was wondering why, given that I've already downloaded the file.
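For anyone who wants to repeat this check, here is a minimal sketch that attaches strace to several processes at once and writes one log per PID. It assumes strace is installed and that you have permission to attach to the processes. The function name trace_fastq_dump and the DRY_RUN variable are hypothetical helpers for illustration, not part of parallel-fastq-dump or sra-tools.

```shell
# Attach strace to each PID given as an argument, tracing network syscalls only.
# With DRY_RUN set, only print the commands instead of running them.
trace_fastq_dump() {
    for pid in "$@"; do
        if [ -n "$DRY_RUN" ]; then
            echo "strace -f -e trace=network -o strace.$pid.log -p $pid"
        else
            # -o writes the trace to a per-process log file.
            strace -f -e trace=network -o "strace.$pid.log" -p "$pid" &
        fi
    done
    # Block until all background strace processes exit (no-op in dry-run mode).
    wait
}
```

One way to call it, assuming the worker processes are named exactly fastq-dump, is trace_fastq_dump $(pgrep -x fastq-dump). An empty log for a given PID would suggest that process made no network-related system calls while it was being traced.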

rvalieris commented 3 years ago

Hello, can you add which version of fastq-dump you are using?

If the prefetch completed, it should not be using the network. In my local tests strace does not give any results for either process, but I don't have much insight into how fastq-dump works internally.

nh13 commented 3 years ago

Here are the versions:

$ parallel-fastq-dump --version
parallel-fastq-dump : 0.6.6

"fastq-dump" version 2.10.9

rvalieris commented 3 years ago

I'm using fastq-dump version 2.10.8. I don't think that's the reason, but you could try downgrading sra-tools.

Also, I just noticed you mention SRA Cloud Data Delivery (you mean this, I assume). I have not tested under these conditions, but I guess it's possible sra-tools does something different in this case.

Either way, to dig further into this I think it's best to contact the sra-tools devs, if you haven't already.