rvalieris / parallel-fastq-dump

parallel fastq-dump wrapper
MIT License

Download all reads for given experiment accession #40

Closed: snayfach closed this issue 3 years ago

snayfach commented 3 years ago

Thanks for the great tool! As an enhancement, it would be great if I could give the program an experiment (or sample) accession and download reads for all run accessions that correspond to the experiment. This is a feature of the standard fastq-dump program.
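The requested behaviour has two steps: find which runs belong to an experiment, then download each run. Below is a minimal sketch of the first step. It assumes you already have the run metadata as CSV text with `Run` and `Experiment` columns. NCBI's runinfo tables use these column names, but treat the exact layout as an assumption. `runs_for_experiment` is a hypothetical helper and is not part of parallel-fastq-dump.

```python
import csv
import io
from typing import List


def runs_for_experiment(runinfo_csv: str, experiment: str) -> List[str]:
    """Return the run accessions whose Experiment column equals `experiment`.

    Assumes (hypothetically) a header row with at least `Run` and `Experiment`.
    """
    reader = csv.DictReader(io.StringIO(runinfo_csv))
    runs = []
    for row in reader:
        # Keep only rows belonging to the requested experiment that name a run.
        if row.get("Experiment") == experiment and row.get("Run"):
            runs.append(row["Run"])
    return runs


if __name__ == "__main__":
    # Illustrative table built from the accessions quoted later in this thread.
    table = (
        "Run,Experiment,Sample\n"
        "SRR061179,SRX023998,SRS014494\n"
        "SRR061180,SRX023998,SRS014494\n"
    )
    print(runs_for_experiment(table, "SRX023998"))  # -> ['SRR061179', 'SRR061180']
```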

rvalieris commented 3 years ago

Hello, can you give me some examples of accessions with multiple runs?

snayfach commented 3 years ago

Here's one example, let me know if you need more:

Experiment accession = SRX023998
Sample accession = SRS014494
Run accessions = SRR061179, SRR061180

Thanks!

rvalieris commented 3 years ago

And that works with fastq-dump? Which version are you using?

With version 2.11.0, I get this message:

$ fastq-dump SRX023998
SRX023998 is not a run accession. For more information, see https://www.ncbi.nlm.nih.gov/sra/?term=SRX023998
Automatic expansion of container accessions is not currently available. See the above link(s) for information about the accessions.
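Because fastq-dump 2.11.0 rejects container accessions, it can help to check accessions before passing them on. The sketch below classifies accessions by their INSDC prefix: SRR/ERR/DRR are runs, SRX/ERX/DRX are experiments, SRS/ERS/DRS are samples, and SRP/ERP/DRP are studies. It approximates fastq-dump's "is not a run accession" check and is not fastq-dump's actual logic. `accession_kind` and `is_run` are hypothetical names.

```python
import re

# Second-letter-after-"R" code -> record type, for SRA (S), ENA (E) and DDBJ (D).
_KINDS = {"R": "run", "X": "experiment", "S": "sample", "P": "study"}
_ACCESSION = re.compile(r"^([SED])R([RXSP])\d+$")


def accession_kind(accession: str) -> str:
    """Return 'run', 'experiment', 'sample', 'study' or 'unknown' by prefix."""
    match = _ACCESSION.match(accession.strip().upper())
    if not match:
        return "unknown"
    return _KINDS[match.group(2)]


def is_run(accession: str) -> bool:
    """True if the accession looks like a run (the only type fastq-dump accepts)."""
    return accession_kind(accession) == "run"


if __name__ == "__main__":
    for acc in ["SRX023998", "SRS014494", "SRR061179"]:
        print(acc, accession_kind(acc))
```

Accessions classified as experiments or samples would first need to be resolved to their runs.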

snayfach commented 3 years ago

Version 2.9.1. This works for me:

fastq-dump --split-3 --gzip -O SRX003365 SRX003365
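Without container expansion, the same download can be done one run at a time. The sketch below builds one parallel-fastq-dump command per run. It assumes the run list is already known (here, the two runs quoted for SRX023998). It also relies on parallel-fastq-dump forwarding options it does not recognise, such as `--split-3` and `--gzip`, to fastq-dump. `per_run_commands` is a hypothetical helper.

```python
import shlex


def per_run_commands(runs, outdir, threads=4, extra_args=("--split-3", "--gzip")):
    """Build one parallel-fastq-dump argv list per run accession.

    `runs` must already be resolved run accessions (SRX/SRS -> SRR).
    `extra_args` are fastq-dump options assumed to be passed through.
    """
    commands = []
    for run in runs:
        commands.append([
            "parallel-fastq-dump",
            "--sra-id", run,
            "--threads", str(threads),
            "--outdir", outdir,
            *extra_args,
        ])
    return commands


if __name__ == "__main__":
    for cmd in per_run_commands(["SRR061179", "SRR061180"], "SRX023998"):
        # Print a copy-pasteable shell line; pass `cmd` to subprocess.run to execute it.
        print(shlex.join(cmd))
```

Output files are named after each run, so putting every run in the same `--outdir` should not cause collisions.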

But strangely the other accession returns:

2021-06-23T15:45:55 fastq-dump.2.9.1 err: item not found while constructing within virtual database module - the path 'SRX023998' cannot be opened as database or table

I remember being able to download experiments containing multiple runs, but that one example is not working now for some reason. Any ideas?

rvalieris commented 3 years ago

I don't know. My guess is that this was never really supported but happened to work for some SRX/SRS accessions in older versions, and the newer version now detects it properly and prints an error message.

I think it's better not to depend on this behavior even if it appears to work. If you downloaded an SRX like this in the past, make sure the final fastq really contains the sum of all the SRRs' reads.
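The check suggested above can be scripted. The sketch below counts records in a FASTQ file (plain or gzipped) and compares the count with the sum of the expected per-run counts. It assumes the standard 4-line FASTQ layout that fastq-dump writes by default. It also assumes you get the expected counts yourself, for example from the spot counts in the run metadata. For `--split-3` paired-end output, compare each mate file separately. `count_fastq_records` and `check_merged` are hypothetical names.

```python
import gzip
from pathlib import Path


def count_fastq_records(path) -> int:
    """Count records in a FASTQ file (plain or .gz), assuming 4 lines per record."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def check_merged(merged_fastq, expected_per_run):
    """Compare a merged FASTQ's record count with the sum of per-run counts.

    `expected_per_run` maps run accession -> expected record count.
    Returns (matches, observed, expected).
    """
    observed = count_fastq_records(merged_fastq)
    expected = sum(expected_per_run.values())
    return observed == expected, observed, expected
```

If the result shows fewer records than expected, some runs were probably missing from the old SRX-level download.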