rvalieris / parallel-fastq-dump

parallel fastq-dump wrapper
MIT License

Question about using parallel fastqdump with prefetch #29

Closed smb20200615 closed 3 years ago

smb20200615 commented 3 years ago

Hi,

I was wondering whether I can run the following to download the data. I know this works with fastq-dump and wanted to confirm that it also works with parallel-fastq-dump 0.6.6.

prefetch [runid] && vdb-validate [runid] && parallel-fastq-dump --outdir [out] --skip-technical --split-3 --sra-id [runid] --gzip

Thank you!

rvalieris commented 3 years ago

hello,

yep that should work, don't forget to set the number of threads on parallel-fastq-dump tho.
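For reference, the pipeline from the question with an explicit thread count can be sketched as below. `fetch_run` and the `DRY_RUN` switch are hypothetical names introduced only for this illustration, and the default of 4 threads is an arbitrary assumption; `--threads` is the parallel-fastq-dump option the reply refers to.

```shell
# Hypothetical helper (not part of sra-tools or parallel-fastq-dump):
# download, validate, and convert one run with an explicit thread count.
fetch_run() {
  local run_id="$1" out_dir="$2" threads="${3:-4}"  # 4 is an assumed default
  # With DRY_RUN=1, print each command instead of executing it.
  local run=""
  [ "${DRY_RUN:-0}" = "1" ] && run="echo"
  $run prefetch "$run_id" &&
  $run vdb-validate "$run_id" &&
  $run parallel-fastq-dump --sra-id "$run_id" --threads "$threads" \
       --outdir "$out_dir" --skip-technical --split-3 --gzip
}
```

Running `DRY_RUN=1 fetch_run SRR1929563 out 8` prints the three commands without contacting NCBI, which is a cheap way to check the arguments before submitting a job.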

smb20200615 commented 3 years ago

Thank you so much for your speedy reply! I get the following message only on certain HPC systems. Do you know what the issue might be?

2020-10-26T04:04:38 prefetch.2.9.1 sys: connection busy while validating within network system module - Failed to Make Connection in KClientHttpOpen to 'www.ncbi.nlm.nih.gov:443'
2020-10-26T04:04:38 prefetch.2.9.1 err: path not found while resolving tree within virtual file system module - 'SRR1929563' cannot be found.

Is this an issue with parallel-fastq-dump or prefetch (maybe a vdb-config issue)?

rvalieris commented 3 years ago

this line suggests you are having problems connecting to ncbi servers:

connection busy while validating within network system module - Failed to Make Connection in KClientHttpOpen to 'www.ncbi.nlm.nih.gov:443'

it could be a firewall issue, maybe ncbi is blocked somehow?
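One way to test the firewall hypothesis on the affected cluster is to try the same host with a generic client. The sketch below assumes `curl` is installed on the node; `check_ncbi` is a hypothetical helper name, and the 10-second timeout is an arbitrary choice.

```shell
# Hypothetical helper: report whether an HTTPS connection to NCBI can be made.
# Only the connection is tested; curl exits 0 even for HTTP error statuses
# because --fail is not used.
check_ncbi() {
  local url="${1:-https://www.ncbi.nlm.nih.gov}"
  if curl --silent --head --max-time 10 --output /dev/null "$url"; then
    echo "reachable: $url"
  else
    echo "unreachable: $url (curl exit code $?)"
    return 1
  fi
}
```

Calling `check_ncbi` with no arguments on both a login node and a compute node would show whether only some nodes are blocked. If it reports "unreachable" where prefetch fails, the problem is likely in the network setup rather than in parallel-fastq-dump.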

smb20200615 commented 3 years ago

Apologies for the follow-up question. With this command:

prefetch [runid] && vdb-validate [runid] && parallel-fastq-dump --outdir [out] --skip-technical --split-3 --sra-id [runid] --gzip

Do you know whether we have to set the path where the SRA data is stored using vdb-config -i?

rvalieris commented 3 years ago

looks like some behavior was changed recently in SRA tools. please read: https://github.com/ncbi/sra-tools/issues/291 and https://github.com/ncbi/sra-tools/issues/77

smb20200615 commented 3 years ago

Thank you so much for your super useful guidance and speedy replies as always. I am using a quay.io/biocontainers/parallel-fastq-dump container with parallel-fastq-dump (version 0.6.6) and fastq-dump (version 2.8.0), so I think it will not be affected by the recent changes. I am just a bit confused because I expected to have to set up vdb-config, but somehow with the container the current directory is used when downloading the SRA data. I wasn't sure whether you made the container for parallel-fastq-dump and, if so, whether you had somehow automatically configured the SRA toolkit.

rvalieris commented 3 years ago

The container just includes the same files you can get with the bioconda packages, no extra config files are included.

as I understand it, before this recent change and without any vdb-config configuration, it should default to writing the .sra files in the user's HOME directory (like ~/ncbi/), but I could be wrong as this seems to change depending on the sra-tools version you have installed.
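Since the default location seems to depend on the sra-tools version, a quick way to see where prefetch actually wrote a run is to search the candidate directories. This is only a sketch: `find_sra` is a hypothetical helper, and the directories to search (`~/ncbi` from the reply above, and the current directory as reported for the container) are assumptions passed in by the caller.

```shell
# Hypothetical helper: list .sra files for a run under the given directories.
# Directories that do not exist are skipped silently.
find_sra() {
  local run_id="$1"; shift
  local dir
  for dir in "$@"; do
    [ -d "$dir" ] && find "$dir" -type f -name "${run_id}*.sra" 2>/dev/null
  done
  return 0
}
```

For example, `find_sra SRR1929563 "$HOME/ncbi" .` would print any matching file under either location, and print nothing if the download went somewhere else.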

smb20200615 commented 3 years ago

Thank you so much! And I promise this is my last question. Is there a way to edit the config file when running through a Singularity container? I just don't know how to do so given that the container will not be writable.

rvalieris commented 3 years ago

there's 2 options:

smb20200615 commented 3 years ago

thank you so so much for your guidance!