Closed olgabot closed 2 years ago
SRR1070986 is a protected run. You have to use the --ngc option to access it.
Hi @klymenko, thank you -- I realize it is a protected run, and I already have downloaded the SRR files via the "NCBI Cloud Delivery" option into our bucket:
(base)
⚙ Thu 28 Apr - 21:45 /data/fasterq-dump-test
ll */*
Permissions Size User Date Modified Name
.rw-rw-r-- 3.8G olgabot 20 Apr 20:19 SRR1070986/SRR1070986
.rw-rw-r-- 3.8G olgabot 28 Apr 00:17 SRR1070986/SRR1070986.sra
.rw-rw-r-- 704M olgabot 11 Apr 17:57 SRR1070986/SRR1070986.vdbcache
Since I have already downloaded the files, I don't want fasterq-dump
to even try to download them, but instead use the local file. Is this possible?
How did you download it?
Did you use prefetch? In this case you had to supply the ngc file via the --ngc option.
If not - send your request to sra-tools@ncbi.nlm.nih.gov.
Hi Andrew, The SRR files were downloaded using the NCBI Cloud Delivery https://www.ncbi.nlm.nih.gov/sra/docs/data-delivery/ from the NCBI bucket directly into our AWS bucket. We were not able to download the Fastqs directly because they exceeded the 5TB limit for the runs we selected. How do you recommend converting SRR files obtained through the NCBI Cloud Delivery service into fastqs? Thank you! Warmest, Olga
Olga Botvinnik, PhD olgabotvinnik.com http://www.olgabotvinnik.com
Each invocation of docker run
creates a new instance of the image that is unrelated to any previous instances of the image. Your vdb-config
command and your subsequent fasterq-dump
happened in different instances of the image, and thus the first command had no effect on the second one.
Here is a way to do what you want. First, in an empty directory, create a Dockerfile containing:
FROM ncbi/sra-tools
RUN vdb-config <... the rest of your configuration command ...>
Then run docker build --tag my-sra-tools .
to create your own customized image. When you docker run
your image, your configuration will be active.
Thank you for those suggestions! I'm still not able to get fasterq-dump
to recognize the local file and not try to fetch anything from the server, even after doing vdb-config -s/repository/remote/disabled=true
Here is the Dockerfile:
(base)
Fri 29 Apr - 18:30 /data/fasterq-dump-test
cat Dockerfile
FROM ncbi/sra-tools
RUN vdb-config --root -s/repository/remote/disabled=true
Here is the docker build
log:
(base)
Fri 29 Apr - 18:31 /data/fasterq-dump-test
docker build -t bridgebio/sra-tools .
Sending build context to Docker daemon 8.387GB
Step 1/2 : FROM ncbi/sra-tools
---> 4c0b31b98aec
Step 2/2 : RUN vdb-config --root -s/repository/remote/disabled=true
---> Using cache
---> dea8d96d988b
Successfully built dea8d96d988b
Successfully tagged bridgebio/sra-tools:latest
No matter whether I use the SRR../
as a folder, or the bare SRR
file, fasterq-dump
keeps erroring out. It seems to be recognizing the local file because I see SRR1070986/SRR1070986.sra is an SRA Normalized Format file with full base quality scores
, but no fastq
files are generated.
Here is the local file structure:
(base)
Fri 29 Apr - 18:38 /data/fasterq-dump-test
ll
Permissions Size User Date Modified Name
.rw-rw-r-- 76 olgabot 29 Apr 18:21 Dockerfile
.rw-rw-r-- 81 olgabot 29 Apr 18:20 Dockerfile~
drwxrwxr-x - olgabot 28 Apr 18:49 SRR1070986
(base)
Fri 29 Apr - 18:38 /data/fasterq-dump-test
ll SRR1070986
Permissions Size User Date Modified Name
.rw-rw-r-- 3.8G olgabot 20 Apr 20:19 SRR1070986
.rw-rw-r-- 3.8G olgabot 28 Apr 00:17 SRR1070986.sra
.rw-rw-r-- 704M olgabot 11 Apr 17:57 SRR1070986.vdbcache
The VDB config seems correct because I see:
<repository>
<remote>
<disabled>true</disabled>
Here is the full config output:
Am I missing something with running fasterq-dump
on local files?
When you docker run
, are you mounting the directory into the container? Like so:
docker run -it -v $PWD:/work:rw -w /work --rm bridgebio/sra-tools ls -l
If that lists the directory with the run file, then
docker run -it -v $PWD:/work:rw -w /work --rm bridgebio/sra-tools fasterq-dump <...>
should find it.
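The mount check described above can be wrapped in a small helper. This is only a sketch: the function name `sra_docker_cmd` is made up for illustration, and the image name and `/work` mount point are the ones used in this thread. It prints the command instead of running it, so the mount and the path can be inspected first.

```shell
# Hypothetical helper: print the docker command that runs fasterq-dump on a
# local run file, mounting the current directory into the container at /work.
sra_docker_cmd() {
    local image="$1" run_path="$2"
    # Refuse early if the file is not visible on the host: a path that is
    # missing here cannot appear inside the container either.
    if [ ! -e "$run_path" ]; then
        echo "missing local run file: $run_path" >&2
        return 1
    fi
    printf 'docker run -it -v %s:/work:rw -w /work --rm %s fasterq-dump %s\n' \
        "$PWD" "$image" "$run_path"
}
```

Called from the directory that holds the run folder, for example `sra_docker_cmd bridgebio/sra-tools SRR1070986/SRR1070986.sra`, it prints a command of the same shape as the one above.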
@durbrow Thank you so much, the docker mounted directory was the issue! I now get a new error (yay!) related to the references, which is related to https://github.com/ncbi/sra-tools/issues/202, https://github.com/ncbi/sra-tools/issues/447, https://github.com/ncbi/sra-tools/issues/318:
Here is what I see for the alignment info, which shows the reference information for my accession of interest:
And here is the vdb-dump
output
I need to run this on ~1500 files. I see that I can create a /etc/ncbi/user-settings.kfg
file with pre-specified paths of RefSeq files here: https://github.com/ncbi/sra-tools/issues/416#issuecomment-802946574.
Is it possible to prefetch the references only? I want to make sure I'm downloading the exact version of the genome, with the correct folder/file structure that is necessary for fasterq-dump
. I don't see the option to fetch only references in prefetch --help
, but maybe I'm missing something.
Thank you so much!!
Is it possible to prefetch the references only?
You can prefetch the references for an already prefetched run.
If you have just SRR1070986.sra in SRR1070986 and no refseqs, run prefetch SRR1070986/SRR1070986.sra
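For many runs, the same command can be generated per accession. The sketch below assumes each run sits in a directory named after its accession, as in the listings earlier in this thread. The function name `list_prefetch_cmds` is hypothetical. It only prints the commands, so nothing is downloaded until they are reviewed and executed.

```shell
# Hypothetical helper: print one prefetch command for every directory under
# a root that contains <accession>/<accession>.sra. Paths are printed
# relative to the root, so the commands are meant to be run from there.
list_prefetch_cmds() {
    local root="$1" dir acc
    for dir in "$root"/*/; do
        # Skip the unexpanded glob when the root has no subdirectories.
        [ -d "$dir" ] || continue
        acc=$(basename "$dir")
        # Only list runs whose .sra file is actually present.
        if [ -f "$dir$acc.sra" ]; then
            printf 'prefetch %s/%s.sra\n' "$acc" "$acc"
        fi
    done
}
```

Piping the output to `sh` from within the root directory would then execute the commands one by one.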
@olgabot, did you resolve your issue?
Yes, I needed to have this user-settings.mkfg
file:
/repository/remote/disabled = "true"
# This forces usage of a local refseq folder instead of pulling from NCBI every time
/repository/site/main/archive/apps/refseq/volumes/refseq = "refseq"
/repository/site/main/archive/root = "PWD"
And do some fun file gymnastics in my pipeline code to make it work:
# Combine pipeline-provided plus vdb-configured ncbi settings into one
# This forces usage of a local refseq folder instead of pulling from NCBI every time
sed "s:PWD:\$PWD:" ${ncbi_settings} | cat - \$NCBI_SETTINGS >> new_ncbi_settings.mkfg
echo "\n--- cat NCBI_SETTINGS ---"
cat \$NCBI_SETTINGS
mv ${ncbi_settings} old_ncbi_settings.txt
echo '\n--- cat new_ncbi_settings.mkfg ---'
cat new_ncbi_settings.mkfg
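The placeholder substitution at the heart of the snippet above can be tried on its own, outside Nextflow, where `$PWD` needs no backslash. This is a minimal sketch: the template is written to a temporary file purely for illustration, and the variable names `template` and `rendered` are not from the pipeline.

```shell
# Template with the literal placeholder PWD, as in the settings file above.
template=$(mktemp)
cat > "$template" <<'EOF'
/repository/remote/disabled = "true"
/repository/site/main/archive/apps/refseq/volumes/refseq = "refseq"
/repository/site/main/archive/root = "PWD"
EOF

# Replace the placeholder with the current working directory.
# ':' is used as the sed delimiter so '/' in the path needs no escaping
# (this assumes the path itself contains no ':' or '&').
rendered=$(sed "s:PWD:$PWD:" "$template")
printf '%s\n' "$rendered"
```

Only the last line of the template contains the placeholder, so the other two settings pass through unchanged.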
You generate the path to site repository every time. Don't you have a permanent value?
You can create a directory with configuration files (*.kfg
) and export VDB_CONFIG
to point to this directory instead of creating ~/.ncbi/user-settings.mkfg
.
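A minimal sketch of that suggestion follows. The directory location and the file name `local.kfg` are assumptions chosen for illustration; the setting itself is copied from the settings file earlier in this thread.

```shell
# Hypothetical location for the configuration directory; any directory works.
config_dir="${TMPDIR:-/tmp}/sra-config"
mkdir -p "$config_dir"

# The file name is arbitrary; the suggestion above only asks for *.kfg files.
cat > "$config_dir/local.kfg" <<'EOF'
/repository/remote/disabled = "true"
EOF

# Point sra-tools at the directory instead of ~/.ncbi/user-settings.mkfg.
export VDB_CONFIG="$config_dir"
```

Inside a container, the directory would need to be mounted or copied in, and the variable passed through, for the tools to see it.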
Hello,
Unfortunately, I do not have a permanent value. This is for a Nextflow pipeline on AWS Batch, so each individual pipeline run is run in its own sandboxed environment with a docker container and custom path, so I cannot reference an absolute path. This is the best workaround I've found, as sra-tools
uses only filesystems, while Nextflow can use both filesystems and blob stores, and I couldn't use e.g. an s3://
path in the user-settings.mkfg
file as it must be on a filesystem. The best I can do is to create a local .mkfg
file.
Warmest,
Olga
Hello, Hope you are well. I am using the `ncbi/sra-tools` docker image (thank you for providing it!) to run `fasterq-dump` on cloud delivered dbGap controlled access data. Whenever I try to run `fasterq-dump`, it always tries to fetch the data remotely, even though the file exists locally. How can I force `fasterq-dump` to ONLY use the local file?

I tried using the `vdb-config -s/repository/remote/disabled=true` as mentioned in this issue https://github.com/ncbi/sra-tools/issues/500, but I get the complaint that this command must be run with `sudo`, and when I run with `sudo`, it doesn't work at all.

Here's the `fasterq-dump` version information:

`docker run -it ncbi/sra-tools vdb-config` output
```
(base) ✘ ⚙ Thu 28 Apr - 18:48 /data/fasterq-dump-test
docker run -it ncbi/sra-tools vdb-config -s/repository/remote/disabled=true
2022-04-28T18:48:25 vdb-config.3.0.0 err: condition violated while updating node - Warning: normally this application should not be run as root/superuser
(base) ✘ ⚙ Thu 28 Apr - 18:48 /data/fasterq-dump-test
docker run -it ncbi/sra-tools sudo vdb-config -s/repository/remote/disabled=true
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "sudo": executable file not found in $PATH: unknown.
ERRO[0000] error waiting for container: context canceled
```
Folder structure of local SRR files
```
(base) ✘ ⚙ Thu 28 Apr - 18:49 /data/fasterq-dump-test
ll
Permissions Size User Date Modified Name
drwxrwxr-x - olgabot 28 Apr 18:49 SRR1070986
(base) ⚙ Thu 28 Apr - 18:49 /data/fasterq-dump-test
ll SRR1070986
Permissions Size User Date Modified Name
.rw-rw-r-- 3.8G olgabot 20 Apr 20:19 SRR1070986
.rw-rw-r-- 3.8G olgabot 28 Apr 00:17 SRR1070986.sra
.rw-rw-r-- 704M olgabot 11 Apr 17:57 SRR1070986.vdbcache
```
`docker run -it ncbi/sra-tools fasterq-dump` output
### Using `SRR1070986/` folder
```
(base) ✘ ⚙ Thu 28 Apr - 18:53 /data/fasterq-dump-test
docker run -it ncbi/sra-tools fasterq-dump --threads 2 --progress -vvv --log-level info SRR1070986
Preference setting is: Prefer SRA Normalized Format files with full base quality scores if available.
2022-04-28T18:53:44 fasterq-dump.3.0.0: KClientHttpOpen - connected from '172.17.0.2' to 169.254.169.254 (169.254.169.254)
2022-04-28T18:53:44 fasterq-dump.3.0.0: KClientHttpOpen - connected from '172.17.0.2' to 169.254.169.254 (169.254.169.254)
2022-04-28T18:53:44 fasterq-dump.3.0.0: KClientHttpOpen - connected from '172.17.0.2' to 169.254.169.254 (169.254.169.254)
2022-04-28T18:53:44 fasterq-dump.3.0.0: KClientHttpOpen - connected from '172.17.0.2' to 169.254.169.254 (169.254.169.254)
2022-04-28T18:53:44 fasterq-dump.3.0.0: KClientHttpOpen - connected from '172.17.0.2' to 169.254.169.254 (169.254.169.254)
2022-04-28T18:53:44 fasterq-dump.3.0.0: KClientHttpOpen - connected from '172.17.0.2' to locate.ncbi.nlm.nih.gov (130.14.29.113)
2022-04-28T18:53:44 fasterq-dump.3.0.0: Setting up SSL/TLS structure
2022-04-28T18:53:44 fasterq-dump.3.0.0: Performing SSL/TLS handshake...
2022-04-28T18:53:44 fasterq-dump.3.0.0: KClientHttpOpen - verifying CA cert
2022-04-28T18:53:44 fasterq-dump.3.0.0: Verifying peer X.509 certificate...
2022-04-28T18:53:44 fasterq-dump.3.0.0: Reading from server...
2022-04-28T18:53:44 fasterq-dump.3.0.0 err: query unauthorized while resolving query within virtual file system module - failed to resolve accession 'SRR1070986' - Access denied - please request permission to access phs000424 / GRU in dbGaP. ( 403 )
Query SRR1070986: Error 403 Access denied - please request permission to access phs000424 / GRU in dbGaP.
2022-04-28T18:53:44 fasterq-dump.3.0.0: Seeding the random number generator
2022-04-28T18:53:44 fasterq-dump.3.0.0: Loading CA root certificates
2022-04-28T18:53:44 fasterq-dump.3.0.0: Parsing text for default CA root certificates
2022-04-28T18:53:44 fasterq-dump.3.0.0: Configuring SSl defaults
2022-04-28T18:53:44 fasterq-dump.3.0.0: KClientHttpOpen - connected from '172.17.0.2' to 169.254.169.254 (169.254.169.254)
2022-04-28T18:53:45 fasterq-dump.3.0.0: KClientHttpOpen - connected from '172.17.0.2' to locate.ncbi.nlm.nih.gov (130.14.29.113)
2022-04-28T18:53:45 fasterq-dump.3.0.0: Setting up SSL/TLS structure
2022-04-28T18:53:45 fasterq-dump.3.0.0: Performing SSL/TLS handshake...
2022-04-28T18:53:45 fasterq-dump.3.0.0: KClientHttpOpen - verifying CA cert
2022-04-28T18:53:45 fasterq-dump.3.0.0: Verifying peer X.509 certificate...
2022-04-28T18:53:45 fasterq-dump.3.0.0: Reading from server...
2022-04-28T18:53:45 fasterq-dump.3.0.0 err: query unauthorized while resolving query within virtual file system module - failed to resolve accession 'SRR1070986' - Access denied - please request permission to access phs000424 / GRU in dbGaP. ( 403 )
fasterq-dump quit with error code 3
```
### Using `SRR1070986/SRR1070986` filename
```
(base) ✘ ⚙ Thu 28 Apr - 00:15 /data/fasterq-dump-test
docker run -it ncbi/sra-tools fasterq-dump --threads 2 --progress -vvv --log-level info SRR1070986/SRR1070986
Preference setting is: Prefer SRA Normalized Format files with full base quality scores if available.
2022-04-28T00:15:35 fasterq-dump.3.0.0: KClientHttpOpen - connected from '172.17.0.2' to 169.254.169.254 (169.254.169.254)
2022-04-28T00:15:35 fasterq-dump.3.0.0: KClientHttpOpen - connected from '172.17.0.2' to 169.254.169.254 (169.254.169.254)
2022-04-28T00:15:35 fasterq-dump.3.0.0: KClientHttpOpen - connected from '172.17.0.2' to 169.254.169.254 (169.254.169.254)
2022-04-28T00:15:35 fasterq-dump.3.0.0: KClientHttpOpen - connected from '172.17.0.2' to 169.254.169.254 (169.254.169.254)
2022-04-28T00:15:35 fasterq-dump.3.0.0: KClientHttpOpen - connected from '172.17.0.2' to 169.254.169.254 (169.254.169.254)
2022-04-28T00:15:35 fasterq-dump.3.0.0: KClientHttpOpen - connected from '172.17.0.2' to locate.ncbi.nlm.nih.gov (130.14.29.113)
2022-04-28T00:15:35 fasterq-dump.3.0.0: Setting up SSL/TLS structure
2022-04-28T00:15:35 fasterq-dump.3.0.0: Performing SSL/TLS handshake...
2022-04-28T00:15:35 fasterq-dump.3.0.0: KClientHttpOpen - verifying CA cert
2022-04-28T00:15:35 fasterq-dump.3.0.0: Verifying peer X.509 certificate...
2022-04-28T00:15:35 fasterq-dump.3.0.0: Reading from server...
2022-04-28T00:15:35 fasterq-dump.3.0.0 err: name not found while resolving query within virtual file system module - failed to resolve accession 'SRR1070986/SRR1070986' - no data ( 404 )
2022-04-28T00:15:35 fasterq-dump.3.0.0: Seeding the random number generator
2022-04-28T00:15:35 fasterq-dump.3.0.0: Loading CA root certificates
2022-04-28T00:15:35 fasterq-dump.3.0.0: Parsing text for default CA root certificates
2022-04-28T00:15:35 fasterq-dump.3.0.0: Configuring SSl defaults
fasterq-dump quit with error code 3
```
### Using `SRR1070986/SRR1070986.sra` filename
```
(base) ⚙ Thu 28 Apr - 00:17 /data/fasterq-dump-test
docker run -it ncbi/sra-tools fasterq-dump --threads 2 --progress -vvv --log-level info SRR1070986/SRR1070986.sra
Preference setting is: Prefer SRA Normalized Format files with full base quality scores if available.
2022-04-28T18:38:55 fasterq-dump.3.0.0: KClientHttpOpen - connected from '172.17.0.2' to 169.254.169.254 (169.254.169.254)
2022-04-28T18:38:55 fasterq-dump.3.0.0: KClientHttpOpen - connected from '172.17.0.2' to 169.254.169.254 (169.254.169.254)
2022-04-28T18:38:55 fasterq-dump.3.0.0: KClientHttpOpen - connected from '172.17.0.2' to 169.254.169.254 (169.254.169.254)
2022-04-28T18:38:55 fasterq-dump.3.0.0: KClientHttpOpen - connected from '172.17.0.2' to 169.254.169.254 (169.254.169.254)
2022-04-28T18:38:55 fasterq-dump.3.0.0: KClientHttpOpen - connected from '172.17.0.2' to 169.254.169.254 (169.254.169.254)
2022-04-28T18:38:56 fasterq-dump.3.0.0: KClientHttpOpen - connected from '172.17.0.2' to locate.ncbi.nlm.nih.gov (130.14.29.113)
2022-04-28T18:38:56 fasterq-dump.3.0.0: Setting up SSL/TLS structure
2022-04-28T18:38:56 fasterq-dump.3.0.0: Performing SSL/TLS handshake...
2022-04-28T18:38:56 fasterq-dump.3.0.0: KClientHttpOpen - verifying CA cert
2022-04-28T18:38:56 fasterq-dump.3.0.0: Verifying peer X.509 certificate...
2022-04-28T18:38:56 fasterq-dump.3.0.0: Reading from server...
2022-04-28T18:38:57 fasterq-dump.3.0.0: Reading from server...
2022-04-28T18:38:57 fasterq-dump.3.0.0: Reading from server...
2022-04-28T18:38:57 fasterq-dump.3.0.0 err: name not found while resolving query within virtual file system module - failed to resolve accession 'SRR1070986/SRR1070986.sra' - no data ( 404 )
SRR1070986/SRR1070986.sra is an SRA Normalized Format file with full base quality scores.
2022-04-28T18:38:57 fasterq-dump.3.0.0: Seeding the random number generator
2022-04-28T18:38:57 fasterq-dump.3.0.0: Loading CA root certificates
2022-04-28T18:38:57 fasterq-dump.3.0.0: Parsing text for default CA root certificates
2022-04-28T18:38:57 fasterq-dump.3.0.0: Configuring SSl defaults
fasterq-dump quit with error code 3
```
I would greatly appreciate your help! Thank you so much.

Warmest,
Olga