Closed by eitanyaffe 2 years ago
Are you still affected by this? I was told it was a temporary condition.
Thanks for looking into this! I just checked (on version 2.11.3) and the problem persists.
It turns out that if I disable the 'report cloud instance identity' flag through vdb-config, I can download files. However, they seem to be downloaded from Amazon S3 rather than from GCP buckets:
sa_113045741138414598170@instance-1:~$ fasterq-dump -v -v SRR000001
Preference setting is: Prefer SRA Normalized Format files with full base quality scores if available.
2022-02-17T20:31:24 fasterq-dump.2.11.3: KClientHttpOpen - connected from '10.128.0.14' to metadata.google.internal (169.254.169.254)
2022-02-17T20:31:24 fasterq-dump.2.11.3: KClientHttpOpen - connected from '10.128.0.14' to locate.ncbi.nlm.nih.gov (130.14.29.113)
2022-02-17T20:31:24 fasterq-dump.2.11.3: KClientHttpOpen - verifying CA cert
SRR000001 is an SRA Normalized Format file with full base quality scores.
2022-02-17T20:31:24 fasterq-dump.2.11.3: KClientHttpOpen - connected from '10.128.0.14' to sra-pub-run-odp.s3.amazonaws.com (54.231.133.105)
2022-02-17T20:31:24 fasterq-dump.2.11.3: KClientHttpOpen - verifying CA cert
2022-02-17T20:31:25 fasterq-dump.2.11.3: KClientHttpOpen - connected from '10.128.0.14' to sra-pub-run-odp.s3.amazonaws.com (54.231.133.105)
2022-02-17T20:31:25 fasterq-dump.2.11.3: KClientHttpOpen - verifying CA cert
2022-02-17T20:31:25 fasterq-dump.2.11.3: KClientHttpOpen - connected from '10.128.0.14' to sra-pub-run-odp.s3.amazonaws.com (54.231.133.105)
2022-02-17T20:31:25 fasterq-dump.2.11.3: KClientHttpOpen - verifying CA cert
2022-02-17T20:31:25 fasterq-dump.2.11.3: KClientHttpOpen - connected from '10.128.0.14' to sra-pub-run-odp.s3.amazonaws.com (54.231.133.105)
2022-02-17T20:31:25 fasterq-dump.2.11.3: KClientHttpOpen - connected from '10.128.0.14' to sra-pub-run-odp.s3.amazonaws.com (54.231.133.105)
2022-02-17T20:31:25 fasterq-dump.2.11.3: KClientHttpOpen - connected from '10.128.0.14' to sra-pub-run-odp.s3.amazonaws.com (54.231.133.105)
2022-02-17T20:31:25 fasterq-dump.2.11.3: KClientHttpOpen - connected from '10.128.0.14' to sra-pub-run-odp.s3.amazonaws.com (54.231.133.105)
2022-02-17T20:31:25 fasterq-dump.2.11.3: KClientHttpOpen - connected from '10.128.0.14' to sra-pub-run-odp.s3.amazonaws.com (54.231.133.105)
2022-02-17T20:31:25 fasterq-dump.2.11.3: KClientHttpOpen - connected from '10.128.0.14' to sra-pub-run-odp.s3.amazonaws.com (54.231.133.105)
2022-02-17T20:31:25 fasterq-dump.2.11.3: KClientHttpOpen - verifying CA cert
2022-02-17T20:31:25 fasterq-dump.2.11.3: KClientHttpOpen - verifying CA cert
2022-02-17T20:31:25 fasterq-dump.2.11.3: KClientHttpOpen - verifying CA cert
2022-02-17T20:31:25 fasterq-dump.2.11.3: KClientHttpOpen - verifying CA cert
2022-02-17T20:31:25 fasterq-dump.2.11.3: KClientHttpOpen - verifying CA cert
2022-02-17T20:31:26 fasterq-dump.2.11.3: KClientHttpOpen - verifying CA cert
2022-02-17T20:31:31 fasterq-dump.2.11.3: KClientHttpOpen - connected from '10.128.0.14' to sra-pub-run-odp.s3.amazonaws.com (52.216.99.19)
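One way to check where the data is coming from is to tally the hosts named in the "connected from ... to ..." lines of the verbose output. The sketch below is only an illustration: it embeds a handful of lines copied from the log above, and the regular expression assumes the line layout seen there.

```python
import re
from collections import Counter

# A few lines copied from the verbose fasterq-dump output above.
LOG = """\
2022-02-17T20:31:24 fasterq-dump.2.11.3: KClientHttpOpen - connected from '10.128.0.14' to metadata.google.internal (169.254.169.254)
2022-02-17T20:31:24 fasterq-dump.2.11.3: KClientHttpOpen - connected from '10.128.0.14' to locate.ncbi.nlm.nih.gov (130.14.29.113)
2022-02-17T20:31:24 fasterq-dump.2.11.3: KClientHttpOpen - verifying CA cert
2022-02-17T20:31:24 fasterq-dump.2.11.3: KClientHttpOpen - connected from '10.128.0.14' to sra-pub-run-odp.s3.amazonaws.com (54.231.133.105)
2022-02-17T20:31:31 fasterq-dump.2.11.3: KClientHttpOpen - connected from '10.128.0.14' to sra-pub-run-odp.s3.amazonaws.com (52.216.99.19)
"""

# Assumed layout: "... connected from '<local ip>' to <host> (<remote ip>)"
CONNECTED = re.compile(r"connected from '[^']*' to (\S+) \(")


def count_hosts(log_text):
    """Count how many connections were opened to each remote host."""
    return Counter(m.group(1) for m in CONNECTED.finditer(log_text))


hosts = count_hosts(LOG)
for host, n in hosts.most_common():
    print(f"{n:3d}  {host}")
```

In this excerpt every data connection goes to an s3.amazonaws.com host; no Google Cloud Storage host appears.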
With the 'report cloud instance identity' flag enabled I get this:
sa_113045741138414598170@instance-1:~$ fasterq-dump -v -v SRR000001
Preference setting is: Prefer SRA Normalized Format files with full base quality scores if available.
2022-02-17T20:34:12 fasterq-dump.2.11.3: KClientHttpOpen - connected from '10.128.0.14' to metadata (169.254.169.254)
2022-02-17T20:34:12 fasterq-dump.2.11.3: KClientHttpOpen - connected from '10.128.0.14' to metadata (169.254.169.254)
2022-02-17T20:34:12 fasterq-dump.2.11.3: KClientHttpOpen - connected from '10.128.0.14' to locate.ncbi.nlm.nih.gov (130.14.29.113)
2022-02-17T20:34:12 fasterq-dump.2.11.3: KClientHttpOpen - verifying CA cert
SRR000001 is an SRA Normalized Format file with full base quality scores.
2022-02-17T20:34:12 fasterq-dump.2.11.3: KClientHttpOpen - connected from '10.128.0.14' to locate.ncbi.nlm.nih.gov (130.14.29.113)
2022-02-17T20:34:12 fasterq-dump.2.11.3: KClientHttpOpen - verifying CA cert
2022-02-17T20:34:12 fasterq-dump.2.11.3: KClientHttpOpen - connected from '10.128.0.14' to locate.ncbi.nlm.nih.gov (130.14.29.113)
2022-02-17T20:34:13 fasterq-dump.2.11.3: KClientHttpOpen - verifying CA cert
2022-02-17T20:34:13 fasterq-dump.2.11.3: KClientHttpOpen - connected from '10.128.0.14' to locate.ncbi.nlm.nih.gov (130.14.29.113)
2022-02-17T20:34:13 fasterq-dump.2.11.3: KClientHttpOpen - verifying CA cert
2022-02-17T20:34:13 fasterq-dump.2.11.3: KClientHttpOpen - connected from '10.128.0.14' to locate.ncbi.nlm.nih.gov (130.14.29.113)
2022-02-17T20:34:13 fasterq-dump.2.11.3: KClientHttpOpen - verifying CA cert
2022-02-17T20:34:13 fasterq-dump.2.11.3: KClientHttpOpen - connected from '10.128.0.14' to locate.ncbi.nlm.nih.gov (130.14.29.113)
2022-02-17T20:34:13 fasterq-dump.2.11.3: KClientHttpOpen - verifying CA cert
2022-02-17T20:34:13 fasterq-dump.2.11.3: KClientHttpOpen - connected from '10.128.0.14' to locate.ncbi.nlm.nih.gov (130.14.29.113)
2022-02-17T20:34:13 fasterq-dump.2.11.3: KClientHttpOpen - verifying CA cert
2022-02-17T20:34:13 fasterq-dump.2.11.3: KClientHttpOpen - connected from '10.128.0.14' to locate.ncbi.nlm.nih.gov (130.14.29.113)
2022-02-17T20:34:13 fasterq-dump.2.11.3: KClientHttpOpen - verifying CA cert
2022-02-17T20:34:13 fasterq-dump.2.11.3 err: transfer incomplete while reading file within network system module - error with https open 'https://locate.ncbi.nlm.nih.gov/sdlr/sdlr.fcgi?jwt=eyJhbGciOiJSUzI1NiIsImtpZCI6InNkbHJraWQxIiwidHlwIjoiSldUIn0.eyJhY2MiOiJTUlIwMDAwMDEiLCJleHAiOjE2NDUxNDQ0NTIsImZpbGVTaXplIjoiMzEyNTI3MDgzIiwiaWF0IjoxNjQ1MTMwMDUyLCJqdGkiOiI1YzA2NjJmOS0yYWUzLTQyMmEtODJmYy0xNWNlNzI4YmNhNTUiLCJsaW5rIjoiaHR0cHM6Ly9zdG9yYWdlLmdvb2dsZWFwaXMuY29tL3NyYS1wdWItcnVuLTcvU1JSMDAwMDAxL1NSUjAwMDAwMS40IiwicmVnaW9uIjoidXMiLCJzZXJ2aWNlIjoiZ3MiLCJzaWduaW5nQWNjb3VudCI6InNyYV9ncyIsInRpbWVvdXQiOjE0NDAwfQ.E7mQlKLM0I_f_Vk2lSg4ghyLFjFgHorBCj9BUYArkfUIwW_hHlYnwUsho7fQAN7McVWgKEz4VqEv6RsECLSJuhEPcYAEglXIpr3yafJEGrvyuKQMIW5saRFhvd0VH-72xzm96QAsMBbFsG1FcR43mFDZ6Ts8uHhmr3L1VwUTA58w6hW0F8Tgv3eADwdNSbqzSQROU6YK3X4bMf_AbZ06D_8qZ0bmV5o6Z2MwMvo2VGuC8Emnz_CQYORWMZitc61iYzZe4x1cbLXT9NLUD1wcdWzxqdndk3K2dnLDm1mk54wfZkPpc8lUu9G7eqJR_sEuPY03ZKlBBi9GjKkfDaxbYQ&ncbi_phid=D0BD0BE048EAE375000051D7BC74186D.1.1'
2022-02-17T20:34:13 fasterq-dump.2.11.3 err: invalid accession 'SRR000001'
fasterq-dump quit with error code 3
As I understand it, downloading from GCP buckets is what fails. This feature is important: it saves money and time, and it is critical when scaling up workflows.
Let me know if I can assist further in any way.
Eitan
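The jwt= parameter in the failing locate.ncbi.nlm.nih.gov URL above has the usual three-part JWT shape, so its middle segment can be decoded to see what the request was for. This is a minimal sketch using only the Python standard library; it does not verify the signature, and the field names are simply whatever the token contains.

```python
import base64
import json

# Middle (payload) segment of the jwt= value from the error URL above,
# split across lines only for readability.
PAYLOAD_B64 = (
    "eyJhY2MiOiJTUlIwMDAwMDEiLCJleHAiOjE2NDUxNDQ0NTIs"
    "ImZpbGVTaXplIjoiMzEyNTI3MDgzIiwiaWF0IjoxNjQ1MTMwMDUyLCJq"
    "dGkiOiI1YzA2NjJmOS0yYWUzLTQyMmEtODJmYy0xNWNlNzI4YmNhNTUi"
    "LCJsaW5rIjoiaHR0cHM6Ly9zdG9yYWdlLmdvb2dsZWFwaXMuY29t"
    "L3NyYS1wdWItcnVuLTcvU1JSMDAwMDAxL1NSUjAwMDAwMS40Iiwi"
    "cmVnaW9uIjoidXMiLCJzZXJ2aWNlIjoiZ3MiLCJzaWduaW5nQWNjb3VudCI6"
    "InNyYV9ncyIsInRpbWVvdXQiOjE0NDAwfQ"
)


def decode_segment(segment):
    """Decode one base64url JWT segment (padding restored) into a dict."""
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


payload = decode_segment(PAYLOAD_B64)
for key in ("acc", "service", "region", "link"):
    print(f"{key}: {payload[key]}")
```

The decoded link points at storage.googleapis.com, which is consistent with the failure happening on the GCP bucket path rather than on the S3 path.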
We are investigating this issue.
Meanwhile, run vdb-config -i, uncheck 'report cloud instance identity', and rerun your command.
Thanks Andrew! That works, and is a good solution until this issue is resolved.
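For scripted setups where the interactive vdb-config -i screen is inconvenient, the same setting can possibly be changed non-interactively. The sketch below assumes that the installed vdb-config accepts a --report-cloud-identity yes|no option; that option is not mentioned in this thread, so check vdb-config --help before relying on it. The helper names are hypothetical.

```python
import shutil
import subprocess


def cloud_identity_command(enabled):
    """Build the command line for the (assumed) non-interactive option.

    ASSUMPTION: vdb-config supports --report-cloud-identity yes|no.
    """
    return ["vdb-config", "--report-cloud-identity", "yes" if enabled else "no"]


def set_cloud_identity(enabled):
    """Run the command if vdb-config is installed; otherwise just print it."""
    cmd = cloud_identity_command(enabled)
    if shutil.which(cmd[0]) is None:
        print("vdb-config not found on PATH; would run:", " ".join(cmd))
        return False
    subprocess.run(cmd, check=True)
    return True


if __name__ == "__main__":
    # Mirrors the workaround above: stop reporting the cloud instance identity.
    set_cloud_identity(False)
```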
I have a possibly similar problem with fasterq-dump. Here are the specifics:
Changing the 'report cloud instance identity' setting did not make a difference.
Starting with an empty config file did not make a difference.
Test:
fasterq-dump -L debug --ngc /prj_phs710EA_test.ngc -O /data/$USER/temp/test SRR1219902
Output without debug logging (failure always occurs after approx 10m)
2022-03-21T15:58:04 fasterq-dump.3.0.0 err: data unexpected while reading file within network system module - Cannot KStblHttpFileTimedReadChunked
2022-03-21T16:07:51 fasterq-dump.3.0.0 fatal: SIGNAL - 8
fasterq-dump quit with error code 1
Output with debug logging:
2022-03-21T16:59:31 fasterq-dump.3.0.0 debug: path not found while opening node within configuration module - no image guid
2022-03-21T16:59:31 fasterq-dump.3.0.0 debug: path not found while opening node within configuration module - no image guid
2022-03-21T16:59:32 fasterq-dump.3.0.0 debug: requesting guard size 1038336, default was 4096
2022-03-21T16:59:32 fasterq-dump.3.0.0 info: HTTP read failure: URL="" status=403; tried 1/6 times for 0 milliseconds total
2022-03-21T16:59:32 fasterq-dump.3.0.0 info: HTTP read failure: URL="" status=403; tried 2/6 times for 5 milliseconds total
2022-03-21T16:59:32 fasterq-dump.3.0.0 info: HTTP read failure: URL="" status=403; tried 3/6 times for 15 milliseconds total
2022-03-21T16:59:32 fasterq-dump.3.0.0 info: HTTP read failure: URL="" status=403; tried 4/6 times for 30 milliseconds total
2022-03-21T16:59:32 fasterq-dump.3.0.0 info: HTTP read failure: URL="" status=403; tried 5/6 times for 60 milliseconds total
2022-03-21T16:59:32 fasterq-dump.3.0.0 info: HTTP read failure: URL="" status=403; tried 6/6 times for 120 milliseconds total
2022-03-21T16:59:33 fasterq-dump.3.0.0 info: HTTP read failure: URL="https://sra-ca-run-odp.s3.amazonaws.com/sra/phs000710.c99/SRR1219902/SRR1219902?X-Amz
-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIA6PM54Q3MA6LQ4LOY%2F20220321%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220321T165932Z&X-Amz-Expires=360
000&X-Amz-SignedHeaders=host&ncbi_phid=939B8C751BDA84D50000327D410FF938.1.1&project_id=0&x-amz-request-payer=requester&X-Amz-Signature=41857e41bb7989fba5a
34cd48f6c946b6f130fe3ec77101a624dcf84b856b3da" status=403; tried 1/6 times for 0 milliseconds total
2022-03-21T16:59:33 fasterq-dump.3.0.0 info: HTTP read failure: URL="https://sra-ca-run-odp.s3.amazonaws.com/sra/phs000710.c99/SRR1219902/SRR1219902?X-Amz
-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIA6PM54Q3MA6LQ4LOY%2F20220321%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220321T165932Z&X-Amz-Expires=360
000&X-Amz-SignedHeaders=host&ncbi_phid=939B8C751BDA84D50000327D410FF938.1.1&project_id=0&x-amz-request-payer=requester&X-Amz-Signature=41857e41bb7989fba5a
34cd48f6c946b6f130fe3ec77101a624dcf84b856b3da" status=403; tried 2/6 times for 5 milliseconds total
2022-03-21T16:59:33 fasterq-dump.3.0.0 info: HTTP read failure: URL="https://sra-ca-run-odp.s3.amazonaws.com/sra/phs000710.c99/SRR1219902/SRR1219902?X-Amz
-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIA6PM54Q3MA6LQ4LOY%2F20220321%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220321T165932Z&X-Amz-Expires=360
000&X-Amz-SignedHeaders=host&ncbi_phid=939B8C751BDA84D50000327D410FF938.1.1&project_id=0&x-amz-request-payer=requester&X-Amz-Signature=41857e41bb7989fba5a
34cd48f6c946b6f130fe3ec77101a624dcf84b856b3da" status=403; tried 3/6 times for 15 milliseconds total
2022-03-21T16:59:34 fasterq-dump.3.0.0 info: HTTP read failure: URL="https://sra-ca-run-odp.s3.amazonaws.com/sra/phs000710.c99/SRR1219902/SRR1219902?X-Amz
-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIA6PM54Q3MA6LQ4LOY%2F20220321%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220321T165932Z&X-Amz-Expires=360
000&X-Amz-SignedHeaders=host&ncbi_phid=939B8C751BDA84D50000327D410FF938.1.1&project_id=0&x-amz-request-payer=requester&X-Amz-Signature=41857e41bb7989fba5a
34cd48f6c946b6f130fe3ec77101a624dcf84b856b3da" status=403; tried 4/6 times for 30 milliseconds total
2022-03-21T16:59:34 fasterq-dump.3.0.0 info: HTTP read failure: URL="https://sra-ca-run-odp.s3.amazonaws.com/sra/phs000710.c99/SRR1219902/SRR1219902?X-Amz
-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIA6PM54Q3MA6LQ4LOY%2F20220321%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220321T165932Z&X-Amz-Expires=360
000&X-Amz-SignedHeaders=host&ncbi_phid=939B8C751BDA84D50000327D410FF938.1.1&project_id=0&x-amz-request-payer=requester&X-Amz-Signature=41857e41bb7989fba5a
34cd48f6c946b6f130fe3ec77101a624dcf84b856b3da" status=403; tried 5/6 times for 60 milliseconds total
2022-03-21T16:59:34 fasterq-dump.3.0.0 info: HTTP read failure: URL="https://sra-ca-run-odp.s3.amazonaws.com/sra/phs000710.c99/SRR1219902/SRR1219902?X-Amz
-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIA6PM54Q3MA6LQ4LOY%2F20220321%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220321T165932Z&X-Amz-Expires=360
000&X-Amz-SignedHeaders=host&ncbi_phid=939B8C751BDA84D50000327D410FF938.1.1&project_id=0&x-amz-request-payer=requester&X-Amz-Signature=41857e41bb7989fba5a
34cd48f6c946b6f130fe3ec77101a624dcf84b856b3da" status=403; tried 6/6 times for 120 milliseconds total
2022-03-21T16:59:35 fasterq-dump.3.0.0 info: HTTP read failure: URL="https://sra-ca-run-odp.s3.amazonaws.com/sra/phs000710.c99/SRR1219902/SRR1219902?X-Amz
-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIA6PM54Q3MA6LQ4LOY%2F20220321%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220321T165932Z&X-Amz-Expires=360
000&X-Amz-SignedHeaders=host&ncbi_phid=939B8C751BDA84D50000327D410FF938.1.1&project_id=0&x-amz-request-payer=requester&X-Amz-Signature=41857e41bb7989fba5a
34cd48f6c946b6f130fe3ec77101a624dcf84b856b3da" status=403; tried 1/6 times for 0 milliseconds total
[...much more...]
On the same machine, when I run curl --trace-ascii ascii.trace on one of the URLs above, I get a reasonable-looking interaction:
== Info: About to connect() to proxy dtn01-e0 port 3128 (#0)
== Info: Trying 10.1.200.237...
== Info: Connected to dtn01-e0 (10.1.200.237) port 3128 (#0)
== Info: Establish HTTP proxy tunnel to sra-ca-run-odp.s3.amazonaws.com:443
=> Send header, 154 bytes (0x9a)
0000: CONNECT sra-ca-run-odp.s3.amazonaws.com:443 HTTP/1.1
0036: Host: sra-ca-run-odp.s3.amazonaws.com:443
0061: User-Agent: curl/7.29.0
007a: Proxy-Connection: Keep-Alive
0098:
<= Recv header, 37 bytes (0x25)
0000: HTTP/1.1 200 Connection established
<= Recv header, 2 bytes (0x2)
0000:
== Info: Proxy replied OK to CONNECT request
== Info: Initializing NSS with certpath: sql:/etc/pki/nssdb
== Info: CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
== Info: SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
== Info: Server certificate:
== Info: subject: CN=*.s3.amazonaws.com,O="Amazon.com, Inc.",L=Seattle,ST=Washington,C=US
== Info: start date: Dec 13 00:00:00 2021 GMT
== Info: expire date: Dec 13 23:59:59 2022 GMT
== Info: common name: *.s3.amazonaws.com
== Info: issuer: CN=DigiCert Baltimore CA-2 G2,OU=www.digicert.com,O=DigiCert Inc,C=US
=> Send header, 493 bytes (0x1ed)
0000: GET /sra/phs000710.c99/SRR1219902/SRR1219902?X-Amz-Algorithm=AWS
0040: 4-HMAC-SHA256&X-Amz-Credential=AKIA6PM54Q3MA6LQ4LOY%2F20220321%2
0080: Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220321T170539Z&X-Amz
00c0: -Expires=360000&X-Amz-SignedHeaders=host&ncbi_phid=939B8C751BDA8
0100: 4D500002F8149EE8E92.1.1&project_id=0&x-amz-request-payer=request
0140: er&X-Amz-Signature=ce160a28451d31b0a4a6c95bfe6fa9613df581232414e
0180: 22b6bd7a32a97981a0c HTTP/1.1
019e: User-Agent: curl/7.29.0
01b7: Host: sra-ca-run-odp.s3.amazonaws.com
01de: Accept: */*
01eb:
<= Recv header, 17 bytes (0x11)
0000: HTTP/1.1 200 OK
<= Recv header, 90 bytes (0x5a)
0000: x-amz-id-2: Q1wGTX31cYvCyTpiQe9vg5lvbK0RYtihMxC2qHMHtmgtE3ZHUX8Q
0040: qxu6iWPKL8eccK3iX148x1s=
<= Recv header, 36 bytes (0x24)
0000: x-amz-request-id: R2PR423HKWJF8MAA
<= Recv header, 37 bytes (0x25)
0000: Date: Mon, 21 Mar 2022 17:53:22 GMT
<= Recv header, 46 bytes (0x2e)
0000: Last-Modified: Thu, 20 Jan 2022 23:19:59 GMT
<= Recv header, 47 bytes (0x2f)
0000: ETag: "592a94fd99b555b1180d000b916d913d-1773"
<= Recv header, 24 bytes (0x18)
0000: x-amz-tagging-count: 1
<= Recv header, 22 bytes (0x16)
0000: Accept-Ranges: bytes
<= Recv header, 35 bytes (0x23)
0000: Content-Type: binary/octet-stream
<= Recv header, 18 bytes (0x12)
0000: Server: AmazonS3
<= Recv header, 29 bytes (0x1d)
0000: Content-Length: 14868512697
<= Recv header, 2 bytes (0x2)
0000:
<= Recv data, 1645 bytes (0x66d)
0000: NCBI.sra.........'...........'......O.q...lock,.ES....$......md.
0040: .ES....m........"......cur..ES....$.....&......`!........md5..ES
0080: ....$............).........tbl..DS....m.........'....*.....PRIMA
00c0: RY_ALIGNMENT..ES....m....................col..ES....m........p..
0100: ...*.P.x.........&.K...GLOBAL_REF_START..ES....m.............#EX
0140: {....data..ES....$......$......4Z.......idx..ES....$............
[...much more...]
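One quick thing to rule out for the 403 responses is an expired presigned URL. The sketch below parses the X-Amz-Date and X-Amz-Expires parameters from the URL in the debug log and compares the expiry with the log timestamp of the first 403 that shows a URL. It assumes the log timestamps are in UTC, and the URL is rejoined from the wrapped log lines with the credential, ncbi_phid and signature parameters left out for brevity. It says nothing about the actual cause of the 403.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit

# URL from the debug log above, rejoined across the wrapped lines.
# Credential, ncbi_phid and signature parameters are omitted here.
URL = (
    "https://sra-ca-run-odp.s3.amazonaws.com/sra/phs000710.c99/SRR1219902/SRR1219902"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Date=20220321T165932Z"
    "&X-Amz-Expires=360000"
    "&X-Amz-SignedHeaders=host"
    "&project_id=0"
    "&x-amz-request-payer=requester"
)


def presigned_window(url):
    """Return (signed_at, expires_at) for a SigV4 presigned URL."""
    params = parse_qs(urlsplit(url).query)
    signed_at = datetime.strptime(
        params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    expires_at = signed_at + timedelta(seconds=int(params["X-Amz-Expires"][0]))
    return signed_at, expires_at


signed_at, expires_at = presigned_window(URL)

# Log timestamp of the first 403 that shows a URL (ASSUMED to be UTC).
failed_at = datetime(2022, 3, 21, 16, 59, 33, tzinfo=timezone.utc)

print("signed at :", signed_at.isoformat())
print("expires at:", expires_at.isoformat())
print("expired when the 403 was logged:", failed_at > expires_at)
```

Under these assumptions the URL was still well inside its validity window when the 403 was logged, so expiry alone does not explain the failure.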
@wresch, send questions regarding access to protected data to sra-tools@ncbi.nlm.nih.gov
OK - I'll try that. Thanks @klymenko
@eitanyaffe, the issue is fixed. Please verify.
@klymenko -- I checked and it works well. Thank you so much for resolving this issue!
We are investigating this issue. Meanwhile, run vdb-config -i, uncheck 'report cloud instance identity', and rerun your command.
Thank you! This worked for me!
My commands were prefetch -v SRR000001 followed by fastq-dump -v SRR000001.
Hi,
I'm trying to use the SRA Toolkit to download data onto a Google Cloud VM. I created a powerful VM (n2-standard-32) and installed version 2.11.2 of the toolkit.
I get the following error:
It keeps retrying, with longer and longer sleeps (1s, 2s, ...), and finally fails completely.
srapath gives this info:
fasterq-dump also failed with a slightly different error message:
Note that this issue occurs in older and newer versions of the toolkit (2.11.1 and 2.11.3). Alternatives I've tried (with similar results) include running prefetch through the NCBI docker image:
Let me know if there are any details you need to reproduce or solve this bug. Alternatively, please point me to instructions on how to download data within a GCP VM.
SMALL UPDATE ADDED 2/10. To reproduce the problem, you can follow the instructions at https://edwards.flinders.edu.au/accessing-sra-in-the-cloud/. I followed the entire page (with both versions 2.11.3 and 2.10.2) and got the same results as the author of that page. However, when running fastq-dump I get this error:
Best, Eitan Yaffe