sudmantlab / sudmanthelp

sudmant lab first-aid kit with useful utility functions and common issues

sra toolkit prefetch not downloading #1

Open shenghuanjie opened 5 years ago

shenghuanjie commented 5 years ago

prefetch fails for some files but not others. For example, prefetch SRR8231713 failed:

2019-08-30T02:21:23 prefetch.2.9.6: heartbeat = 60000 Milliseconds
2019-08-30T02:21:23 prefetch.2.9.6: Seeding the random number generator
2019-08-30T02:21:23 prefetch.2.9.6: Loading CA root certificates
2019-08-30T02:21:23 prefetch.2.9.6: Listing CA root certificates
2019-08-30T02:21:23 prefetch.2.9.6: Retrieving names in CA root certificates
2019-08-30T02:21:23 prefetch.2.9.6: Configuring SSl defaults

2019-08-30T02:21:23 prefetch.2.9.6: 'tools/ascp/disabled': not found in configuration
2019-08-30T02:21:23 prefetch.2.9.6: Checking 'ascp'
2019-08-30T02:21:23 prefetch.2.9.6: 'ascp': not found
2019-08-30T02:21:23 prefetch.2.9.6: Checking 'ascp'
2019-08-30T02:21:23 prefetch.2.9.6: 'ascp': not found
2019-08-30T02:21:23 prefetch.2.9.6: Checking '/usr/bin/ascp'
2019-08-30T02:21:23 prefetch.2.9.6: '/usr/bin/ascp': not found
2019-08-30T02:21:23 prefetch.2.9.6: Checking '/usr/bin/ascp'
2019-08-30T02:21:23 prefetch.2.9.6: '/usr/bin/ascp': not found
2019-08-30T02:21:23 prefetch.2.9.6: Checking '/opt/aspera/bin/ascp'
2019-08-30T02:21:23 prefetch.2.9.6: '/opt/aspera/bin/ascp': not found
2019-08-30T02:21:23 prefetch.2.9.6: Checking '/opt/aspera/bin/ascp'
2019-08-30T02:21:23 prefetch.2.9.6: '/opt/aspera/bin/ascp': not found
2019-08-30T02:21:23 prefetch.2.9.6: Checking '/global/home/users/shenghuanjie/.aspera/connect/bin/ascp'
2019-08-30T02:21:23 prefetch.2.9.6: '/global/home/users/shenghuanjie/.aspera/connect/bin/ascp': not found
2019-08-30T02:21:23 prefetch.2.9.6: Checking '/global/home/users/shenghuanjie/.aspera/connect/bin/ascp'
2019-08-30T02:21:23 prefetch.2.9.6: '/global/home/users/shenghuanjie/.aspera/connect/bin/ascp': not found
2019-08-30T02:21:23 prefetch.2.9.6: KClientHttpOpen - opening socket to www.ncbi.nlm.nih.gov:443
2019-08-30T02:21:23 prefetch.2.9.6: www.ncbi.nlm.nih.gov resolved to 130.14.29.110
2019-08-30T02:21:23 prefetch.2.9.6: KClientHttpOpen - connected from '169.229.176.18' to www.ncbi.nlm.nih.gov (130.14.29.110)
2019-08-30T02:21:23 prefetch.2.9.6: KClientHttpOpen - creating TLS wrapper on socket
2019-08-30T02:21:23 prefetch.2.9.6: KTLSStreamMake
2019-08-30T02:21:23 prefetch.2.9.6: KTLSStreamMake - initializing KStream
2019-08-30T02:21:23 prefetch.2.9.6: KTLSStreamMake - initializing tls wrapper
2019-08-30T02:21:23 prefetch.2.9.6: Setting up SSL/TLS structure
2019-08-30T02:21:23 prefetch.2.9.6: Performing SSL/TLS handshake...
2019-08-30T02:21:24 prefetch.2.9.6: KClientHttpOpen - verifying CA cert
2019-08-30T02:21:24 prefetch.2.9.6: Verifying peer X.509 certificate...
2019-08-30T02:21:24 prefetch.2.9.6: KClientHttpOpen - extracting TLS wrapper as stream
2019-08-30T02:21:24 prefetch.2.9.6: KClientHttpOpen - setting port number - 443
2019-08-30T02:21:24 prefetch.2.9.6: Writing 256 bytes to to server
2019-08-30T02:21:24 prefetch.2.9.6: 256 bytes written
2019-08-30T02:21:24 prefetch.2.9.6: Reading from server...
2019-08-30T02:21:24 prefetch.2.9.6: 693 bytes read
2019-08-30T02:21:24 prefetch.2.9.6: ########## Resolve(SRR8231713) = RC(rcVFS,rcQuery,rcExecuting,rcSelf,rcNull):
2019-08-30T02:21:24 prefetch.2.9.6: local(NULL)
2019-08-30T02:21:24 prefetch.2.9.6: cache(NULL)
2019-08-30T02:21:24 prefetch.2.9.6: remote(NULL:0)

https://github.com/ncbi/sra-tools/issues/196

https://github.com/ncbi/sra-tools/issues/197
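For reference, debug output like the log above can be reproduced by re-running prefetch with repeated verbosity flags (a sketch; the -vv behavior is assumed from sra-tools 2.9.x):

```shell
# Re-run the failing download with extra logging so the resolver
# result (the Resolve(...) lines at the end of the log) is visible.
ACC=SRR8231713
prefetch -vv "$ACC"
```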

shenghuanjie commented 5 years ago

The run you are having an issue with was submitted to be used only via cloud storage and cannot be prefetched.

Source data files for some of the latest GTEx submissions are being stored and distributed from cloud providers in their originally submitted format. These files are accessioned in the same format as data submitted for processing at NCBI. If index files for BAM, CRAM, and VCF data are provided in the submission, they will be distributed with the source data under the same accession. The content of the source data files may not be verified by NCBI and some data types may require additional files or software to access.

Data files located on a cloud provider can be found using the DATASTORE provider, location, and filetype in the SRA Run Selector. For most data sets the files are only accessible using the same DATASTORE provider and DATASTORE region.

URL access

Data can also be accessed via URL, or via signed URL for dbGaP data. Signed URLs are only issued for dbGaP protected data and require the project's repository key (.ngc) to be provided when the URL is requested. The SRA Data Locator (SDL) can be accessed at https://www.ncbi.nlm.nih.gov/Traces/sdl/1/ and used to find the data location and get URLs to access data in the cloud. Signed URLs are valid for a limited period and must be refreshed.
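As a sketch, the SDL endpoint above can be queried with curl; the retrieve?acc= query form is an assumption, not confirmed in this thread:

```shell
# Ask the SRA Data Locator where a run's data lives and get access
# URLs. For dbGaP data the repository key would also need to be
# supplied when requesting a signed URL.
ACC=SRR8231713
curl -s "https://www.ncbi.nlm.nih.gov/Traces/sdl/1/retrieve?acc=${ACC}"
```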

fusera

https://github.com/mitre/fusera

MITRE has built software that allows users to access files and make them appear "locally" available in a cloud terminal without the need to copy or store the data files in the user's cloud account. fusera uses FUSE to provide access to SRA accessioned data stored in cloud archives while avoiding storage and egress costs. The data files appear in a directory named by Run (SRR) accession.

To use fusera, first install the package from https://github.com/mitre/fusera on your cloud provider. Google and Amazon cloud services are currently supported, but data is only available from the provider and region used for storage.

For dbGaP studies a copy of the repository key (.ngc) from dbGaP Authorized Access must be available on your cloud account.

To mount cloud storage data, use fusera's 'mount' command to access one or more accessions. Remember to run 'mount' as a background process and to 'unmount' the directory when finished using the data.
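A minimal sketch of the mount/unmount cycle described above, assuming fusera's --accession and --location flags; the mount point path is an example, not from this thread:

```shell
# Mount run data from cloud storage into a local directory on an
# instance in the same provider/region, then unmount when finished.
MNT=$HOME/sra-mnt
mkdir -p "$MNT"
fusera mount --accession SRR8231713 --location s3.us-east-1 "$MNT" &
# ... work with the files under $MNT/SRR8231713/ ...
fusera unmount "$MNT"
```

For a dbGaP study, the repository key would be passed as well (fusera documents an --ngc flag for this).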

shenghuanjie commented 5 years ago

Some files are only available on the cloud so we cannot download them.

shenghuanjie commented 5 years ago

Because the data is only available on AWS S3 storage, you have to get on the cloud and mount the data. Note that the data can only be accessed by EC2 instances launched in the same region. For instance, SRR8231713 is only available in us-east-1, so you have to create an EC2 instance in us-east-1 and mount the data following the instructions above.
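As an illustration of that last step, an instance can be launched in the data's region with the AWS CLI; the AMI ID, instance type, and key name below are placeholders, not values from this thread:

```shell
# Launch an EC2 instance in us-east-1, the region where SRR8231713
# is stored, so fusera can mount the data there.
aws ec2 run-instances \
  --region us-east-1 \
  --image-id ami-0abcdef1234567890 \
  --instance-type t3.micro \
  --key-name my-key
```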