Open shenghuanjie opened 5 years ago
The run you are having an issue with was submitted for access via cloud storage only and cannot be prefetched.
Source data files for some of the latest GTEx submissions are being stored and distributed from cloud providers in their originally submitted format. These files are accessioned in the same format as data submitted for processing at NCBI. If index files for BAM, CRAM, and VCF data are provided in the submission, they will be distributed with the source data under the same accession. The content of the source data files may not be verified by NCBI and some data types may require additional files or software to access.
Data files located on a cloud provider can be found using the DATASTORE provider, location, and filetype in the SRA Run Selector. For most data sets the files are only accessible using the same DATASTORE provider and DATASTORE region.
URL access
Data can also be accessed via URL, or via signed URL for dbGaP data. Signed URLs are issued only for dbGaP protected data, and the project's repository key (.ngc) must be provided when requesting the URL. The SRA Data Locator (SDL) is available at https://www.ncbi.nlm.nih.gov/Traces/sdl/1/ and can be used to find the data location and get URLs to access data in the cloud. Signed URLs are valid for a limited period and will need to be refreshed once they expire.
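Because signed URLs expire, code that reads data over a long period has to request a fresh URL from time to time. The following is a minimal sketch of that refresh logic only. The `SignedUrlCache` class, the `request_url` callable, and the lifetime and margin values are all hypothetical: the thread does not say how long a signed URL stays valid or how the SDL request is made, so plug in whatever applies to your project.

```python
import time
from typing import Callable, Optional


class SignedUrlCache:
    """Return a cached signed URL, requesting a new one shortly before expiry."""

    def __init__(
        self,
        request_url: Callable[[], str],  # hypothetical: your own function that asks SDL for a URL
        lifetime_s: float,               # assumed validity period, not a documented NCBI value
        margin_s: float = 60.0,          # refresh this many seconds before the assumed expiry
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._request_url = request_url
        self._lifetime_s = lifetime_s
        self._margin_s = margin_s
        self._clock = clock
        self._url: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Refresh when no URL has been fetched yet or the current one is about to expire.
        if self._url is None or now >= self._expires_at - self._margin_s:
            self._url = self._request_url()
            self._expires_at = now + self._lifetime_s
        return self._url
```

Callers would use `cache.get()` immediately before each download or read instead of storing the URL themselves, so an expired URL is never reused.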
fusera
https://github.com/mitre/fusera
MITRE has built software that lets users access files so that they appear "locally" available in their cloud terminal, without the need to copy or store the data files in their own cloud account. fusera uses FUSE to provide access to SRA accessioned data stored in cloud archives while avoiding storage and egress costs. The data files will appear in one directory per Run (SRR) accession.
To use fusera, first install the package from https://github.com/mitre/fusera on your cloud instance. Data is currently available, and fusera is supported, on Google and Amazon cloud services, but each data set can only be accessed from the provider and region where it is stored.
For dbGaP studies a copy of the repository key (.ngc) from dbGaP Authorized Access must be available on your cloud account.
To mount cloud storage data use the fusera 'mount' program to access one or more accessions. Remember to run 'mount' as a background process and to 'unmount' the directory when finished using the data.
Some files are only available in the cloud, so they cannot be downloaded with prefetch.
Because the data is only available in AWS S3 storage, you have to work in the cloud and mount the data. Note that the data can only be accessed from EC2 instances launched in the same region. For instance, SRR8231713 is available in us-east-1, so you have to create an EC2 instance in us-east-1 and mount the data following the instructions above.
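The same-provider, same-region rule can be checked before launching anything. The sketch below compares a run's DATASTORE provider and region (as shown in the SRA Run Selector) with those of the instance you plan to use. The `Datastore` type and `can_mount` function are hypothetical helpers, and the provider strings are illustrative. Note that the reply above says the rule holds for "most" data sets, so treat a negative result as a strong hint rather than a guarantee.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datastore:
    provider: str  # illustrative value such as "s3" or "gs"
    region: str    # e.g. "us-east-1"


def can_mount(run_store: Datastore, instance: Datastore) -> bool:
    """Return True when the instance is on the same provider and region as the run's data."""
    return (
        run_store.provider.lower() == instance.provider.lower()
        and run_store.region.lower() == instance.region.lower()
    )


# SRR8231713 is stored on AWS S3 in us-east-1, per the thread.
srr8231713 = Datastore(provider="s3", region="us-east-1")

print(can_mount(srr8231713, Datastore("s3", "us-east-1")))  # True
print(can_mount(srr8231713, Datastore("s3", "us-west-2")))  # False
```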
prefetch failed for some files but not for others. For example:
prefetch SRR8231713
failed
https://github.com/ncbi/sra-tools/issues/196
https://github.com/ncbi/sra-tools/issues/197