Closed by tgbrooks 2 weeks ago
NC_000001.10 cannot be resolved by SRA data locator service. Please send email to sra@ncbi.nlm.nih.gov.
Thanks for that suggestion, I have emailed them. Are you able to explain a little of what's happening? Is prefetch trying to download the entire human genome too? If so, will it do this for every single sample I try to download? Is it caching it somewhere? I might need to configure that so that it works properly on my cluster, but I see no command line options to do so. I'm also confused by the vague documentation. I see vdb-config has a CACHE tab that lists "location of user-repository", but the documentation suggests that the prefetch -O option overrides that. That makes it sound more like an output location than a cache location. If I want to download each sample to a different location but have them all use the same cache location, is that possible? And if I run some in parallel, would the caches just clobber each other?
If you know any specific guidance or documentation on how all this behaves, please point me to it. Thanks.
How many runs do you need to download?
Right now, about 20. But I'm trying to understand what it's doing so that I don't do something dumb when I go to download more in the future.
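For a batch of this size, the per-sample invocation used in this thread (prefetch ACCESSION -O DIR) could be scripted roughly as below. This is only a sketch: the helper names build_prefetch_command and download_all and the accession-to-directory mapping are hypothetical, it assumes prefetch is on the PATH, and it runs the downloads one after another, so it says nothing about how the cache behaves under parallel runs.

```python
import subprocess


def build_prefetch_command(accession, out_dir):
    """Build the argument list for one download, e.g. prefetch SRR2078863 -O temp."""
    return ["prefetch", accession, "-O", out_dir]


def download_all(targets, dry_run=True):
    """Run (or, by default, only print) one prefetch command per accession.

    targets maps an accession to the output directory to pass to -O.
    Downloads run sequentially; a failing prefetch raises CalledProcessError.
    """
    commands = [build_prefetch_command(acc, out) for acc, out in targets.items()]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: prints the command without downloading anything.
    download_all({"SRR2078863": "temp"})
```

Calling download_all with dry_run=False actually launches prefetch; keeping the dry run as the default makes it easy to inspect the commands first.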
For anyone else getting this error: it goes away without the -O option, though then you can't redirect where it outputs. It also downloads a bunch of other files like NC_000001.10 into the SRR directory, apparently one per chromosome, about 1.5 GB in total.
What is the output of:
curl -H "X-SRA-Release: 3.1.0" -H "X-VDB-Release: 3.1.0" -X POST -d "acc=NC_000001.10&accept-proto=http,https&capability=allow-cloud-refseq,ZQA:R" https://locate.ncbi.nlm.nih.gov/sdl/2/retrieve
{"version": "2","result": [{"bundle": "NC_000001.10","status": 200,"msg": "ok","files": [{"object": "refseq|NC_000001.10","accession": "NC_000001.10","type": "sra","name": "NC_000001.10","size": 56636642,"md5": "4a38084e69f2c266d32f6509c288bccb","modificationDate": "2018-02-22T18:21:00Z","locations": [{"service": "sra-ncbi","region": "public","link": "https://sra-download.ncbi.nlm.nih.gov/traces/refseq/NC_000001.10"}]}]}]}
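To read a response like this one without scanning the raw JSON, it can be parsed with a few lines of Python. The sketch below works on the response text copied from this thread; the function name list_locations is hypothetical, and the field names are taken only from this single response, so other responses may be shaped differently.

```python
import json

# Response text copied verbatim from the curl output in this thread.
SDL_RESPONSE = '''{"version": "2","result": [{"bundle": "NC_000001.10","status": 200,"msg": "ok","files": [{"object": "refseq|NC_000001.10","accession": "NC_000001.10","type": "sra","name": "NC_000001.10","size": 56636642,"md5": "4a38084e69f2c266d32f6509c288bccb","modificationDate": "2018-02-22T18:21:00Z","locations": [{"service": "sra-ncbi","region": "public","link": "https://sra-download.ncbi.nlm.nih.gov/traces/refseq/NC_000001.10"}]}]}]}'''


def list_locations(response_text):
    """Return one dict per download location listed in the response."""
    doc = json.loads(response_text)
    found = []
    for bundle in doc.get("result", []):
        # Skip bundles that were not resolved successfully.
        if bundle.get("status") != 200:
            continue
        for file_entry in bundle.get("files", []):
            for location in file_entry.get("locations", []):
                found.append({
                    "accession": file_entry.get("accession"),
                    "size": file_entry.get("size"),
                    "md5": file_entry.get("md5"),
                    "link": location.get("link"),
                })
    return found


if __name__ == "__main__":
    for entry in list_locations(SDL_RESPONSE):
        print(entry["accession"], entry["size"], entry["link"])
```

An empty list here would mean that no bundle in the response came back with status 200, which is a quick way to tell a failed resolution from a successful one.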
Please run prefetch SRR2078863 -O test again.
Does it succeed in downloading the references?
Yes, it worked now. Thank you!
I am getting the following error when running:
prefetch SRR2078863 -O temp
This is on version 3.1.0 (I also observed it in 2.11.0). SRRs from other studies download without this problem.
It looks to me like NC_000001.10 is a human reference DNA sequence. I don't really understand why that's needed. I can't find any description online of how these dependencies work or why they're needed. Is there an explanation somewhere?