name not found while resolving tree within virtual file system module - cannot get remote location for 'NC_000001.10'

tgbrooks commented 3 weeks ago

I am getting the following error when running: prefetch SRR2078863 -O temp

2024-11-07T18:52:46 prefetch.3.1.0: Current preference is set to retrieve SRA Normalized Format files with full base quality scores.
2024-11-07T18:52:46 prefetch.3.1.0: 1) Downloading 'SRR2078863'...
2024-11-07T18:52:46 prefetch.3.1.0: SRA Normalized Format file is being retrieved, if this is different from your preference, it may be due to current file availability.
2024-11-07T18:52:46 prefetch.3.1.0:  Downloading via HTTPS...
2024-11-07T18:57:52 prefetch.3.1.0:  HTTPS download succeed
2024-11-07T18:57:52 prefetch.3.1.0: 1.2) Downloading 'SRR2078863.vdbcache'...
2024-11-07T18:57:52 prefetch.3.1.0:  Downloading via HTTPS...
2024-11-07T18:58:30 prefetch.3.1.0:  HTTPS download succeed
2024-11-07T18:58:32 prefetch.3.1.0:  'SRR2078863.vdbcache' is valid
2024-11-07T18:58:32 prefetch.3.1.0: 1.2) 'SRR2078863.vdbcache' was downloaded successfully
2024-11-07T18:58:52 prefetch.3.1.0:  'SRR2078863' is valid
2024-11-07T18:58:52 prefetch.3.1.0: 1) 'SRR2078863' was downloaded successfully
2024-11-07T18:58:52 prefetch.3.1.0: 'SRR2078863' has 25 unresolved dependencies
2024-11-07T18:58:52 prefetch.3.1.0 int: name not found while resolving tree within virtual file system module - cannot get remote location for 'NC_000001.10'

This is on version 3.1.0 (I also observed it in 2.11.0). SRRs from other studies download without this problem.

It looks to me like NC_000001.10 is a human reference DNA sequence, and I don't understand why it's needed. I can't find any description online of how these dependencies work or why they're needed. Is there an explanation somewhere?
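To see which references a run could not resolve, you can save prefetch's messages to a file and pull out the accessions named in this error. This is a minimal sketch. It assumes the output was captured with something like `prefetch SRR2078863 -O temp > prefetch.log 2>&1`. The function name `unresolved_refs` and the log file name are hypothetical.

```shell
# unresolved_refs LOGFILE: hypothetical helper that prints every accession
# named in a "cannot get remote location for '...'" message, one per line,
# with duplicates removed.
unresolved_refs() {
  sed -n "s/.*cannot get remote location for '\([^']*\)'.*/\1/p" "$1" | sort -u
}
```

For example, `unresolved_refs prefetch.log` on the log in this report would print `NC_000001.10`.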

klymenko commented 3 weeks ago

NC_000001.10 cannot be resolved by the SRA data locator service. Please send an email to sra@ncbi.nlm.nih.gov.

tgbrooks commented 3 weeks ago

Thanks for that suggestion; I have emailed them. Could you explain a little of what's happening? Is prefetch also trying to download the entire human genome? If so, will it do this for every sample I download? Is it caching it somewhere? I might need to configure that so it works properly on my cluster, but I see no command-line options for it. I'm also confused because the documentation is vague here. vdb-config has a CACHE tab that lists "location of user-repository", but the documentation suggests that the prefetch -O option overrides it. That makes it sound more like an output location than a cache location. If I want to download each sample to a different location but have them all share the same cache location, is that possible? And if I run several downloads in parallel, would they clobber each other's caches?

If you know any specific guidance or documentation on how all this behaves, please point me to it. Thanks.

klymenko commented 3 weeks ago

How many runs do you need to download?

tgbrooks commented 3 weeks ago

Right now, about 20. But I'm trying to understand what it's doing so that I don't do something dumb when I go to download more in the future.
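For a batch of runs, one way to catch this error early is to run prefetch once per accession, keep each run's messages in its own log file, and flag any run whose log contains the "cannot get remote location" message. This is a minimal sketch. The function name `fetch_all` and the log file naming are hypothetical. The code also assumes prefetch might exit successfully even when a reference could not be located, so it checks both the exit status and the log. This is not documented prefetch behavior.

```shell
# fetch_all OUTDIR ACC...: hypothetical helper that runs prefetch once per
# accession, saves each run's messages to OUTDIR/ACC.prefetch.log, and
# warns about runs that failed or reported unlocatable references.
fetch_all() {
  outdir=$1
  shift
  mkdir -p "$outdir" || return 1
  rc=0
  for acc in "$@"; do
    log="$outdir/$acc.prefetch.log"
    # Check the exit status and also scan the log, since the run itself
    # may be reported as downloaded even when its references are not.
    if ! prefetch "$acc" -O "$outdir" > "$log" 2>&1 ||
       grep -q "cannot get remote location" "$log"; then
      echo "$acc: problem during prefetch, see $log" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Usage would look like `fetch_all runs SRR2078863 SRR...`, where each accession is passed as a separate argument. The function returns non-zero if any run needs attention.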

tgbrooks commented 2 weeks ago

For anyone else getting this error: it goes away without the -O option, though then you can't choose where the output goes. It also downloads a number of other files like NC_000001.10 into the SRR directory, apparently one per chromosome, about 1.5 GB in total.
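To check which reference files ended up in the run directory, a small helper can list them. This is a sketch. It assumes, as observed above, that each reference is saved in the run directory under its accession (for example `SRR2078863/NC_000001.10`). The function name `list_ref_deps` is hypothetical. The "two capital letters plus underscore" pattern used to recognize RefSeq-style names is also an assumption.

```shell
# list_ref_deps DIR: hypothetical helper that prints the names of
# RefSeq-style reference files (e.g. NC_000001.10) found in run directory DIR.
# Assumption: references are stored under their accession names.
list_ref_deps() {
  dir=$1
  for f in "$dir"/[A-Z][A-Z]_*; do
    [ -e "$f" ] || continue   # the glob matched nothing
    basename "$f"
  done
}
```

For example, you can compare the output of `list_ref_deps SRR2078863 | wc -l` against the "has 25 unresolved dependencies" count in the log. `du -ch SRR2078863/[A-Z][A-Z]_* | tail -n 1` reports the files' combined size.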

klymenko commented 2 weeks ago

What is the output of:

curl -H "X-SRA-Release: 3.1.0" -H "X-VDB-Release: 3.1.0" -X POST -d "acc=NC_000001.10&accept-proto=http,https&capability=allow-cloud-refseq,ZQA:R" https://locate.ncbi.nlm.nih.gov/sdl/2/retrieve

tgbrooks commented 2 weeks ago

{"version": "2","result": [{"bundle": "NC_000001.10","status": 200,"msg": "ok","files": [{"object": "refseq|NC_000001.10","accession": "NC_000001.10","type": "sra","name": "NC_000001.10","size": 56636642,"md5": "4a38084e69f2c266d32f6509c288bccb","modificationDate": "2018-02-22T18:21:00Z","locations": [{"service": "sra-ncbi","region": "public","link": "https://sra-download.ncbi.nlm.nih.gov/traces/refseq/NC_000001.10"}]}]}]}
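The "locations" entry in this response is where the locator says the reference can be downloaded from. Here is a rough sketch for pulling just the link(s) out of such a response in a shell, using only grep and sed. `sdl_links` is a hypothetical helper name. This is text matching rather than real JSON parsing, so a JSON tool such as jq would be the more robust choice.

```shell
# sdl_links: hypothetical helper that reads an SDL JSON response on stdin
# and prints every "link" URL it contains, one per line.
# Crude text matching, not a JSON parser.
sdl_links() {
  grep -o '"link": *"[^"]*"' | sed 's/^"link": *"\(.*\)"$/\1/'
}
```

With the response shown above, piping the curl output (with `-s` added to silence the progress meter) into `sdl_links` prints `https://sra-download.ncbi.nlm.nih.gov/traces/refseq/NC_000001.10`.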

klymenko commented 2 weeks ago

Please run prefetch SRR2078863 -O test again. Does it download the references successfully now?

tgbrooks commented 2 weeks ago

Yes, it works now. Thank you!

ncbi / sra-tools, issue #984