saketkc / pysradb

Package for fetching metadata and downloading data from SRA/ENA/GEO
https://saketkc.github.io/pysradb
BSD 3-Clause "New" or "Revised" License

[BUG] Example download from README doesn't work #156

Closed (xapple closed this issue 2 years ago)

xapple commented 2 years ago

Describe the bug: I copy-pasted a command from the README that is supposed to download all datasets, and it results in a traceback. This library would benefit from more testing.

To reproduce: pysradb download -y -t 8 --out-dir ./pysradb_downloads -p SRP063852

Traceback

The supplied url column "None" cannot be found.

Using recommended_url instead.

Checking download URLs
Key error for: https://sra-downloadb.st-va.ncbi.nlm.nih.gov/sos2/sra-pub-run-3/SRR2433794/SRR2433794.1
The following files will be downloaded:

run_accession study_accession experiment_accession recommended_url                                                                         download_url                                                                                          out_dir             filesize
SRR2433794    SRP063852       SRX1254413           https://sra-downloadb.st-va.ncbi.nlm.nih.gov/sos2/sra-pub-run-3/SRR2433794/SRR2433794.1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR243/SRR2433794/SRR2433794.sra ./pysradb_downloads 0.0

Total size: 0.0

  0%|                                                                                                                       | 0/1 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/Users/xapple/Library/Python/3.9/lib/python/site-packages/pysradb/download.py", line 152, in download_file
    file_size = int(session.head(url).headers["Content-length"])
  File "/Users/xapple/Library/Python/3.9/lib/python/site-packages/requests/structures.py", line 54, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'content-length'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/xapple/Library/Python/3.9/bin/pysradb", line 8, in <module>
    sys.exit(parse_args())
  File "/Users/xapple/Library/Python/3.9/lib/python/site-packages/pysradb/cli.py", line 1210, in parse_args
    download(
  File "/Users/xapple/Library/Python/3.9/lib/python/site-packages/pysradb/cli.py", line 126, in download
    sradb.download(
  File "/Users/xapple/Library/Python/3.9/lib/python/site-packages/pysradb/sradb.py", line 1542, in download
    thread_map(
  File "/usr/local/lib/python3.9/site-packages/tqdm/contrib/concurrent.py", line 94, in thread_map
    return _executor_map(ThreadPoolExecutor, fn, *iterables, **tqdm_kwargs)
  File "/usr/local/lib/python3.9/site-packages/tqdm/contrib/concurrent.py", line 76, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, **map_args), **kwargs))
  File "/usr/local/lib/python3.9/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 609, in result_iterator
    yield fs.pop().result()
  File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 446, in result
    return self.__get_result()
  File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/xapple/Library/Python/3.9/lib/python/site-packages/pysradb/sradb.py", line 89, in _handle_download
    download_file(srapath_url, srr_location)
  File "/Users/xapple/Library/Python/3.9/lib/python/site-packages/pysradb/download.py", line 176, in download_file
    if file_size == os.path.getsize(tmp_file_path):
  File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/genericpath.py", line 50, in getsize
    return os.stat(filename).st_size
FileNotFoundError: [Errno 2] No such file or directory: './pysradb_downloads/SRP063852/SRX1254413/SRR2433794.sra.part'

saketkc commented 2 years ago

Thanks for the bug report. I agree we should be testing more rigorously, but this is less a bug in pysradb than a result of changes on NCBI's side.

None of the URLs work:
https://sra-downloadb.st-va.ncbi.nlm.nih.gov/sos2/sra-pub-run-3/SRR2433794/SRR2433794.1
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR243/SRR2433794/SRR2433794.sra

saketkc commented 2 years ago

PRs are always welcome.

pabisk commented 2 years ago

Maybe you could update the readme to reflect that this feature does not work or that this example does not work? Also could you update the tool to throw an error message that is less vague?

I had the same problem: I could not get the README examples to work, not just with this SRP but with almost every other SRP I have tested.

Fails with the same error:
pysradb download -p SRP095235
pysradb srx-to-srr SRX5893229

UnboundLocalError: local variable 'exp_platform_model' referenced before assignment

Gets stuck with no error message: pysradb download -x SRX5893229

Other commands work so the install and my connection are okay, e.g.: pysradb download -g GSE131754

python --version
Python 3.8.10
pysradb --version
pysradb 1.3.0

I would not be surprised if NCBI broke this pysradb feature on their end, but it would be nice if you indicated this in the README.
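
To make the "less vague error message" request concrete, here is a rough sketch of what a clearer failure could look like. It is not pysradb's actual code: `DownloadURLError`, `content_length_or_error`, and `partial_size` are hypothetical names made up for illustration. It only assumes what the traceback above shows, namely that the size is read from a `Content-length` header and that the size of the `.part` file is checked afterwards.

```python
import os
from typing import Mapping


class DownloadURLError(RuntimeError):
    """Hypothetical error type: raised when a download URL looks unusable."""


def content_length_or_error(headers: Mapping[str, str], url: str) -> int:
    """Return the size announced in the headers, or fail with a readable message."""
    # HTTP header names are case-insensitive, so normalise before looking up.
    lowered = {key.lower(): value for key, value in headers.items()}
    raw = lowered.get("content-length")
    if raw is None:
        raise DownloadURLError(
            f"No Content-Length header returned for {url}; "
            "the URL may be stale or no longer served."
        )
    try:
        return int(raw)
    except ValueError:
        raise DownloadURLError(
            f"Invalid Content-Length {raw!r} returned for {url}"
        ) from None


def partial_size(tmp_file_path: str) -> int:
    """Size of a partially downloaded file, or 0 if no partial file exists yet."""
    # Guarding with exists() avoids a FileNotFoundError when nothing was written.
    return os.path.getsize(tmp_file_path) if os.path.exists(tmp_file_path) else 0
```

With the requests library, the mapping returned by `session.head(url).headers` could be passed in as `headers`. The point is only that the user would see the failing URL and a likely cause instead of a bare `KeyError` followed by a `FileNotFoundError`.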

saketkc commented 2 years ago

Yes, I agree. Apologies for the errors. I should revise the readme and check if some of the errors are addressable with code changes.

saketkc commented 2 years ago

These should now be fixed in release 1.4.1.