greenelab / connectivity-search-backend

Django backend for hetnet connectivity search
https://search-api.het.io
BSD 3-Clause "New" or "Revised" License

Zenodo urllib.request.urlretrieve downloads raise ContentTooShortError #77

Closed dhimmel closed 3 years ago

dhimmel commented 3 years ago

When downloading https://zenodo.org/record/1435834/files/dwpcs_length-2_damping-0.0.zip from https://zenodo.org/record/1435834, I got:

python manage.py populate_database --max-metapath-length=3  --reduced-metapaths --batch-size=12000
_download_hetionet_hetmat(self=<dj_hetmech_app.management.commands.populate_database.Command object at 0x7fc5f9b3e670>) ran in 0:00:00
_populate_metanode_table() ran in 0:00:00
_populate_node_table() ran in 0:00:13
_populate_metapath_table() ran in 0:00:00
_download_path_counts(length=1) ran in 0:01:17
_populate_degree_grouped_permutation_table(length=1) ran in 0:00:00
Traceback (most recent call last):
  File "manage.py", line 15, in <module>
    execute_from_command_line(sys.argv)
  File "/home/dhimmel/miniconda3/envs/hetmech-backend/lib/python3.8/site-packages/django/core/management/__init__.py", line 401, in execute_from_command_line
    utility.execute()
  File "/home/dhimmel/miniconda3/envs/hetmech-backend/lib/python3.8/site-packages/django/core/management/__init__.py", line 395, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/dhimmel/miniconda3/envs/hetmech-backend/lib/python3.8/site-packages/django/core/management/base.py", line 328, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/home/dhimmel/miniconda3/envs/hetmech-backend/lib/python3.8/site-packages/django/core/management/base.py", line 369, in execute
    output = self.handle(*args, **options)
  File "/home/dhimmel/Documents/repos/connectivity-search-backend/dj_hetmech_app/management/commands/populate_database.py", line 350, in handle
    timed(self._download_path_counts)(length)
  File "/home/dhimmel/Documents/repos/connectivity-search-backend/dj_hetmech_app/utils/__init__.py", line 16, in wrapper
    result = func(*args, **kwargs)
  File "/home/dhimmel/Documents/repos/connectivity-search-backend/dj_hetmech_app/management/commands/populate_database.py", line 274, in _download_path_counts
    path = self.zenodo_download('1435834', archive)
  File "/home/dhimmel/Documents/repos/connectivity-search-backend/dj_hetmech_app/management/commands/populate_database.py", line 365, in zenodo_download
    urlretrieve(url, path)
  File "/home/dhimmel/miniconda3/envs/hetmech-backend/lib/python3.8/urllib/request.py", line 286, in urlretrieve
    raise ContentTooShortError(
urllib.error.ContentTooShortError: <urlopen error retrieval incomplete: got only 1320992799 out of 3186294789 bytes>
Exception ignored in: <function Driver.__del__ at 0x7fc572508160>
Traceback (most recent call last):
  File "/home/dhimmel/miniconda3/envs/hetmech-backend/lib/python3.8/site-packages/neo4j/__init__.py", line 277, in __del__
  File "/home/dhimmel/miniconda3/envs/hetmech-backend/lib/python3.8/site-packages/neo4j/__init__.py", line 307, in close
  File "/home/dhimmel/miniconda3/envs/hetmech-backend/lib/python3.8/site-packages/neo4j/io/__init__.py", line 488, in close
  File "/home/dhimmel/miniconda3/envs/hetmech-backend/lib/python3.8/site-packages/neo4j/io/__init__.py", line 477, in remove
  File "/home/dhimmel/miniconda3/envs/hetmech-backend/lib/python3.8/site-packages/neo4j/io/_bolt3.py", line 390, in close
AttributeError: 'NoneType' object has no attribute 'debug'

dhimmel commented 3 years ago

My guess is that I have a bad internet connection, which caused the download to fail after 1.3 GB of the 3.2 GB file had been downloaded.

So I could retry on a better connection, or see whether requests can handle the poor connection.
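Whichever HTTP client is used, one way to tolerate a dropped connection is to resume from the bytes already on disk with an HTTP `Range` request instead of starting over. The following is a minimal sketch using only the standard library, not the project's `zenodo_download` method. The function name `download_with_resume` and its parameters are hypothetical, and it assumes the server honors `Range` requests, which has not been checked for Zenodo here.

```python
import http.client
import os
import time
import urllib.error
import urllib.request


def download_with_resume(url, path, max_attempts=5, chunk_size=1 << 20,
                         opener=urllib.request.urlopen, wait=1.0):
    """Download url to path, resuming a partial file where possible.

    Hypothetical helper: `opener` is injectable so the logic can be
    exercised without a network connection.
    """
    for _ in range(max_attempts):
        offset = os.path.getsize(path) if os.path.exists(path) else 0
        request = urllib.request.Request(url)
        if offset:
            # Ask only for the bytes that are still missing.
            request.add_header("Range", f"bytes={offset}-")
        try:
            with opener(request) as response:
                # 206 means the Range header was honored; on any other
                # status the full body is being sent, so overwrite.
                mode = "ab" if offset and response.status == 206 else "wb"
                expected = response.headers.get("Content-Length")
                received = 0
                with open(path, mode) as handle:
                    while True:
                        chunk = response.read(chunk_size)
                        if not chunk:
                            break
                        handle.write(chunk)
                        received += len(chunk)
            # Without a Content-Length there is nothing to compare against.
            if expected is None or received == int(expected):
                return path
        except urllib.error.HTTPError as error:
            if error.code == 416:
                # Range not satisfiable: no bytes remain past the offset.
                return path
            raise
        except (OSError, http.client.HTTPException):
            # Connection-level failure: keep the partial file and retry.
            pass
        time.sleep(wait)
    raise RuntimeError(f"gave up on {url} after {max_attempts} attempts")
```

A resumed file can still be wrong (for example if the remote file changed between attempts), so a size check like this complements a checksum rather than replacing it.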

dongbohu commented 3 years ago

@dhimmel For a file as big as 3.2 GB, you probably should provide a fingerprint (md5, sha, etc) to verify it after downloading.
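As an illustration of that suggestion, here is a minimal sketch of verifying a downloaded file against a known digest with `hashlib`. The function names `file_digest` and `verify_download` are hypothetical, and the expected digest would have to come from a trusted source such as the record's published checksum. The file is read in chunks so that a multi-gigabyte archive never has to fit in memory.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of the file at path, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex, algorithm="sha256"):
    """Raise ValueError if the file's digest does not match expected_hex."""
    actual = file_digest(path, algorithm)
    if actual != expected_hex.lower():
        raise ValueError(
            f"{algorithm} mismatch for {path}: "
            f"expected {expected_hex}, got {actual}"
        )
    return path
```

A truncated download such as the 1.3 GB partial file above would produce a different digest and be rejected.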

dhimmel commented 3 years ago

> For a file as big as 3.2 GB, you probably should provide a fingerprint (md5, sha, etc) to verify it after downloading

brilliant! added in https://github.com/greenelab/connectivity-search-backend/pull/79/commits/9ba9e7c0c7278b93c85bc9c9134d6f79bf12287f

dhimmel commented 3 years ago

Closing this since it was likely due to a poor internet connection. The checksum validation will now fail on subsequent runs unless the corrupted (partial) file is deleted and redownloaded in its entirety.
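The manual cleanup described here could also be automated. The sketch below is not what the linked commit does; it is one possible approach, assuming a SHA-256 digest is known in advance. The names `ensure_download` and `fetch` are hypothetical, with `fetch` standing in for any callable that writes the file to the given path.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ensure_download(path, expected_sha256, fetch, max_attempts=2):
    """Return path once its digest matches, refetching a bad copy if needed.

    `fetch(path)` is a hypothetical callable that downloads the file.
    It is called at most `max_attempts` times.
    """
    for _ in range(max_attempts):
        if os.path.exists(path):
            if sha256_of(path) == expected_sha256:
                return path  # a valid copy is already cached
            # Corrupted or partial file: discard it before redownloading.
            os.remove(path)
        fetch(path)
    if os.path.exists(path) and sha256_of(path) == expected_sha256:
        return path
    raise ValueError(f"checksum still failing for {path}")
```

With this shape, a rerun after an interrupted download discards the partial file by itself, at the cost of downloading the whole archive again.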