guma44 / GEOparse

Python library to access Gene Expression Omnibus Database (GEO)
BSD 3-Clause "New" or "Revised" License

UnboundLocalError when GSE non-existent #82

Open jonasfreimuth opened 11 months ago

jonasfreimuth commented 11 months ago

Hello,

When I try to download a non-existent (deleted) GEO Series, I get an UnboundLocalError.

Here is a minimal example. This particular GSE is the one that triggered the issue for me; according to its GEO page, it was deleted in 2019:

(This was run using Python 3.10.12 in a venv with just GEOparse 2.0.3 and its dependencies installed.)

import GEOparse as gp
gp.get_GEO("GSE108587")

I get the following output:

26-Sep-2023 17:07:29 DEBUG utils - Directory ./ already exists. Skipping.
26-Sep-2023 17:07:29 INFO GEOparse - Downloading ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE108nnn/GSE108587/soft/GSE108587_family.soft.gz to ./GSE108587_family.soft.gz
26-Sep-2023 17:07:30 ERROR downloader - Error when trying to retreive ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE108nnn/GSE108587/soft/GSE108587_family.soft.gz.
Traceback (most recent call last):
  File "/tmp/tmp.BcsQuqwIqU/.venv/lib/python3.10/site-packages/GEOparse/downloader.py", line 149, in _download_ftp
    total_size = ftp.size(parsed_url.path)
  File "/gnu/store/a547z6gpzimk68vdv8afmjsmgblnal9w-profile/lib/python3.10/ftplib.py", line 630, in size
    resp = self.sendcmd('SIZE ' + filename)
  File "/gnu/store/a547z6gpzimk68vdv8afmjsmgblnal9w-profile/lib/python3.10/ftplib.py", line 281, in sendcmd
    return self.getresp()
  File "/gnu/store/a547z6gpzimk68vdv8afmjsmgblnal9w-profile/lib/python3.10/ftplib.py", line 254, in getresp
    raise error_perm(resp)
ftplib.error_perm: 550 /geo/series/GSE108nnn/GSE108587/soft/GSE108587_family.soft.gz: No such file or directory
Traceback (most recent call last):
  File "/tmp/tmp.BcsQuqwIqU/.venv/lib/python3.10/site-packages/GEOparse/utils.py", line 80, in download_from_url
    fn.download(silent=silent, force=force)
  File "/tmp/tmp.BcsQuqwIqU/.venv/lib/python3.10/site-packages/GEOparse/downloader.py", line 82, in download
    _download()
  File "/tmp/tmp.BcsQuqwIqU/.venv/lib/python3.10/site-packages/GEOparse/downloader.py", line 53, in _download
    self._download_ftp(silent=silent)
  File "/tmp/tmp.BcsQuqwIqU/.venv/lib/python3.10/site-packages/GEOparse/downloader.py", line 187, in _download_ftp
    if total_size != 0:
UnboundLocalError: local variable 'total_size' referenced before assignment

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/tmp.BcsQuqwIqU/.venv/lib/python3.10/site-packages/GEOparse/GEOparse.py", line 96, in get_GEO
    filepath, geotype = get_GEO_file(
  File "/tmp/tmp.BcsQuqwIqU/.venv/lib/python3.10/site-packages/GEOparse/GEOparse.py", line 263, in get_GEO_file
    utils.download_from_url(url, filepath, silent=silent, aspera=aspera)
  File "/tmp/tmp.BcsQuqwIqU/.venv/lib/python3.10/site-packages/GEOparse/utils.py", line 82, in download_from_url
    raise IOError(
OSError: Download failed due to 'local variable 'total_size' referenced before assignment'. ID could be incorrect or the data might not be public yet.

Although this gets "caught" and repackaged with a hint that the ID might be wrong, the message "Download failed due to 'local variable 'total_size' referenced before assignment'" gives no indication that the problem lies with the GSE itself. The real cause, the FTP server's 550 "No such file or directory" reply, only shows up in the logged traceback above it.
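The tracebacks suggest a common pattern: a variable is assigned inside a try block whose exception is logged and swallowed, and the variable is read afterwards. The following is a simplified, self-contained sketch of that pattern. The function names (fake_size, download_buggy, download_from_url) and the control flow are illustrative assumptions, not GEOparse's actual downloader.py. It shows how the 550 reply disappears from the final message:

```python
import logging
from ftplib import error_perm

logger = logging.getLogger("sketch")


def fake_size(path):
    # Hypothetical stand-in for ftplib.FTP.size() when the file is missing:
    # ftplib raises error_perm carrying the server's raw 550 reply.
    raise error_perm("550 %s: No such file or directory" % path)


def download_buggy(path):
    try:
        total_size = fake_size(path)
    except Exception:
        # The real cause is only logged; execution then continues.
        logger.exception("Error when trying to retrieve %s", path)
    # total_size was never bound, so reading it raises UnboundLocalError.
    if total_size != 0:
        return total_size
    return 0


def download_from_url(path):
    try:
        return download_buggy(path)
    except Exception as err:
        # Only the *last* exception's text reaches the user-facing message,
        # so the 550 reply is lost and 'total_size' is reported instead.
        raise IOError(
            "Download failed due to '%s'. ID could be incorrect or the data "
            "might not be public yet." % err
        )
```

Calling download_from_url on a missing path raises an OSError whose text mentions 'total_size', while the 550 reply appears only in the log. This mirrors the output above.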

I just noticed that #77 ought to fix this.

Anyway, nice package, and thank you very much for creating it! I hope to see it improve further in the future.

jonasfreimuth commented 1 week ago

I saw that #77 has been merged, so I checked whether this is fixed now, but it doesn't seem to be. Running the example above on the current version produces the following error:

Python 3.10.7 (main, Jan  1 1970, 00:00:01) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import GEOparse as gp
>>> gp.get_GEO("GSE108587")
23-Aug-2024 20:01:43 DEBUG utils - Directory ./ already exists. Skipping.
23-Aug-2024 20:01:43 INFO GEOparse - Downloading ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE108nnn/GSE108587/soft/GSE108587_family.soft.gz to ./GSE108587_family.soft.gz
23-Aug-2024 20:01:44 ERROR downloader - Error when trying to retreive ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE108nnn/GSE108587/soft/GSE108587_family.soft.gz.
Traceback (most recent call last):
  File "/home/jonas/proj/geoparse-temp/.venv/lib/python3.10/site-packages/GEOparse/downloader.py", line 150, in _download_ftp
    ftp_size = ftp.size(parsed_url.path)
  File "/gnu/store/1w5v338qk5m8khcazwclprs3znqp6f7f-python-3.10.7/lib/python3.10/ftplib.py", line 630, in size
    resp = self.sendcmd('SIZE ' + filename)
  File "/gnu/store/1w5v338qk5m8khcazwclprs3znqp6f7f-python-3.10.7/lib/python3.10/ftplib.py", line 281, in sendcmd
    return self.getresp()
  File "/gnu/store/1w5v338qk5m8khcazwclprs3znqp6f7f-python-3.10.7/lib/python3.10/ftplib.py", line 254, in getresp
    raise error_perm(resp)
ftplib.error_perm: 550 /geo/series/GSE108nnn/GSE108587/soft/GSE108587_family.soft.gz: No such file or directory
23-Aug-2024 20:01:44 DEBUG downloader - Moving /tmp/tmpw8ka7_9d to /home/jonas/proj/geoparse-temp/GSE108587_family.soft.gz
Traceback (most recent call last):
  File "/home/jonas/proj/geoparse-temp/.venv/lib/python3.10/site-packages/GEOparse/utils.py", line 80, in download_from_url
    fn.download(silent=silent, force=force)
  File "/home/jonas/proj/geoparse-temp/.venv/lib/python3.10/site-packages/GEOparse/downloader.py", line 82, in download
    _download()
  File "/home/jonas/proj/geoparse-temp/.venv/lib/python3.10/site-packages/GEOparse/downloader.py", line 57, in _download
    shutil.copyfile(self._temp_file_name, self.destination)
  File "/gnu/store/1w5v338qk5m8khcazwclprs3znqp6f7f-python-3.10.7/lib/python3.10/shutil.py", line 254, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpw8ka7_9d'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jonas/proj/geoparse-temp/.venv/lib/python3.10/site-packages/GEOparse/GEOparse.py", line 96, in get_GEO
    filepath, geotype = get_GEO_file(
  File "/home/jonas/proj/geoparse-temp/.venv/lib/python3.10/site-packages/GEOparse/GEOparse.py", line 263, in get_GEO_file
    utils.download_from_url(url, filepath, silent=silent, aspera=aspera)
  File "/home/jonas/proj/geoparse-temp/.venv/lib/python3.10/site-packages/GEOparse/utils.py", line 82, in download_from_url
    raise IOError(
OSError: Download failed due to '[Errno 2] No such file or directory: '/tmp/tmpw8ka7_9d''. ID could be incorrect or the data might not be public yet.

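The FTP failure no longer causes an UnboundLocalError. However, the downloader still goes on to move a temporary file that does not exist, so the final message now complains about /tmp/tmpw8ka7_9d instead. It seems the 550 error would need to be handled explicitly somewhere to properly address this. Again, an error is raised in either case, but the message is still misleading.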
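One way to handle the 550 reply explicitly is sketched below. The names GEONotFoundError, remote_size and download are hypothetical and not part of GEOparse's API. The sketch accepts any object exposing size() and retrbinary() the way ftplib.FTP does. It converts a 550 reply into a dedicated FileNotFoundError subclass before any temporary file is touched, and it moves the temporary file only after the transfer has succeeded:

```python
import ftplib
import os
import shutil
import tempfile


class GEONotFoundError(FileNotFoundError):
    """Hypothetical exception for a GEO file that is missing on the server."""


def remote_size(ftp, path):
    """Return the remote file size, turning an FTP 550 reply into a clear error."""
    try:
        return ftp.size(path)
    except ftplib.error_perm as err:
        # error_perm carries the raw server reply, e.g. "550 /geo/...: No such file".
        if str(err).startswith("550"):
            raise GEONotFoundError(
                "%s does not exist on the server; the accession may be wrong, "
                "deleted, or not yet public." % path
            ) from err
        raise


def download(ftp, path, destination):
    # Fail early, before any temporary file is created or moved.
    size = remote_size(ftp, path)
    fd, tmp_name = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as fh:
            ftp.retrbinary("RETR " + path, fh.write)
        # Only move the temporary file once the transfer actually succeeded.
        shutil.move(tmp_name, destination)
    finally:
        # Clean up the temporary file if the transfer or the move failed.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
    return size
```

With a real server, ftp would be an ftplib.FTP instance connected to ftp.ncbi.nlm.nih.gov and logged in anonymously. Because GEONotFoundError subclasses FileNotFoundError, existing callers that catch OSError keep working. The original 550 reply stays available as __cause__.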