theislab / sfaira

data and model repository for single-cell data
https://sfaira.readthedocs.io
BSD 3-Clause "New" or "Revised" License

404 on dataset: https://cellgeni.cog.sanger.ac.uk/BenKidney_v2.1/Mature_Full_v2.1.h5ad #486

Open TheAustinator opened 2 years ago

TheAustinator commented 2 years ago

Describe the bug: urllib hits a 404 on the BenKidney_v2.1 datasets, but works on others. I've injected a try/except into my local code as a patch to skip this dataset. It could be worth having an ignore_error arg for Universe.download that makes it just print a warning when a download fails. Happy to make a PR for the ignore_error arg if that's helpful.

These are the dataset URLs:
https://cellgeni.cog.sanger.ac.uk/BenKidney_v2.1/Fetal_full.h5ad
https://cellgeni.cog.sanger.ac.uk/BenKidney_v2.1/Mature_Full_v2.1.h5ad
And one HCA:
https://data.humancellatlas.org/project-assets/project-matrices/cc95ff89-2e68-4a08-a234-480eca21ce79.homo_sapiens.loom

import sfaira

# datadir, metadir and cachedir are local directory paths defined elsewhere
ds = sfaira.data.Universe(data_path=datadir, meta_path=metadir, cache_path=cachedir)
ds.subset(key="organ", values=["kidney"])
ds.download()

Causes the following error:

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
/tmp/ipykernel_1487/2839717785.py in <module>
----> 1 ds.download()

/usr/local/lib/python3.8/dist-packages/sfaira/data/dataloaders/base/dataset_group.py in download(self, **kwargs)
    918     def download(self, **kwargs):
    919         for x in self.dataset_groups:
--> 920             x.download(**kwargs)
    921 
    922     def load(

/usr/local/lib/python3.8/dist-packages/sfaira/data/dataloaders/base/dataset_group.py in download(self, **kwargs)
    382     def download(self, **kwargs):
    383         for _, v in self.datasets.items():
--> 384             v.download(**kwargs)
    385 
    386     @property

/usr/local/lib/python3.8/dist-packages/sfaira/data/dataloaders/base/dataset.py in download(self, **kwargs)
    319 
    320                 #try:
--> 321                 if 'Content-Disposition' in urllib.request.urlopen(url).info().keys():
    322                     fn = cgi.parse_header(urllib.request.urlopen(url).info()['Content-Disposition'])[1]["filename"]
    323                 else:

/usr/lib/python3.8/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    220     else:
    221         opener = _opener
--> 222     return opener.open(url, data, timeout)
    223 
    224 def install_opener(opener):

/usr/lib/python3.8/urllib/request.py in open(self, fullurl, data, timeout)
    529         for processor in self.process_response.get(protocol, []):
    530             meth = getattr(processor, meth_name)
--> 531             response = meth(req, response)
    532 
    533         return response

/usr/lib/python3.8/urllib/request.py in http_response(self, request, response)
    638         # request was successfully received, understood, and accepted.
    639         if not (200 <= code < 300):
--> 640             response = self.parent.error(
    641                 'http', request, response, code, msg, hdrs)
    642 

/usr/lib/python3.8/urllib/request.py in error(self, proto, *args)
    567         if http_err:
    568             args = (dict, 'default', 'http_error_default') + orig_args
--> 569             return self._call_chain(*args)
    570 
    571 # XXX probably also want an abstract factory that knows when it makes

/usr/lib/python3.8/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    500         for handler in handlers:
    501             func = getattr(handler, meth_name)
--> 502             result = func(*args)
    503             if result is not None:
    504                 return result

/usr/lib/python3.8/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
    647 class HTTPDefaultErrorHandler(BaseHandler):
    648     def http_error_default(self, req, fp, code, msg, hdrs):
--> 649         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    650 
    651 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 404: Not Found
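
For reference, the try/except patch and the proposed ignore_error behaviour could look roughly like the sketch below. This is not sfaira code: download_all is a hypothetical helper, and the datasets argument is assumed to be a plain dict mapping an id to any object with a download() method, standing in for the datasets held by a Universe.

```python
import warnings
from urllib.error import HTTPError, URLError


def download_all(datasets, ignore_error=False):
    """Call download() on each dataset, optionally skipping ones that fail.

    `datasets` is a dict of id -> object with a download() method
    (a hypothetical stand-in, not the actual sfaira data structure).
    Returns the ids of the datasets that were skipped.
    """
    skipped = []
    for dataset_id, dataset in datasets.items():
        try:
            dataset.download()
        except (HTTPError, URLError) as err:
            if not ignore_error:
                # Default behaviour: propagate the error as before.
                raise
            # With ignore_error=True, warn and move on to the next dataset.
            warnings.warn(f"Skipping {dataset_id}: {err}")
            skipped.append(dataset_id)
    return skipped
```

Only network-related errors are caught here, so unrelated bugs would still surface. Returning the skipped ids makes it easy to report which loaders need updated URLs.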


davidsebfischer commented 2 years ago

Thank you @TheAustinator, it looks like they were taken down. I assume they were moved, so I will look for them on Google and update the loaders accordingly.
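
A quick way to see which loader URLs have gone missing, sketched under assumptions: check_urls is a hypothetical helper (not part of sfaira) that sends a HEAD request per URL using only the standard library. The opener parameter exists only so the function can be exercised without network access.

```python
import urllib.request
from urllib.error import HTTPError, URLError


def check_urls(urls, opener=urllib.request.urlopen, timeout=10):
    """Return {url: status}, where status is an HTTP code or an error string."""
    results = {}
    for url in urls:
        # HEAD avoids downloading the (potentially large) file body.
        request = urllib.request.Request(url, method="HEAD")
        try:
            with opener(request, timeout=timeout) as response:
                results[url] = response.status
        except HTTPError as err:
            # The server answered, but with an error code such as 404.
            results[url] = err.code
        except URLError as err:
            # The server could not be reached at all.
            results[url] = f"unreachable: {err.reason}"
    return results
```

Calling check_urls with the three URLs from the report would be expected to flag the removed files with 404. Some servers reject HEAD requests, so a non-200 result may need a manual follow-up.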

TheAustinator commented 2 years ago

Those wily datasets, always sneaking around. Thanks!