tristandeleu / pytorch-meta

A collection of extensions and data-loaders for few-shot learning & meta-learning in PyTorch
https://tristandeleu.github.io/pytorch-meta/
MIT License
1.98k stars 256 forks

HTTPError: HTTP Error 404: Not Found (For TCGA) #37

Open Ghaiyur opened 4 years ago

Ghaiyur commented 4 years ago

When trying to load the TCGA dataset with:

import torchmeta
import requests
from torchmeta.datasets.tcga import TCGA
torchmeta.datasets.TCGA("data", meta_train=True, meta_val=False,
    meta_test=False, meta_split=None, min_samples_per_class=5, transform=None,
    target_transform=None, dataset_transform=None, download=True,
    chunksize=100, preload=True)

The program fails with:

Downloading ACC_clinicalMatrix.gz...


HTTPError                                 Traceback (most recent call last)
in ()
      5     meta_test=False, meta_split=None, min_samples_per_class=5, transform=None,
      6     target_transform=None, dataset_transform=None, download=True,
----> 7     chunksize=100, preload=True)

/usr/local/lib/python3.6/dist-packages/torchmeta/datasets/tcga.py in __init__(self, root, meta_train, meta_val, meta_test, meta_split, min_samples_per_class, transform, target_transform, dataset_transform, download, chunksize, preload)
    112
    113         if download:
--> 114             self.download(chunksize)
    115
    116         self.preloaded = False

/usr/local/lib/python3.6/dist-packages/torchmeta/datasets/tcga.py in download(self, chunksize)
    275             print('Downloading `{0}.gz`...'.format(filename))
    276             url = self.clinical_matrix_url.format(cancer)
--> 277             urllib.request.urlretrieve(url, rawpath)
    278
    279             print('Extracting `{0}.gz`...'.format(filename))

/usr/lib/python3.6/urllib/request.py in urlretrieve(url, filename, reporthook, data)
    246     url_type, path = splittype(url)
    247
--> 248     with contextlib.closing(urlopen(url, data)) as fp:
    249         headers = fp.info()
    250

/usr/lib/python3.6/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    221     else:
    222         opener = _opener
--> 223     return opener.open(url, data, timeout)
    224
    225 def install_opener(opener):

/usr/lib/python3.6/urllib/request.py in open(self, fullurl, data, timeout)
    530         for processor in self.process_response.get(protocol, []):
    531             meth = getattr(processor, meth_name)
--> 532             response = meth(req, response)
    533
    534         return response

/usr/lib/python3.6/urllib/request.py in http_response(self, request, response)
    640         if not (200 <= code < 300):
    641             response = self.parent.error(
--> 642                 'http', request, response, code, msg, hdrs)
    643
    644         return response

/usr/lib/python3.6/urllib/request.py in error(self, proto, *args)
    568         if http_err:
    569             args = (dict, 'default', 'http_error_default') + orig_args
--> 570             return self._call_chain(*args)
    571
    572 # XXX probably also want an abstract factory that knows when it makes

/usr/lib/python3.6/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    502         for handler in handlers:
    503             func = getattr(handler, meth_name)
--> 504             result = func(*args)
    505             if result is not None:
    506                 return result

/usr/lib/python3.6/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
    648 class HTTPDefaultErrorHandler(BaseHandler):
    649     def http_error_default(self, req, fp, code, msg, hdrs):
--> 650         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    651
    652 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 404: Not Found

tristandeleu commented 4 years ago

Thank you for reporting this issue, and sorry for the late reply. It looks like some of these files have moved on the website we used to pull the data from. I'll update the meta-dataset once I know the new URL of these files.
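
In the meantime, a small probe can help narrow down which files are affected. The sketch below is not part of torchmeta: `probe_url` is a hypothetical helper, and the URL in the usage note is a placeholder rather than the real value of `clinical_matrix_url`. It assumes the server answers `HEAD` requests, and it only uses the standard `urllib` calls that appear in the traceback above.

```python
import urllib.error
import urllib.request


def probe_url(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code the server gives for `url`.

    Hypothetical helper, not part of torchmeta. `opener` is injectable so
    the function can be exercised without network access.
    """
    # A HEAD request avoids downloading the (potentially large) file body.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for non-2xx responses (e.g. the 404 above);
        # report the code instead of aborting.
        return err.code
```

For example, `probe_url("https://example.invalid/ACC_clinicalMatrix.gz")` with the placeholder swapped for a real file URL would return 404 for a file that has moved. Connection-level failures (`urllib.error.URLError`) are deliberately not caught, so an unreachable host still surfaces as an exception.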