Closed. EssamWisam closed this issue 1 year ago
I also have this problem. Could anyone help? Even without split, passing only root, the call still fails:
Traceback (most recent call last):
  File "/home/hu/torchtext/torchtext_test.py", line 20, in <module>
    test_dataset = DBpedia(root = './data/dbpedia_csv/')
  File "/home/hu/anaconda3/envs/learningFL/lib/python3.9/site-packages/torchtext/datasets/text_classification.py", line 237, in DBpedia
    return _setup_datasets(*(("DBpedia",) + args), **kwargs)
  File "/home/hu/anaconda3/envs/learningFL/lib/python3.9/site-packages/torchtext/datasets/text_classification.py", line 117, in _setup_datasets
    dataset_tar = download_from_url(URLS[dataset_name], root=root)
  File "/home/hu/anaconda3/envs/learningFL/lib/python3.9/site-packages/torchtext/utils.py", line 105, in download_from_url
    return _process_response(response, root, filename)
  File "/home/hu/anaconda3/envs/learningFL/lib/python3.9/site-packages/torchtext/utils.py", line 54, in _process_response
    d = r.headers['content-disposition']
  File "/home/hu/anaconda3/envs/learningFL/lib/python3.9/site-packages/requests/structures.py", line 52, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'content-disposition'
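The KeyError above comes from indexing the response headers directly, which fails whenever the server does not send a Content-Disposition header. The following is only a sketch of a more defensive lookup, not torchtext's actual code: the helper name filename_from_response and the "download" default are hypothetical, and the fallback to the URL's last path segment is an assumption about what a reasonable default would be.

```python
import posixpath
from email.message import Message
from urllib.parse import urlparse


def filename_from_response(headers, url):
    """Pick a filename for a download without assuming any header exists.

    `headers` is any mapping of header names to values (hypothetical input;
    a requests response exposes one as `response.headers`).
    """
    # Header names are case-insensitive, so normalise the keys first.
    lowered = {key.lower(): value for key, value in headers.items()}

    # .get() returns None instead of raising KeyError when the header is absent.
    disposition = lowered.get("content-disposition")
    if disposition:
        # Let the stdlib parse the header parameters (filename="...").
        msg = Message()
        msg["content-disposition"] = disposition
        name = msg.get_filename()
        if name:
            return name

    # Assumed fallback: use the last path segment of the URL.
    return posixpath.basename(urlparse(url).path) or "download"
```

The same idea applies inside any download routine: treat the header as optional and decide explicitly what to do when it is missing.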
@Nayef211 can you help?
@EssamWisam @marxwolf I just tried running the Colab notebook and unfortunately I couldn't repro the issue you guys are facing.
I might highlight as well that the docstring for AG_NEWS mentions
train_dataset, test_dataset = torchtext.datasets.AG_NEWS(ngrams=3)
When a specific split isn't provided for a dataset, all splits of the dataset are returned, hence the suggestion from the docstrings of the AG_NEWS dataset. Lmk if that makes sense.
We need to add a note that it has to be run with the GPU runtime enabled.
@svekars I was able to run the tutorial on Google Colab with GPU enabled. However, I wasn't able to run train_iter = iter(AG_NEWS(split='train')) without the portalocker>=2.0.0 package. After installing it and restarting the notebook, all the code ran successfully.
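Since the failure here traced back to a missing package, a quick pre-flight check can save a confusing traceback. This is a rough sketch using only the standard library; the helper name missing_packages is hypothetical, and it only checks that a package can be found, not that it satisfies the >=2.0.0 version bound mentioned above.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be imported."""
    # find_spec returns None when no installed module matches the name.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["portalocker"])
    if missing:
        print("Install these before running the tutorial:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

In a notebook, the runtime typically needs a restart after installing a package before it becomes importable, as noted in the comment above.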
/assigntome
@svekars: Where is the GPU runtime requirement you mentioned coming from? I tried running it on Colab with and without a GPU, as well as with the default and CPU-only torch packages, and it worked fine in all cases as long as portalocker was installed.
The first section of the tutorial suggests
import torch
from torchtext.datasets import AG_NEWS
train_iter = iter(AG_NEWS(split='train'))
which does not work, yielding
TypeError: _setup_datasets() got an unexpected keyword argument 'split'
I might highlight as well that the docstring for AG_NEWS mentions
train_dataset, test_dataset = torchtext.datasets.AG_NEWS(ngrams=3)
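The two call styles above (split='train' in the tutorial, a two-value return in the docstring) suggest the installed torchtext exposes a different interface from the one the tutorial expects. As a hedged sketch only, the wrapper below tries the split keyword first and falls back to unpacking both splits. The name load_train_split is hypothetical, and the assumption that the older interface returns a (train, test) pair is taken from the docstring quoted above rather than verified against every release.

```python
def load_train_split(dataset_fn, **kwargs):
    """Return the training split from a dataset factory such as AG_NEWS.

    Tries the keyword form used by the tutorial first; if the factory
    rejects `split`, falls back to the form that returns every split.
    """
    try:
        return dataset_fn(split="train", **kwargs)
    except TypeError as err:
        # Only handle the "unexpected keyword argument 'split'" case;
        # any other TypeError is a real bug and should surface.
        if "split" not in str(err):
            raise
        # Assumed older behaviour: the factory returns (train, test).
        train, _test = dataset_fn(**kwargs)
        return train
```

With torchtext installed this would be called as load_train_split(AG_NEWS). Upgrading torchtext to a release that matches the tutorial is likely the cleaner fix; the wrapper is mainly useful for code that has to run against both interfaces.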
cc @pytorch/team-text-core @Nayef211