pytorch / tutorials

PyTorch tutorials.
https://pytorch.org/tutorials/
BSD 3-Clause "New" or "Revised" License

Problem with the torchtext library text classification example #1993

Closed EssamWisam closed 1 year ago

EssamWisam commented 2 years ago

The first section of the tutorial suggests:

    import torch
    from torchtext.datasets import AG_NEWS
    train_iter = iter(AG_NEWS(split='train'))

This does not work and yields:

    TypeError: _setup_datasets() got an unexpected keyword argument 'split'

I might highlight as well that the docstring for AG_NEWS mentions:

    train_dataset, test_dataset = torchtext.datasets.AG_NEWS(ngrams=3)
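Both the `_setup_datasets()` name in the error and the `ngrams=3` docstring suggest that the installed torchtext exposes an older dataset API than the one the tutorial targets, so checking the installed versions is a reasonable first step. A minimal sketch using only the standard library; the helper name `installed_version` is hypothetical:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report what the current environment actually provides.
for name in ("torch", "torchtext"):
    print(name, installed_version(name) or "not installed")
```

If the reported torchtext version is older than the one used by the tutorial's Colab notebook, upgrading it is likely the simplest thing to try.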

cc @pytorch/team-text-core @Nayef211

marxwolf commented 2 years ago

I have this problem too. Could anyone help? Even without split, passing only root also fails:

Traceback (most recent call last):
  File "/home/hu/torchtext/torchtext_test.py", line 20, in <module>
    test_dataset = DBpedia(root = './data/dbpedia_csv/')
  File "/home/hu/anaconda3/envs/learningFL/lib/python3.9/site-packages/torchtext/datasets/text_classification.py", line 237, in DBpedia
    return _setup_datasets(*(("DBpedia",) + args), **kwargs)
  File "/home/hu/anaconda3/envs/learningFL/lib/python3.9/site-packages/torchtext/datasets/text_classification.py", line 117, in _setup_datasets
    dataset_tar = download_from_url(URLS[dataset_name], root=root)
  File "/home/hu/anaconda3/envs/learningFL/lib/python3.9/site-packages/torchtext/utils.py", line 105, in download_from_url
    return _process_response(response, root, filename)
  File "/home/hu/anaconda3/envs/learningFL/lib/python3.9/site-packages/torchtext/utils.py", line 54, in _process_response
    d = r.headers['content-disposition']
  File "/home/hu/anaconda3/envs/learningFL/lib/python3.9/site-packages/requests/structures.py", line 52, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'content-disposition'
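
The traceback ends in a direct `r.headers['content-disposition']` lookup, which raises `KeyError` whenever the server's response does not include that header. The sketch below is not torchtext's code. It only illustrates a more tolerant lookup with a hypothetical helper, `filename_from_response`, that falls back to the last segment of the URL path. A plain dict with lowercase keys stands in for the response headers, and the URLs and filenames are made-up examples.

```python
import posixpath
import re
from urllib.parse import urlparse


def filename_from_response(headers, url):
    """Pick a download filename from the headers, falling back to the URL path."""
    disposition = headers.get("content-disposition")  # None instead of KeyError
    if disposition:
        match = re.search(r'filename="?([^";]+)"?', disposition)
        if match:
            return match.group(1).strip()
    # No usable header: use the last segment of the URL path instead.
    return posixpath.basename(urlparse(url).path)


# Header missing: the name comes from the URL.
print(filename_from_response({}, "https://example.com/data/archive.tar.gz"))

# Header present: the name comes from the header.
print(filename_from_response(
    {"content-disposition": 'attachment; filename="example.tar.gz"'},
    "https://example.com/download?id=123",
))
```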

svekars commented 1 year ago

@Nayef211 can you help?

Nayef211 commented 1 year ago

@EssamWisam @marxwolf I just tried running the Colab notebook and unfortunately I couldn't repro the issue you guys are facing.

> I might highlight as well that the docstring for AG_NEWS mentions train_dataset, test_dataset = torchtext.datasets.AG_NEWS(ngrams=3)

When a specific split isn't provided for a dataset, all splits of the dataset are returned, hence the suggestion from the docstrings of the AG_NEWS dataset. Lmk if that makes sense.

svekars commented 1 year ago

We need to add a note that it has to be run with the GPU runtime enabled.

QasimKhan5x commented 1 year ago

@svekars I was able to run the tutorial on Google Colab with GPU enabled. However, I wasn't able to run train_iter = iter(AG_NEWS(split='train')) without the portalocker>=2.0.0 package. After installing it and restarting the notebook, all the code ran successfully.
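
Based on the report above, a pre-flight check could confirm that `portalocker` is importable before the dataset iterator is built. This is a minimal sketch: the helpers `missing_packages` and `first_ag_news_example` are hypothetical, and the `portalocker>=2.0.0` requirement is taken from the comment above rather than verified here.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names from `names` that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


def first_ag_news_example():
    """Build the tutorial's iterator and return its first item (downloads data)."""
    # Imported lazily so the check below still runs when torchtext is absent.
    from torchtext.datasets import AG_NEWS

    train_iter = iter(AG_NEWS(split="train"))
    return next(train_iter)


missing = missing_packages(["torchtext", "portalocker"])
if missing:
    print("Install before running the tutorial:", ", ".join(missing))
else:
    print("Dependencies found; first_ag_news_example() can be called.")
```

After installing a missing package in a notebook, the runtime needs a restart before the import is picked up, as described above.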

noqqaqq commented 1 year ago

/assigntome

noqqaqq commented 1 year ago

@svekars: Where is the GPU runtime requirement you mentioned coming from? I tried running it on Colab with and without a GPU, as well as with the default and CPU-only torch packages, and it worked fine in all cases as long as portalocker was installed.