pytorch / text

Models, data loaders and abstractions for language processing, powered by PyTorch
https://pytorch.org/text
BSD 3-Clause "New" or "Revised" License

IMDB missing labels #2113

Closed IljaAvadiev closed 1 year ago

IljaAvadiev commented 1 year ago

🐛 Bug

Description

When I download the IMDB dataset from torchtext, every label I observe has the value 1; no other label value appears.

To Reproduce

The following code can be executed on Google Colab.

!pip install torchdata
from torch.utils.data import DataLoader, Dataset
import torchtext
# Load the training split of IMDB
train_dataset = torchtext.datasets.IMDB(split='train')
dl = DataLoader(train_dataset, batch_size=1000, shuffle=True)
# The first element of a batch holds the labels
next(iter(dl))[0]

Expected behavior

I expect a mixture of 1's and 2's in the labels of the dataset.
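
To tell whether label 2 is absent from the whole split or only from the first batch, one option is to count the labels over every sample. The following is a minimal sketch, assuming the dataset yields (label, text) pairs as the snippet above implies; `count_labels` and `toy_samples` are hypothetical names used for illustration, and the toy data is synthetic, not real IMDB reviews.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_labels(samples: Iterable[Tuple[int, str]]) -> Counter:
    """Count how often each label occurs in an iterable of (label, text) pairs."""
    return Counter(label for label, _ in samples)


# Synthetic stand-in for the dataset (hypothetical data, not real IMDB reviews).
toy_samples = [(1, "bad film"), (1, "boring"), (2, "great film")]
label_counts = count_labels(toy_samples)
print(label_counts)
```

With the real data, calling `count_labels(torchtext.datasets.IMDB(split='train'))` would iterate the split once. A result containing only label 1 would confirm the report; a mix of 1's and 2's would suggest the labels are present but not mixed within the first batch.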

Environment

I reproduced the bug on Google Colab.

[pip3] numpy==1.22.4
[pip3] torch==1.13.1+cu116
[pip3] torchaudio==0.13.1+cu116
[pip3] torchdata==0.5.1
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.14.1
[pip3] torchvision==0.14.1+cu116

My local machine produces the same output.

Nayef211 commented 1 year ago

Closing since this is a dupe of https://github.com/pytorch/text/issues/2041. Will investigate soon.