pytorch / text

Models, data loaders and abstractions for language processing, powered by PyTorch
https://pytorch.org/text
BSD 3-Clause "New" or "Revised" License

NotImplementedError Using BucketIterator #1384

Closed · yuvrajeyes closed this issue 2 years ago

yuvrajeyes commented 3 years ago
import numpy as np
import spacy
import random
import torch
import torch.nn as nn
import torch.optim as optim
from torchtext.datasets import Multi30k
from torchtext.legacy.data import Field, BucketIterator 
train_data, valid_data, test_data = Multi30k()
batch_size = 64
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_iterator = BucketIterator(
    train_data, 
    batch_size=batch_size,
    sort_within_batch=True,
    sort_key=lambda x: len(x.src),
    device=device,
)

When I tried to iterate over train_iterator using

for data in train_iterator:
    print(data)

it gives me a NotImplementedError, like this:

 NotImplementedError                       Traceback (most recent call last)
<ipython-input-18-7f4717f8ef1c> in <module>
----> 1 for i in train_iterator:
      2     print(i)

~\anaconda3\envs\dl\lib\site-packages\torchtext\legacy\data\iterator.py in __iter__(self)
    143     def __iter__(self):
    144         while True:
--> 145             self.init_epoch()
    146             for idx, minibatch in enumerate(self.batches):
    147                 # fast-forward if loaded from state

~\anaconda3\envs\dl\lib\site-packages\torchtext\legacy\data\iterator.py in init_epoch(self)
    119             self._random_state_this_epoch = self.random_shuffler.random_state
    120 
--> 121         self.create_batches()
    122 
    123         if self._restored_from_state:

~\anaconda3\envs\dl\lib\site-packages\torchtext\legacy\data\iterator.py in create_batches(self)
    250                                  self.batch_size_fn)
    251         else:
--> 252             self.batches = pool(self.data(), self.batch_size,
    253                                 self.sort_key, self.batch_size_fn,
    254                                 random_shuffler=self.random_shuffler,

~\anaconda3\envs\dl\lib\site-packages\torchtext\legacy\data\iterator.py in data(self)
    106             xs = sorted(self.dataset, key=self.sort_key)
    107         elif self.shuffle:
--> 108             xs = [self.dataset[i] for i in self.random_shuffler(range(len(self.dataset)))]
    109         else:
    110             xs = self.dataset

~\anaconda3\envs\dl\lib\site-packages\torchtext\legacy\data\iterator.py in <listcomp>(.0)
    106             xs = sorted(self.dataset, key=self.sort_key)
    107         elif self.shuffle:
--> 108             xs = [self.dataset[i] for i in self.random_shuffler(range(len(self.dataset)))]
    109         else:
    110             xs = self.dataset

~\anaconda3\envs\dl\lib\site-packages\torch\utils\data\dataset.py in __getitem__(self, index)
     32 
     33     def __getitem__(self, index) -> T_co:
---> 34         raise NotImplementedError
     35 
     36     def __add__(self, other: 'Dataset[T_co]') -> 'ConcatDataset[T_co]':

NotImplementedError: 

Please help me find out what's going wrong in this particular case.

Thanks,

ejguan commented 3 years ago

Moving this issue to torchtext as BucketIterator is implemented in torchtext.

hudeven commented 3 years ago

@parmeet could you help to take a look?

parmeet commented 3 years ago

BucketIterator won't work with the new datasets. You may need to use the legacy datasets instead.

Please note that the legacy code is no longer maintained and will be removed in upcoming releases. You may refer to the migration tutorial to help you move off the legacy code base.
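In other words, the new torchtext.datasets.Multi30k returns torch.utils.data-style datasets whose base class raises NotImplementedError from __getitem__ (the last frame in the traceback above), while the legacy BucketIterator expects the legacy Dataset/Example interface. A minimal sketch of the legacy route, assuming a release that still ships torchtext.legacy (roughly 0.9-0.11) and that the spaCy models de_core_news_sm and en_core_web_sm are installed:

import torch
from torchtext.legacy.data import Field, BucketIterator
from torchtext.legacy.datasets import Multi30k

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Legacy datasets are built from Fields, which handle tokenization and numericalization.
SRC = Field(tokenize="spacy", tokenizer_language="de_core_news_sm",
            init_token="<sos>", eos_token="<eos>", lower=True)
TRG = Field(tokenize="spacy", tokenizer_language="en_core_web_sm",
            init_token="<sos>", eos_token="<eos>", lower=True)

# The legacy Multi30k yields Examples with .src/.trg attributes, which BucketIterator can sort.
train_data, valid_data, test_data = Multi30k.splits(exts=(".de", ".en"),
                                                    fields=(SRC, TRG))
SRC.build_vocab(train_data, min_freq=2)
TRG.build_vocab(train_data, min_freq=2)

train_iterator, valid_iterator, test_iterator = BucketIterator.splits(
    (train_data, valid_data, test_data),
    batch_size=64,
    sort_within_batch=True,
    sort_key=lambda x: len(x.src),
    device=device,
)

for batch in train_iterator:
    print(batch.src.shape, batch.trg.shape)
    break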

RohitPawar2406 commented 2 years ago

"BucketIterator won't work with the new datasets. You may need to use the legacy datasets instead. Please note that the legacy code is no longer maintained and will be removed in upcoming releases. You may refer to the migration tutorial to help you move off the legacy code base."

Quoting Parmeet sir above; this works for me.
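If you would rather migrate to the new API instead of pinning to the legacy one, here is a rough DataLoader-based sketch along the lines of the migration tutorial, assuming torchtext >= 0.10 with the same spaCy models installed; the helper names yield_tokens and collate_batch are illustrative, not part of the library:

import torch
from torch.utils.data import DataLoader
from torch.nn.utils.rnn import pad_sequence
from torchtext.datasets import Multi30k
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator

# Tokenizers for the German source and English target.
src_tok = get_tokenizer("spacy", language="de_core_news_sm")
trg_tok = get_tokenizer("spacy", language="en_core_web_sm")

# The new Multi30k yields raw (src, trg) string pairs; materialize it so it can be reused.
train_pairs = list(Multi30k(split="train"))

def yield_tokens(pairs, index, tokenizer):
    for pair in pairs:
        yield tokenizer(pair[index])

specials = ["<unk>", "<pad>", "<sos>", "<eos>"]
src_vocab = build_vocab_from_iterator(yield_tokens(train_pairs, 0, src_tok), specials=specials)
trg_vocab = build_vocab_from_iterator(yield_tokens(train_pairs, 1, trg_tok), specials=specials)
src_vocab.set_default_index(src_vocab["<unk>"])
trg_vocab.set_default_index(trg_vocab["<unk>"])

def collate_batch(batch):
    # Numericalize and pad each (src, trg) pair; pad_sequence returns (seq_len, batch) tensors.
    src_batch, trg_batch = [], []
    for src, trg in batch:
        src_batch.append(torch.tensor(src_vocab(src_tok(src)), dtype=torch.long))
        trg_batch.append(torch.tensor(trg_vocab(trg_tok(trg)), dtype=torch.long))
    return (pad_sequence(src_batch, padding_value=src_vocab["<pad>"]),
            pad_sequence(trg_batch, padding_value=trg_vocab["<pad>"]))

train_loader = DataLoader(train_pairs, batch_size=64, shuffle=True, collate_fn=collate_batch)

for src, trg in train_loader:
    print(src.shape, trg.shape)
    break

Note that this drops BucketIterator's length bucketing; batches are simply shuffled and padded to the longest sequence in each batch.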

hudeven commented 2 years ago

I'm closing it now. Please reopen if it's still an issue.

Nayef211 commented 2 years ago

@parmeet @abhinavarora just wondering if we should use legacy or obsolete labels for old issues moving forward?