pytorch / text

Models, data loaders and abstractions for language processing, powered by PyTorch
https://pytorch.org/text
BSD 3-Clause "New" or "Revised" License
3.49k stars, 813 forks

Torchtext datasets not iterable #2191

Open yashrathi-git opened 1 year ago

yashrathi-git commented 1 year ago

❓ Questions and Help

Description: I ran the following:

>>> from torchtext.datasets import Multi30k
>>> train_data, val_data, test_data = Multi30k(split=('train', 'valid', 'test'))
>>> next(train_data)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[33], line 1
----> 1 next(train_data)

TypeError: 'ShardingFilterIterDataPipe' object is not an iterator

But according to the docs, the dataset should be iterable. I also tried using .__iter__.

afurkank commented 1 year ago

You are doing it right. It's just that the datasets are like Schrödinger's cat: you never know whether they will be alive and working when you need them. This has been an issue for years now.

Edit: I just looked into your code. You are using it wrong.

Here is the correct usage:

next(iter(train_data))

iter() creates an iterator from the iterable dataset, and next() then pulls the first item from it. It may still not work, though, because, as I said, something is wrong with the datasets.
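To make the iterable vs. iterator distinction concrete, here is a minimal sketch. ToyPairs is a hypothetical stand-in I made up for illustration (it is not torchtext's datapipe implementation, and the sentence pairs are invented), so it runs without installing or downloading anything. It only assumes what the traceback already says: the object can be iterated over but is not itself an iterator.

```python
class ToyPairs:
    """Hypothetical stand-in for a dataset: iterable, but not an iterator."""

    def __init__(self, pairs):
        self._pairs = list(pairs)

    def __iter__(self):
        # Each call returns a fresh iterator over the stored pairs.
        return iter(self._pairs)


# Made-up sample data, only for illustration.
train_data = ToyPairs([("Zwei Hunde.", "Two dogs."), ("Ein Mann.", "A man.")])

try:
    next(train_data)  # the class has no __next__, so this raises TypeError
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# iter() asks the object for an iterator; next() then advances that iterator.
first_pair = next(iter(train_data))

print(error_message)
print(first_pair)
```

A plain for loop (for src, tgt in train_data: ...) calls iter() implicitly, which is why looping works on an iterable even when a bare next() does not.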