Open dipta007 opened 2 years ago
Hi @dipta007, in torchtext 0.12 we migrated our datasets to be built on top of torchdata. You can look at the dataset implementations, which offer plenty of examples, or refer to the torchdata documentation for more information on the usage and functionality available in datapipes.
In general, datapipes let you construct iterable-style datasets and can be used with a large corpus. For instance, unlike map-style datasets, you do not have to read the whole dataset into memory to work with datapipes; they work in a streaming fashion.
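To make the streaming idea concrete, here is a minimal sketch that uses plain PyTorch's `torch.utils.data.IterableDataset` rather than a torchdata datapipe. The names `GeneratorDataset` and `fake_corpus` are hypothetical helpers made up for illustration, not part of torchtext or torchdata, and the `object` fallback is only there so the sketch still runs where PyTorch is not installed.

```python
try:
    from torch.utils.data import IterableDataset
except ImportError:
    # Fallback so this sketch still runs where PyTorch is not installed.
    IterableDataset = object


class GeneratorDataset(IterableDataset):
    """Hypothetical helper: exposes a generator function as an iterable dataset."""

    def __init__(self, make_generator):
        # Store a zero-argument callable rather than a generator object, so
        # every call to __iter__ (for example, each epoch) starts a fresh pass.
        self.make_generator = make_generator

    def __iter__(self):
        # Samples are yielded one at a time; nothing is loaded up front.
        yield from self.make_generator()


def fake_corpus():
    # Stand-in for lazily reading a large corpus, e.g. line by line from a file.
    for i in range(5):
        yield f"sentence {i}"


dataset = GeneratorDataset(fake_corpus)
first_pass = list(dataset)
second_pass = list(dataset)  # a second pass works because the generator is recreated
```

With PyTorch installed, such a dataset can be handed to `torch.utils.data.DataLoader(dataset, batch_size=2)`. Note that with `num_workers > 0` each worker iterates over its own copy of the dataset, so the stream would need to be sharded per worker to avoid duplicated samples.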
❓ Questions and Help
Description
For a large corpus, I couldn't find any way to back a dataset with an iterator, as a PyTorch dataset allows. Is it possible to build a dataset from just a generator, or to implement something like a PyTorch dataset object that pulls the data dynamically?