Open Nayef211 opened 2 years ago
Hello! We are a group of second-year engineering students, and we would like to resolve this issue as a school project. Please let us know if we have your permission to contribute.
@moDallel - We welcome contributions! Thanks for your interest in the project.
🚀 Feature
We want to add the `LengthSetterIterDataPipe` to the end of all torchtext datasets. This will allow us to call `len()` on the datapipe object and prevent errors like `TypeError: DataPipe instance doesn't have valid length`.
Motivation
See https://github.com/pytorch/tutorials/pull/1954#discussion_r993951194 for discussion.
Additional Context
Once this has been done for the `Multi30k` dataset, we can remove the conversion of the datapipe to a list in https://github.com/pytorch/tutorials/pull/1954 (i.e. `list(train_dataloader)`), since that conversion causes all data in the dataset to materialize. This can lead to OOMs for very large datasets.
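To illustrate the idea, here is a minimal pure-Python sketch of what a length-setting wrapper does. It is not torchdata's implementation: `LengthSetterSketch`, `LazySamples`, and `NUM_LINES` are hypothetical names made up for this example, and the actual change would use the library's own `LengthSetterIterDataPipe`.

```python
from typing import Any, Iterable, Iterator


class LengthSetterSketch:
    """Hypothetical stand-in for a length-setting wrapper (not torchdata's code)."""

    def __init__(self, source: Iterable[Any], length: int) -> None:
        if length < 0:
            raise ValueError("length must be non-negative")
        self.source = source
        self.length = length

    def __iter__(self) -> Iterator[Any]:
        # Stream items lazily; nothing is materialized in memory.
        yield from self.source

    def __len__(self) -> int:
        # Report the length supplied by the caller; it is not verified here.
        return self.length


class LazySamples:
    """Toy iterable without __len__, standing in for a streaming dataset."""

    def __init__(self, n: int) -> None:
        self.n = n

    def __iter__(self) -> Iterator[int]:
        return iter(range(self.n))


NUM_LINES = 5  # assumed known sample count, for illustration only
raw = LazySamples(NUM_LINES)

# Without a length, len() fails, much like the TypeError described above.
try:
    len(raw)
    raw_has_len = True
except TypeError:
    raw_has_len = False

# With the wrapper, len() works and iteration stays lazy.
wrapped = LengthSetterSketch(raw, NUM_LINES)
```

With a wrapper like this, callers can ask for `len(wrapped)` directly instead of building `list(...)` just to count items, which is the materialization the tutorial workaround relies on. The trade-off is that the reported length is only as accurate as the number passed in.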