Open jpilaul opened 2 years ago
Same thing occurs when streaming files loaded from disk.
Hi! Thanks for reporting. Could this be related to https://github.com/huggingface/datasets/issues/3950 ?
Currently streaming datasets only work in a single process, but we're working on having it work in distributed setups as well :) (EDIT: done)
Hi, thanks for your reply. It seems related :)
+1
Please update datasets if you're having this issue. What version are you using?
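For anyone unsure which versions they have installed, a minimal sketch along these lines should report them. It assumes the packages were installed under the distribution names `datasets` and `torch`; the helper names `installed_version` and `report_versions` are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str) -> str:
    """Return the installed version of a distribution, or a placeholder if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return "not installed"


def report_versions(packages=("datasets", "torch")) -> dict:
    """Map each distribution name to its installed version string."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    # Print one "name: version" line per package, ready to paste into a reply.
    for name, ver in report_versions().items():
        print(f"{name}: {ver}")
```

Upgrading is then usually a matter of `pip install -U datasets`, depending on how the environment is managed.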
Describe the bug
Interleaving multiple iterable datasets that use load_dataset in streaming mode hangs when passed to torch.utils.data.DataLoader with multiple workers.
Steps to reproduce the bug
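The reporter's original snippet is not preserved here. As a rough sketch only, a setup of the kind described might look like the following. The dataset name, the two configs, the batch size, and the helper names (`loader_kwargs`, `build_multilingual_dataset`, `first_batch`) are all assumptions made for illustration, not taken from the report. The `timeout` argument is added so that a hang surfaces as an error instead of blocking forever.

```python
"""Hypothetical reproduction sketch; not the reporter's original snippet."""

# Assumed dataset and configs, chosen only for illustration.
DATASET_NAME = "wikimedia/wikipedia"
CONFIGS = ("20231101.en", "20231101.fr")


def loader_kwargs(num_workers, timeout_s=60):
    """Build DataLoader arguments for a given worker count.

    The timeout is only applied when worker processes are used; with
    num_workers=0 the single-process loader expects a timeout of 0.
    """
    return {
        "batch_size": 2,
        "num_workers": num_workers,
        "timeout": timeout_s if num_workers > 0 else 0,
    }


def build_multilingual_dataset():
    """Interleave several streaming datasets into one iterable dataset."""
    # Imported lazily so the helper above can be used without these packages.
    from datasets import interleave_datasets, load_dataset

    streams = [
        load_dataset(DATASET_NAME, config, split="train", streaming=True)
        for config in CONFIGS
    ]
    return interleave_datasets(streams).with_format("torch")


def first_batch(num_workers):
    """Return the first batch, or raise an error once the timeout expires."""
    from torch.utils.data import DataLoader

    loader = DataLoader(build_multilingual_dataset(), **loader_kwargs(num_workers))
    return next(iter(loader))
```

Calling `first_batch(0)` and then `first_batch(4)` from under an `if __name__ == "__main__":` guard is one way to compare the single-process and multi-worker behaviour on a given install.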
Expected results
It should be able to iterate the dataset with multiple workers.
Actual results
It prints results with next(iter(multilingual_dataset)) and num_workers=0, but it prints nothing with num_workers=4 or any number above 0.
Environment info
datasets version: 2.0.1.dev0
pytorch version: 1.10.0+cu113