Describe the bug
While working on #682 I noticed that ParquetDataset hangs for an unexpectedly long time in the __getitem__ call when it is used inside a DataLoader, but not when it is used outside of one.
The line that hangs is df.explode(columns), which relies on polars, which may use multiprocessing to execute it.
The hang appears to be resolved by setting the torch.multiprocessing context to "spawn".
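
For reference, here is a minimal sketch of the workaround, assuming the start method is set per DataLoader rather than globally. The helper name make_spawn_loader and the TensorDataset stand-in are hypothetical and only for illustration; in the real case the dataset would be the ParquetDataset from this issue.

```python
import torch
from torch.utils.data import DataLoader, Dataset, TensorDataset


def make_spawn_loader(dataset: Dataset, batch_size: int = 2, num_workers: int = 2) -> DataLoader:
    """Hypothetical helper: build a DataLoader whose workers are started with "spawn".

    Spawned workers start from a fresh interpreter instead of inheriting the
    parent's state through fork(), which is the change that made the hang
    go away in my case.
    """
    return DataLoader(
        dataset,
        batch_size=batch_size,
        num_workers=num_workers,
        # Only takes effect when num_workers > 0.
        multiprocessing_context="spawn",
    )


# Stand-in dataset (assumption); replace with the ParquetDataset instance.
example_dataset = TensorDataset(torch.arange(8))
loader = make_spawn_loader(example_dataset)
```

Setting it globally with torch.multiprocessing.set_start_method("spawn") should have the same effect for every DataLoader in the process. Either way, with "spawn" the dataset has to be picklable and the code that iterates the loader has to sit under an `if __name__ == "__main__":` guard.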