`ParquetDataset` Hangs unexpectedly long when `__getitem__` is called

graphnet-team / graphnet

A Deep learning library for neutrino telescopes

https://graphnet-team.github.io/graphnet/

Apache License 2.0


`ParquetDataset` Hangs unexpectedly long when `__getitem__` is called #683

Closed RasmusOrsoe closed 5 months ago

RasmusOrsoe commented 6 months ago

Describe the bug

When working on #682, I noticed that `ParquetDataset` hangs for an unexpectedly long time in the `__getitem__` call when it is used in a `DataLoader`, but not outside of it.

The line that hangs is `df.explode(columns)`, which relies on polars; polars may run the operation in parallel internally.

The bug appears to be resolved when the `torch.multiprocessing` context is set to "spawn".
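For anyone hitting the same hang, here is a minimal sketch of one way to request the "spawn" start method for just the `DataLoader` workers, using the `multiprocessing_context` argument of `torch.utils.data.DataLoader`. `ToyDataset` and `make_spawn_loader` are hypothetical names made up for illustration; they are not graphnet code, and the toy dataset does not call polars.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Hypothetical stand-in for ParquetDataset (not graphnet code)."""

    def __len__(self) -> int:
        return 8

    def __getitem__(self, index: int) -> torch.Tensor:
        # In the real dataset, this is where df.explode(columns) would run.
        return torch.tensor([float(index)])


def make_spawn_loader(dataset: Dataset, num_workers: int = 2) -> DataLoader:
    """Build a DataLoader whose worker processes are started with "spawn"."""
    return DataLoader(
        dataset,
        batch_size=4,
        num_workers=num_workers,
        # Workers start as fresh interpreters instead of forked copies
        # of the parent process.
        multiprocessing_context="spawn",
    )
```

With "spawn", each worker re-imports the main module, so a script that iterates over the loader should do so under an `if __name__ == "__main__":` guard, and the dataset must be picklable. Calling `torch.multiprocessing.set_start_method("spawn")` once at program start is an alternative that changes the start method globally.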
