Open sehoffmann opened 1 year ago
An error-handling mechanism similar to the one in https://github.com/pytorch/pytorch/blob/main/torch/utils/data/_utils/signal_handling.py should be introduced to MPRS.
Please feel free to open a PR to patch it.
@ejguan I might have a look at this in the next few weeks. What would be the appropriate place in torchdata to register these global signal handlers?
A new file under the dataloader2 directory, invoked whenever MPRS is imported.
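For illustration, here is a minimal sketch of what such a module could register. It is not taken from the linked PyTorch file or from torchdata. `install_sigchld_handler` and `check_children` are hypothetical names, and it assumes a POSIX platform and that it is called from the main thread.

```python
import signal


def install_sigchld_handler(check_children):
    """Register a SIGCHLD handler that calls `check_children` (hypothetical).

    `check_children` is assumed to be a callable that raises an exception
    when one of the tracked child processes has died unexpectedly.
    Returns False on platforms without SIGCHLD (e.g. Windows).
    """
    if not hasattr(signal, "SIGCHLD"):
        return False

    # Remember whatever handler was installed before so it can be chained.
    previous = signal.getsignal(signal.SIGCHLD)

    def handler(signum, frame):
        # Give the caller a chance to turn a dead child into an exception.
        check_children()
        # SIG_DFL / SIG_IGN / None are not callable and are skipped here.
        if callable(previous):
            previous(signum, frame)

    # signal.signal must be called from the main thread of the main interpreter.
    signal.signal(signal.SIGCHLD, handler)
    return True
```

Chaining the previous handler keeps the sketch from silently replacing a handler that another library registered earlier.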
🐛 Describe the bug
If either a worker process or the feeder process of the MPRS gets killed, the main process hangs indefinitely instead of raising an error. This can easily happen when the OOM reaper kills a child process rather than the main process.
This initially arose while troubleshooting #1169.
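To make the desired behaviour concrete, here is a minimal sketch of a main-process read loop that raises instead of hanging. It is not the MPRS implementation. `get_or_raise` and `WorkerDiedError` are hypothetical names, and it assumes every worker is expected to stay alive for as long as the main process is waiting on results.

```python
import queue


class WorkerDiedError(RuntimeError):
    """Hypothetical error raised when a child process exits unexpectedly."""


def get_or_raise(result_queue, workers, poll_interval=1.0):
    """Return the next item from `result_queue`, or raise if a worker died.

    `result_queue` needs a `get(timeout=...)` that raises `queue.Empty`
    (true for both `queue.Queue` and `multiprocessing.Queue`).
    `workers` are objects exposing `is_alive()`, `name`, `pid` and
    `exitcode`, as `multiprocessing.Process` does.
    """
    while True:
        try:
            # Wait a bounded amount of time instead of blocking forever.
            return result_queue.get(timeout=poll_interval)
        except queue.Empty:
            pass
        # Nothing arrived: check that every child is still running.
        for worker in workers:
            if not worker.is_alive():
                raise WorkerDiedError(
                    f"worker {worker.name} (pid {worker.pid}) exited "
                    f"unexpectedly with exit code {worker.exitcode}"
                )
```

With `multiprocessing.Process`, a negative `exitcode` means the child was terminated by a signal, so a process killed with SIGKILL reports `-9`.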
Versions