minerllabs / minerl

MineRL Competition for Sample Efficient Reinforcement Learning - Python Package
http://minerl.io/docs/

minerl.data.make crashes when Torch is loaded, works if it isn't #629

Open lugi7 opened 2 years ago

lugi7 commented 2 years ago

I just encountered a very weird behaviour related to minerl.data.make. What I'm trying to do is wrap it within PyTorch's IterableDataset class. The script looks as follows:

import minerl
import torch

class DemonstrationDataset:
    def __init__(self, env_name):
        self.data = minerl.data.make(env_name)

if __name__ == "__main__":
    minerl.data.make('MineRLTreechop-v0')
    demo_dataset = DemonstrationDataset('MineRLTreechop-v0')

However, I'm seeing behaviour I cannot explain at all:

  1. minerl.data.make works every time when it's outside of the class
  2. self.data = minerl.data.make(env_name) works inside the class, but only when Torch is not imported
  3. if Torch is imported and I try to run self.data = minerl.data.make(env_name), I get the following error:

     Exception ignored in: <function Pool.__del__ at 0x0000025D6CFD5F30>
     Traceback (most recent call last):
       File "C:\Users\mateu\AppData\Local\Programs\Python\Python310\lib\multiprocessing\pool.py", line 268, in __del__
       File "C:\Users\mateu\AppData\Local\Programs\Python\Python310\lib\multiprocessing\queues.py", line 372, in put
     AttributeError: 'NoneType' object has no attribute 'dumps'

Could anyone explain to me what is going on here?

Miffyli commented 2 years ago

Hey. This probably has something to do with the fact that PyTorch ships its own version of the multiprocessing library, see here: https://pytorch.org/docs/stable/notes/multiprocessing.html . The underlying data loader for MineRL data spawns multiple processes and transfers data between them, which can get confused when PyTorch machinery is involved. (Also note that if the PyTorch DataLoader spawns multiple copies of the MineRL loader, things get pretty odd, because you then have two levels of code each spawning more processes.)
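For what it's worth, that AttributeError is the classic symptom of a multiprocessing.Pool that is only cleaned up by its __del__ during interpreter shutdown, after module globals (pickle's dumps among them) have already been torn down. A minimal sketch with the stdlib multiprocessing (this is an illustration, not minerl's actual loader) of the explicit close/join pattern that avoids relying on that teardown order:

```python
# Sketch: closing a Pool explicitly instead of leaving cleanup to __del__,
# which may run during interpreter shutdown when modules like pickle are
# already gone ("'NoneType' object has no attribute 'dumps'").
import multiprocessing


def square(x):
    return x * x


def run_pool():
    pool = multiprocessing.Pool(processes=2)
    try:
        return pool.map(square, [1, 2, 3])
    finally:
        pool.close()   # stop accepting new work
        pool.join()    # wait for workers while modules are still intact


if __name__ == "__main__":
    print(run_pool())  # [1, 4, 9]
```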

lugi7 commented 2 years ago

But how come it works without any problem in Jupyter Notebook, yet crashes when I run it as a standalone script?

This error also happens when I import Torch after I call minerl.data.make('MineRLTreechop-v0'):

import minerl

test = minerl.data.make('MineRLTreechop-v0')

import torch

Exception ignored in: <function Pool.__del__ at 0x0000023089F224D0>
Traceback (most recent call last):
  File "C:\Users\mateu\AppData\Local\Programs\Python\Python310\lib\multiprocessing\pool.py", line 268, in __del__
  File "C:\Users\mateu\AppData\Local\Programs\Python\Python310\lib\multiprocessing\queues.py", line 372, in put
AttributeError: 'NoneType' object has no attribute 'dumps'

Miffyli commented 2 years ago

Exact answers would require more in-depth debugging, but my guess is that Jupyter's way of spawning processes for its kernels (where Python runs) somehow mitigates the problem. As for the second observation, I suspect PyTorch interferes with the multiprocessing library in some way that breaks this.

Sorry for the lack of detail; I have not run into the same issues myself ^^. I recommend reading through the link I shared and following PyTorch's suggestions (e.g. use from torch import multiprocessing instead of import multiprocessing).
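If it helps, that swap is usually just the import line, since torch.multiprocessing is designed as a drop-in replacement for the stdlib module. A small sketch (the try/except fallback is only there so the snippet also runs in an environment without torch installed):

```python
# Sketch of PyTorch's suggested pattern: use torch.multiprocessing,
# a drop-in replacement for the stdlib multiprocessing module.
try:
    import torch.multiprocessing as mp  # torch-aware replacement
except ImportError:
    import multiprocessing as mp        # fallback if torch is absent


def double(x):
    return 2 * x


if __name__ == "__main__":
    # Same Pool API as the stdlib; the context manager also guarantees
    # the pool is shut down deterministically, not at interpreter exit.
    with mp.Pool(2) as pool:
        print(pool.map(double, [1, 2, 3]))  # [2, 4, 6]
```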