graykode / matorage

Matorage is a tensor (multidimensional matrix) object storage manager for deep learning frameworks (PyTorch, TensorFlow V2, Keras).
https://matorage.readthedocs.io

dataloader raises pickle error when it is used in a process #20

Closed: jinserk closed this issue 4 years ago

jinserk commented 4 years ago

Another bug(?) report. I'm using PyTorch DDP, so I use matorage.torch.Dataset inside a separate process (the multiprocessing start method is set to forkserver). Of course, the dataset is initialized inside the run() function to avoid unexpected pickling errors. However, I still get a pickling error:

Process TrainProcess-1:
Traceback (most recent call last):
  File "/home/jinserk/.pyenv/versions/3.8.5/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/jinserk/kyu/kyumlm/mlmanager/torch/workers.py", line 212, in run
    train_loss_per_target, ave_train_loss = self.train_epoch(epoch)
  File "/home/jinserk/kyu/kyumlm/mlmanager/torch/workers.py", line 282, in train_epoch
    for i, data in enumerate(self.train_loader):
  File "/home/jinserk/.pyenv/versions/kyumlm/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 291, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/home/jinserk/.pyenv/versions/kyumlm/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 737, in __init__
    w.start()
  File "/home/jinserk/.pyenv/versions/3.8.5/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/home/jinserk/.pyenv/versions/3.8.5/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/home/jinserk/.pyenv/versions/3.8.5/lib/python3.8/multiprocessing/context.py", line 291, in _Popen
    return Popen(process_obj)
  File "/home/jinserk/.pyenv/versions/3.8.5/lib/python3.8/multiprocessing/popen_forkserver.py", line 35, in __init__
    super().__init__(process_obj)
  File "/home/jinserk/.pyenv/versions/3.8.5/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/jinserk/.pyenv/versions/3.8.5/lib/python3.8/multiprocessing/popen_forkserver.py", line 47, in _launch
    reduction.dump(process_obj, buf)
  File "/home/jinserk/.pyenv/versions/3.8.5/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'WeakValueDictionary.__init__.<locals>.remove'

I have no idea what WeakValueDictionary refers to here. Have you tested Dataset and the resulting DataLoader in a multiprocessing environment?

Thank you again!
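Some background on the error message. WeakValueDictionary is a standard-library class from Python's weakref module. Its __init__ defines a nested remove callback and stores it on the instance. Nested (local) functions cannot be pickled, so any object that holds a WeakValueDictionary, directly or through an attribute, also fails to pickle. The traceback shows the failure inside DataLoader's w.start(). Under the forkserver (or spawn) start method, starting each DataLoader worker pickles the process object, and that object carries the dataset whenever num_workers > 0. Initializing the dataset in run() therefore does not prevent it from being pickled a second time for the workers. The sketch below reproduces the message with plain Python. DatasetWithCache and is_picklable are hypothetical names for illustration. The sketch does not claim where the WeakValueDictionary lives in matorage.

```python
import pickle
import weakref


class DatasetWithCache:
    """Hypothetical dataset that keeps a WeakValueDictionary as an attribute."""

    def __init__(self):
        self.cache = weakref.WeakValueDictionary()


def is_picklable(obj):
    """Return True if obj can be serialized with pickle, False otherwise."""
    try:
        pickle.dumps(obj)
    except (AttributeError, TypeError, pickle.PicklingError):
        # Older Pythons raise AttributeError ("Can't pickle local object ...");
        # newer ones may raise PicklingError instead.
        return False
    return True


if __name__ == "__main__":
    print(is_picklable({"plain": [1, 2, 3]}))            # True
    print(is_picklable(weakref.WeakValueDictionary()))  # False
    # The attribute makes the whole containing object unpicklable.
    print(is_picklable(DatasetWithCache()))              # False
```

The exact exception type differs across Python versions, which is why the helper catches several of them.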

graykode commented 4 years ago

@jinserk
This bug does not seem to be related to matorage; it looks like a PyTorch issue. Can you show me how you initialize torch's DataLoader? Did you use DistributedSampler properly?
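For context on the DistributedSampler question: in DDP, each rank usually wraps the dataset in torch.utils.data.distributed.DistributedSampler and passes it to DataLoader through the sampler= argument, so each process reads its own shard. The pure-Python sketch below mirrors how that sampler splits indices when shuffle=False and drop_last=False. It pads the index list by repeating indices from the start, then gives each rank every num_replicas-th index. The function name distributed_indices is hypothetical, and this is a simplified model, not the real class. The real sampler shuffles by default, using a seed that set_epoch() changes each epoch.

```python
import math


def distributed_indices(dataset_len, num_replicas, rank):
    """Simplified model of DistributedSampler's index split for one rank
    (shuffle=False, drop_last=False); hypothetical helper, not torch's API."""
    num_samples = math.ceil(dataset_len / num_replicas)
    total_size = num_samples * num_replicas
    indices = list(range(dataset_len))
    # Pad by repeating indices from the start so every rank gets num_samples items.
    padding = total_size - len(indices)
    while padding > 0:
        chunk = indices[:padding]
        indices += chunk
        padding -= len(chunk)
    # Each rank takes every num_replicas-th index, starting at its own rank.
    return indices[rank:total_size:num_replicas]


if __name__ == "__main__":
    for rank in range(3):
        print(rank, distributed_indices(10, num_replicas=3, rank=rank))
    # 0 [0, 3, 6, 9]
    # 1 [1, 4, 7, 0]
    # 2 [2, 5, 8, 1]
```

With a single process (num_replicas=1), every index goes to rank 0, which matches running without a sampler.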

jinserk commented 4 years ago

Hmm... In the case above, I started only one process, so DistributedSampler was not set. I'll check further whether the error is unrelated to matorage. Sorry, and thanks for letting me know.
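Whichever library the unpicklable attribute comes from, a common workaround is to pickle only plain configuration and create handles or caches lazily. Each DataLoader worker then builds its own copy after unpickling. Setting num_workers=0 also avoids the worker pickling step, at the cost of loading data in the main process. The sketch below shows the lazy pattern with a __getstate__ override. LazyHandleDataset, its path, and its placeholder __getitem__ are hypothetical and are not matorage's API. In a real project, the same methods would sit on a torch.utils.data.Dataset subclass.

```python
import pickle
import weakref


class LazyHandleDataset:
    """Hypothetical map-style dataset that keeps unpicklable state out of pickle."""

    def __init__(self, path, length):
        self.path = path      # picklable configuration
        self.length = length
        self._cache = None    # unpicklable state, created on first use

    def _get_cache(self):
        # Built lazily, so a worker creates its own cache after unpickling.
        if self._cache is None:
            self._cache = weakref.WeakValueDictionary()
        return self._cache

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        self._get_cache()
        return (self.path, index)  # placeholder for real I/O

    def __getstate__(self):
        # Drop the cache before pickling; the default unpickling
        # behaviour restores the remaining attributes into __dict__.
        state = self.__dict__.copy()
        state["_cache"] = None
        return state


if __name__ == "__main__":
    ds = LazyHandleDataset("/data/train", length=4)  # hypothetical path
    ds[0]                                    # creates the cache in this process
    clone = pickle.loads(pickle.dumps(ds))   # roughly what a worker receives
    print(clone._cache is None)              # True
    print(clone[1])                          # ('/data/train', 1)
```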

graykode commented 4 years ago

@jinserk Thank you. If it turns out to be a matorage problem, we will reopen it. :)

Also, I would appreciate it if you keep reporting issues, even minor ones.