junyanz / pytorch-CycleGAN-and-pix2pix

Image-to-Image Translation in PyTorch

Error in `python': double free or corruption | During training #1328

Open Mollylulu opened 3 years ago

Mollylulu commented 3 years ago
*** Error in `python': double free or corruption (!prev): 0x000055ce575d91f0 ***
*** Error in `python': double free or corruption (!prev): 0x000055ce575d91f0 ***
======= Backtrace: =========
Traceback (most recent call last):
  File "train.py", line 44, in <module>
    for i, data in enumerate(dataset):  # inner loop within one epoch
  File "/data/home/user/explore_USL/git/pytorch-CycleGAN-and-pix2pix/data/__init__.py", line 90, in __iter__
    for i, data in enumerate(self.dataloader):
  File "/home/user/miniconda3/envs/pytorch-CycleGAN-and-pix2pix/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/user/miniconda3/envs/pytorch-CycleGAN-and-pix2pix/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 830, in _next_data
    self._shutdown_workers()
  File "/home/user/miniconda3/envs/pytorch-CycleGAN-and-pix2pix/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 941, in _shutdown_workers
    w.join()
  File "/home/user/miniconda3/envs/pytorch-CycleGAN-and-pix2pix/lib/python3.6/multiprocessing/process.py", line 124, in join
    res = self._popen.wait(timeout)
  File "/home/user/miniconda3/envs/pytorch-CycleGAN-and-pix2pix/lib/python3.6/multiprocessing/popen_fork.py", line 50, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/home/user/miniconda3/envs/pytorch-CycleGAN-and-pix2pix/lib/python3.6/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)
  File "/home/user/miniconda3/envs/pytorch-CycleGAN-and-pix2pix/lib/python3.6/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 107907) is killed by signal: Aborted.
*** Error in `python': free(): invalid next size (normal): 0x000055ce575d96f0 ***

I set --max_dataset_size to the length of my dataset, but I still ran into this issue. Could anyone help? Thanks!
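For context, a maximum-dataset-size option of this kind usually just truncates the list of image paths before loading. The sketch below assumes that behavior; `cap_dataset` is a hypothetical helper, not this repository's code. Under that assumption, setting the cap to the full dataset length leaves the data unchanged. That would be consistent with the change having no effect on the crash.

```python
from typing import List


def cap_dataset(paths: List[str], max_dataset_size: float = float("inf")) -> List[str]:
    """Hypothetical helper: keep at most max_dataset_size paths.

    With the default of infinity, or any value >= len(paths), the
    list is returned unchanged.
    """
    # min() returns len(paths) (an int) whenever the cap is larger,
    # so the slice index is always a valid integer here.
    return paths[: min(max_dataset_size, len(paths))]
```

For example, `cap_dataset(paths, len(paths))` returns the same paths as `cap_dataset(paths)`.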

AlexWilkinsonnn commented 3 years ago

Uh oh. Setting --num_threads to 0 avoids this problem, but I don't know of a real fix.
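A minimal sketch of why the workaround helps. It assumes that `--num_threads` is forwarded to the `num_workers` argument of `torch.utils.data.DataLoader`, which is presumably what the `self.dataloader` in `data/__init__.py` from the traceback wraps. With `num_workers=0`, PyTorch loads batches in the main process, so there are no worker subprocesses left to be killed by `SIGABRT`. `ToyDataset` and `make_loader` are hypothetical names for illustration. This only sidesteps the symptom; it does not fix the underlying memory corruption.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Hypothetical stand-in for the repo's dataset classes."""

    def __init__(self, n: int = 8):
        self.n = n

    def __len__(self) -> int:
        return self.n

    def __getitem__(self, idx: int) -> dict:
        # Each sample is a dict holding a small image-like tensor and its index.
        return {"A": torch.full((3, 4, 4), float(idx)), "index": idx}


def make_loader(dataset: Dataset, batch_size: int = 2, num_workers: int = 0) -> DataLoader:
    # num_workers=0: every batch is loaded in the main process, so no
    # worker subprocesses are started and none can die with a signal.
    # The trade-off is slower data loading.
    return DataLoader(dataset, batch_size=batch_size, shuffle=False,
                      num_workers=num_workers)


if __name__ == "__main__":
    loader = make_loader(ToyDataset())
    for i, data in enumerate(loader):  # same loop shape as train.py
        print(i, tuple(data["A"].shape))
```

With this setup, loading runs in a single process. That is slower, but it removes the `w.join()` / worker-shutdown path seen in the traceback.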

junyanz commented 3 years ago

Not sure how to fix it. There is an earlier post regarding the same issue.

Mollylulu commented 3 years ago

It seems the web socket caused this issue. For now, I have disabled the web logging function and the related code. It's an inelegant workaround, but it avoids the crash 🤕
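A slightly less invasive way to mute web logging than deleting code is to guard it behind a switch, so the socket connection is never opened when logging is off. The sketch below is hypothetical: `WebLogger`, `enabled`, `connect`, and `send` are illustrative names, not this repository's visualizer API.

```python
from typing import Any, Callable, List, Optional, Tuple


class WebLogger:
    """Hypothetical web-logging wrapper (not the repo's visualizer class)."""

    def __init__(self, enabled: bool, connect: Optional[Callable[[], Any]] = None):
        self.enabled = enabled
        self.client = None
        # In-memory fallback record used whenever nothing is sent to the web.
        self.local_log: List[Tuple[str, float]] = []
        if self.enabled and connect is not None:
            # The web connection is created only when logging is switched on,
            # so a muted run never touches the socket at all.
            self.client = connect()

    def log(self, name: str, value: float) -> bool:
        """Send a metric to the web client; return True if it was sent."""
        if self.client is None:
            self.local_log.append((name, value))
            return False
        self.client.send(name, value)
        return True
```

Usage: `WebLogger(enabled=False).log("loss_G", 0.42)` records the value locally and returns `False` without opening any connection. Flipping `enabled` back on restores web logging without re-adding deleted code.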