I get the error below when I set --num_threads to a value greater than 0.
I have a 48 GB GPU. With --num_threads=0 everything works, but the dataloader is slow: data is supplied so slowly that the GPU is underused, even though it has plenty of free memory.
Error:
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
Traceback (most recent call last):
File "/X/X/miniconda3/envs/pytorch-CycleGAN-and-pix2pix/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 761, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "/X/X/miniconda3/envs/pytorch-CycleGAN-and-pix2pix/lib/python3.6/multiprocessing/queues.py", line 104, in get
if not self._poll(timeout):
File "/X/X/miniconda3/envs/pytorch-CycleGAN-and-pix2pix/lib/python3.6/multiprocessing/connection.py", line 257, in poll
return self._poll(timeout)
File "/X/X/miniconda3/envs/pytorch-CycleGAN-and-pix2pix/lib/python3.6/multiprocessing/connection.py", line 414, in _poll
r = wait([self], timeout)
File "/X/X/miniconda3/envs/pytorch-CycleGAN-and-pix2pix/lib/python3.6/multiprocessing/connection.py", line 911, in wait
ready = selector.select(timeout)
File "/X/X/miniconda3/envs/pytorch-CycleGAN-and-pix2pix/lib/python3.6/selectors.py", line 376, in select
fd_event_list = self._poll.poll(timeout)
File "/X/X/miniconda3/envs/pytorch-CycleGAN-and-pix2pix/lib/python3.6/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
_error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 1306065) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 44, in <module>
for i, data in enumerate(dataset): # inner loop within one epoch
File "/X/X/X/git/pytorch-CycleGAN-and-pix2pix/data/__init__.py", line 90, in __iter__
for i, data in enumerate(self.dataloader):
File "/X/X/miniconda3/envs/pytorch-CycleGAN-and-pix2pix/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/X/X/miniconda3/envs/pytorch-CycleGAN-and-pix2pix/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 841, in _next_data
idx, data = self._get_data()
File "/X/X/miniconda3/envs/pytorch-CycleGAN-and-pix2pix/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 808, in _get_data
success, data = self._try_get_data()
File "/X/X/miniconda3/envs/pytorch-CycleGAN-and-pix2pix/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 774, in _try_get_data
raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
RuntimeError: DataLoader worker (pid(s) 1306065) exited unexpectedly
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
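Since the message points at shared memory rather than GPU memory, here is a minimal sketch for checking how much shared memory is available on the machine. It assumes a Linux system where shared memory is mounted at /dev/shm; the helper names `shm_usage` and `format_gib` are made up for this illustration and are not part of PyTorch or this repository.

```python
import os
import shutil


def shm_usage(path="/dev/shm"):
    """Return (total, used, free) in bytes for the filesystem at `path`.

    Returns None if the path does not exist (for example on non-Linux systems).
    """
    if not os.path.exists(path):
        return None
    usage = shutil.disk_usage(path)
    return usage.total, usage.used, usage.free


def format_gib(n_bytes):
    """Format a byte count as GiB with two decimals."""
    return "{:.2f} GiB".format(n_bytes / 2**30)


# Assumption: /dev/shm is where shared memory is mounted on this machine.
result = shm_usage()
if result is None:
    print("/dev/shm not found; shared memory may be mounted elsewhere")
else:
    total, used, free = result
    print("shm total:", format_gib(total))
    print("shm used: ", format_gib(used))
    print("shm free: ", format_gib(free))
```

If the reported total is very small (Docker containers default to 64 MB of /dev/shm unless started with a larger `--shm-size` or with `--ipc=host`), that would be consistent with workers failing as soon as --num_threads is above 0, since DataLoader workers hand batches to the main process through shared memory. I have not confirmed that this is the cause here.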