zalandoresearch / pytorch-ts

PyTorch based Probabilistic Time Series forecasting framework based on GluonTS backend
MIT License

RuntimeError: DataLoader worker (pid 34304) is killed by signal: Aborted. #134

Open Hundredl opened 1 year ago

Hundredl commented 1 year ago

I encountered a problem when running the README example. Does anyone know how to solve it? Environment: python=3.8, cuda=11.8, gluonts=0.12.6. By the way, I added training_data *= 100 to work around the error "Exception: Reached maximum number of idle transformation calls".
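
For readers unfamiliar with the workaround, here is a minimal sketch of what training_data *= 100 does, assuming training_data behaves like a plain Python list of entries. That is an assumption: the actual type depends on the gluonts version, and the entry below is made-up toy data.

```python
# Toy stand-in for the README's training_data (an assumption, not the real dataset).
training_data = [
    {"start": "2015-01-01", "target": [1.0, 2.0, 3.0, 4.0]},
]

original_length = len(training_data)

# In-place list repetition: the same entries now appear 100 times.
training_data *= 100

print(original_length, len(training_data))
```

The repetition is shallow, so no new data is created; the list only offers the same series to the training loop more often.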

100%|██████████| 49/49 [00:07<00:00, 10.35it/s, epoch=1/10, avg_loss=5.04]Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7f9b0f7e54c0>
Traceback (most recent call last):
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1479, in __del__
    self._shutdown_workers()
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1443, in _shutdown_workers
    w.join(timeout=_utils.MP_STATUS_CHECK_INTERVAL)
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/multiprocessing/process.py", line 149, in join
    res = self._popen.wait(timeout)
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/multiprocessing/popen_fork.py", line 44, in wait
    if not wait([self.sentinel], timeout):
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/multiprocessing/connection.py", line 931, in wait
    ready = selector.select(timeout)
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 34304) is killed by signal: Aborted. 
100%|██████████| 49/49 [00:08<00:00,  5.86it/s, epoch=1/10, avg_loss=5.02]
100%|██████████| 49/49 [00:09<00:00,  5.23it/s, epoch=2/10, avg_loss=4.16]
100%|██████████| 49/49 [00:08<00:00, 10.78it/s, epoch=3/10, avg_loss=4.07]Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7f9b0f7e54c0>
Traceback (most recent call last):
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1479, in __del__
    self._shutdown_workers()
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1443, in _shutdown_workers
    w.join(timeout=_utils.MP_STATUS_CHECK_INTERVAL)
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/multiprocessing/process.py", line 149, in join
    res = self._popen.wait(timeout)
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/multiprocessing/popen_fork.py", line 44, in wait
    if not wait([self.sentinel], timeout):
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/multiprocessing/connection.py", line 931, in wait
    ready = selector.select(timeout)
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 36067) is killed by signal: Aborted. 
100%|██████████| 49/49 [00:08<00:00,  5.57it/s, epoch=3/10, avg_loss=4.07]
100%|██████████| 49/49 [00:08<00:00, 11.87it/s, epoch=4/10, avg_loss=4.01]Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7f9b0f7e54c0>
Traceback (most recent call last):
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1479, in __del__
    self._shutdown_workers()
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1443, in _shutdown_workers
    w.join(timeout=_utils.MP_STATUS_CHECK_INTERVAL)
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/multiprocessing/process.py", line 149, in join
    res = self._popen.wait(timeout)
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/multiprocessing/popen_fork.py", line 44, in wait
    if not wait([self.sentinel], timeout):
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/multiprocessing/connection.py", line 931, in wait
    ready = selector.select(timeout)
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 36937) is killed by signal: Aborted. 
100%|██████████| 49/49 [00:08<00:00,  5.53it/s, epoch=4/10, avg_loss=4.01]
100%|██████████| 49/49 [00:09<00:00,  5.19it/s, epoch=5/10, avg_loss=3.99]
100%|██████████| 49/49 [00:09<00:00,  5.23it/s, epoch=6/10, avg_loss=3.97]
100%|██████████| 49/49 [00:08<00:00, 13.04it/s, epoch=7/10, avg_loss=3.96]Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7f9b0f7e54c0>
Traceback (most recent call last):
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1479, in __del__
    self._shutdown_workers()
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1443, in _shutdown_workers
    w.join(timeout=_utils.MP_STATUS_CHECK_INTERVAL)
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/multiprocessing/process.py", line 149, in join
    res = self._popen.wait(timeout)
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/multiprocessing/popen_fork.py", line 44, in wait
    if not wait([self.sentinel], timeout):
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/multiprocessing/connection.py", line 931, in wait
    ready = selector.select(timeout)
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 39662) is killed by signal: Aborted. 
100%|██████████| 49/49 [00:08<00:00,  5.65it/s, epoch=7/10, avg_loss=3.96]
100%|██████████| 49/49 [00:09<00:00,  5.27it/s, epoch=8/10, avg_loss=3.95]
100%|██████████| 49/49 [00:08<00:00, 12.26it/s, epoch=9/10, avg_loss=3.95]Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7f9b0f7e54c0>
Traceback (most recent call last):
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1479, in __del__
    self._shutdown_workers()
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1443, in _shutdown_workers
    w.join(timeout=_utils.MP_STATUS_CHECK_INTERVAL)
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/multiprocessing/process.py", line 149, in join
    res = self._popen.wait(timeout)
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/multiprocessing/popen_fork.py", line 44, in wait
    if not wait([self.sentinel], timeout):
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/multiprocessing/connection.py", line 931, in wait
    ready = selector.select(timeout)
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 41408) is killed by signal: Aborted. 
100%|██████████| 49/49 [00:08<00:00,  5.54it/s, epoch=9/10, avg_loss=3.95]
100%|██████████| 49/49 [00:08<00:00, 10.57it/s, epoch=10/10, avg_loss=3.95]Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7f9b0f7e54c0>
Traceback (most recent call last):
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1479, in __del__
    self._shutdown_workers()
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1443, in _shutdown_workers
    w.join(timeout=_utils.MP_STATUS_CHECK_INTERVAL)
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/multiprocessing/process.py", line 149, in join
    res = self._popen.wait(timeout)
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/multiprocessing/popen_fork.py", line 44, in wait
    if not wait([self.sentinel], timeout):
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/multiprocessing/connection.py", line 931, in wait
    ready = selector.select(timeout)
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
  File "/home/wuyy/miniconda3/envs/timegrad/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 42279) is killed by signal: Aborted. 
100%|██████████| 49/49 [00:08<00:00,  5.62it/s, epoch=10/10, avg_loss=3.94]
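
A note on reading this log: each traceback starts with "Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ ...>", and the progress bars keep advancing through epoch 10/10. That matches how Python treats exceptions raised inside __del__: they are reported on stderr and not propagated to the caller. The sketch below reproduces only that behaviour with a toy class; NoisyCleanup and run_epochs are hypothetical names, and it does not explain why the worker process was aborted.

```python
class NoisyCleanup:
    """Toy object whose finalizer fails, like the DataLoader iterator in the log."""

    def __del__(self):
        raise RuntimeError("DataLoader worker is killed by signal: Aborted.")


def run_epochs(num_epochs):
    completed = []
    for epoch in range(1, num_epochs + 1):
        loader = NoisyCleanup()
        # Dropping the last reference runs __del__; the RuntimeError is
        # reported as "Exception ignored in ..." and the loop continues.
        del loader
        completed.append(epoch)
    return completed


if __name__ == "__main__":
    print(run_epochs(3))
```

So the messages point to a problem during worker shutdown between epochs, not to a failed training run.
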

zhiyuanzhang15 commented 1 year ago

Hello, did you get any errors when running the TimeGrad estimator example? I ran into the problem below. Do you know how to solve it?

Exception: Reached maximum number of idle transformation calls.
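
The wording of this message suggests that a transformation consumed many inputs in a row without producing any output. The following is a toy sketch of such an idle-call guard, not the gluonts implementation: apply_flat_map, windows_of_length, and MAX_IDLE_CALLS are hypothetical names, and the limit of 100 is chosen only for illustration.

```python
import itertools
from typing import Callable, Iterable, Iterator, List

MAX_IDLE_CALLS = 100  # hypothetical limit for this sketch


def apply_flat_map(
    entries: Iterable[dict],
    flat_map: Callable[[dict], List[dict]],
    max_idle_calls: int = MAX_IDLE_CALLS,
) -> Iterator[dict]:
    """Yield every output of flat_map; fail if too many inputs yield nothing."""
    idle_calls = 0
    for entry in entries:
        idle_calls += 1
        for result in flat_map(entry):
            idle_calls = 0  # any output resets the counter
            yield result
        if idle_calls > max_idle_calls:
            raise Exception("Reached maximum number of idle transformation calls.")


def windows_of_length(length: int) -> Callable[[dict], List[dict]]:
    """Return a flat_map that cuts each target into windows of the given length."""
    def split(entry: dict) -> List[dict]:
        target = entry["target"]
        return [
            {"target": target[i:i + length]}
            for i in range(len(target) - length + 1)
        ]
    return split


if __name__ == "__main__":
    short_series = {"target": [1.0, 2.0, 3.0]}
    # Windows of length 2 fit into the series, so outputs are produced.
    print(list(apply_flat_map([short_series], windows_of_length(2))))
    # Windows of length 5 never fit; cycling over the data trips the guard.
    try:
        next(apply_flat_map(itertools.cycle([short_series]), windows_of_length(5)))
    except Exception as error:
        print(error)
```

In this toy setup the error appears when no window ever fits the series, which is one way an endlessly cycled training set can stay idle; whether that is the cause in this thread is not confirmed.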