Open jacktang opened 1 year ago
Hi @jacktang, thank you for writing.
Can you please indicate which cell of the notebook raises the error? It seems to come from one of Darts' dependencies...
Also, can you try upgrading to darts 0.25.0?
OK. I upgraded to 0.25.0 and converted the notebook to a Python script, but the error persists. The OS is Ubuntu 20.04.4 LTS.
Best value: 29.46530282497406, Best params: {'kernel_size': 3, 'num_filters': 4, 'weight_norm': False, 'dilation_base': 2, 'dropout': 0.017801282281381472, 'lr': 8.169771024932909e-05, 'year': False}
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
----------------------------------------------------
0 | criterion | MSELoss | 0
1 | train_metrics | MetricCollection | 0
2 | val_metrics | MetricCollection | 0
3 | dropout | MonteCarloDropout | 0
4 | res_blocks | ModuleList | 166
----------------------------------------------------
166 Trainable params
0 Non-trainable params
166 Total params
0.001 Total estimated model params size (MB)
Epoch 0: 100%|██████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 42.49it/s, train_loss=8.210[I 2023-08-08 17:42:53,616] Trial 16 pruned. Trial was pruned at epoch 0.█████████████████████████| 1/1 [00:00<00:00, 652.30it/s]
Current value: 5.6125102043151855, Current params: {'kernel_size': 3, 'num_filters': 3, 'weight_norm': False, 'dilation_base': 2, 'dropout': 0.10989051943366332, 'lr': 0.0008949513735868809, 'year': False}
Best value: 29.46530282497406, Best params: {'kernel_size': 3, 'num_filters': 4, 'weight_norm': False, 'dilation_base': 2, 'dropout': 0.017801282281381472, 'lr': 8.169771024932909e-05, 'year': False}
Epoch 1: 100%|██████████████████████████████████████████████████| 3/3 [00:00<00:00, 5.10it/s, train_loss=1.000, val_loss=0.859]
Epoch 0: 100%|██████████████████████████████████████████████████████████████████| 3/3 [00:15<00:00, 5.07s/it, train_loss=8.210]
Exception in thread QueueFeederThread:
Traceback (most recent call last):
File "/home/dev/miniconda3/envs/pf/lib/python3.10/multiprocessing/queues.py", line 239, in _feed
reader_close()
File "/home/dev/miniconda3/envs/pf/lib/python3.10/multiprocessing/connection.py", line 177, in close
self._close()
File "/home/dev/miniconda3/envs/pf/lib/python3.10/multiprocessing/connection.py", line 361, in _close
_close(self._handle)
OSError: [Errno 9] Bad file descriptor
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/dev/miniconda3/envs/pf/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/home/dev/miniconda3/envs/pf/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/home/dev/miniconda3/envs/pf/lib/python3.10/multiprocessing/queues.py", line 271, in _feed
queue_sem.release()
ValueError: semaphore or lock released too many times
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
----------------------------------------------------
0 | criterion | MSELoss | 0
1 | train_metrics | MetricCollection | 0
2 | val_metrics | MetricCollection | 0
3 | dropout | MonteCarloDropout | 0
4 | res_blocks | ModuleList | 68
----------------------------------------------------
68 Trainable params
0 Non-trainable params
68 Total params
0.000 Total estimated model params size (MB)
Epoch 7: 100%|██████████████████████████████████████████████████| 3/3 [01:04<00:00, 21.54s/it, train_loss=0.794, val_loss=1.220]
Exception in thread QueueFeederThread:
Traceback (most recent call last):
File "/home/dev/miniconda3/envs/pf/lib/python3.10/multiprocessing/queues.py", line 239, in _feed
reader_close()
File "/home/dev/miniconda3/envs/pf/lib/python3.10/multiprocessing/connection.py", line 177, in close
self._close()
File "/home/dev/miniconda3/envs/pf/lib/python3.10/multiprocessing/connection.py", line 361, in _close
_close(self._handle)
OSError: [Errno 9] Bad file descriptor
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/dev/miniconda3/envs/pf/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/home/dev/miniconda3/envs/pf/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/home/dev/miniconda3/envs/pf/lib/python3.10/multiprocessing/queues.py", line 271, in _feed
queue_sem.release()
ValueError: semaphore or lock released too many times
Epoch 0: 100%|██████████████████████████████████████████████████████████████████| 3/3 [01:32<00:00, 30.92s/it, train_loss=1.360]
Epoch 7: 100%|██████████████████████████████████████████████████| 3/3 [01:15<00:00, 25.29s/it, train_loss=0.903, val_loss=0.999]
Epoch 0: 100%|██████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 48.06it/s, train_loss=1.370[I 2023-08-08 17:44:09,206] Trial 17 pruned. Trial was pruned at epoch 0.█████████████████████████| 1/1 [00:00<00:00, 638.21it/s]
Current value: 1.354859471321106, Current params: {'kernel_size': 4, 'num_filters': 2, 'weight_norm': False, 'dilation_base': 3, 'dropout': 0.045057036646966524, 'lr': 7.765323102891736e-05, 'year': False}
Best value: 29.46530282497406, Best params: {'kernel_size': 3, 'num_filters': 4, 'weight_norm': False, 'dilation_base': 2, 'dropout': 0.017801282281381472, 'lr': 8.169771024932909e-05, 'year': False}
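For readers unfamiliar with the thread named in the traceback: `QueueFeederThread` is the background thread that a `multiprocessing.Queue` starts on the first `put()` to move buffered items into the underlying pipe. The snippet below is only a minimal, single-process sketch of that mechanism and of the documented `close()` / `join_thread()` shutdown sequence. It is not a diagnosis of this issue and not a fix for it, and `round_trip` is a hypothetical helper name used for illustration.

```python
import multiprocessing as mp


def round_trip(values):
    """Send values through a multiprocessing.Queue and read them back."""
    queue = mp.Queue()
    for value in values:
        # put() appends to an internal buffer; a background feeder thread
        # (named "QueueFeederThread") writes the buffer to the pipe.
        queue.put(value)
    # Read back exactly as many items as were sent (FIFO order).
    received = [queue.get(timeout=10) for _ in values]
    # close() declares that this process will put no more data;
    # join_thread() then waits for the feeder thread to finish.
    queue.close()
    queue.join_thread()
    return received


if __name__ == "__main__":
    print(round_trip(["a", "b", "c"]))
```

The point of the sketch is the ordering: the feeder thread is given the chance to finish before the queue's resources go away. Whether a violated ordering is what happens inside the data-loading workers here is an assumption that the logs alone do not confirm.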
I'm encountering a similar error, but it appears to occur only when debugging in PyCharm. There are no issues when training on the CPU.
Describe the bug
I am learning Darts and Optuna hyperparameter optimization from the guide: https://unit8co.github.io/darts/userguide/hyperparameter_optimization.html#hyperparameter-optimization-with-optuna. I trained the model using a GPU and 4 workers, and got the following error:
To Reproduce
The code is from https://unit8co.github.io/darts/userguide/hyperparameter_optimization.html#hyperparameter-optimization-with-optuna
Expected behavior
No error
System (please complete the following information):