jdb78 / pytorch-forecasting

Time series forecasting with PyTorch
https://pytorch-forecasting.readthedocs.io/
MIT License
3.87k stars 611 forks

[MNT] Windows issue on python 3.10-3.12 #1632

Open fkiraly opened 2 weeks ago

fkiraly commented 2 weeks ago

There is another issue on Python 3.10-3.12: the tests crash with this:

Windows fatal exception: code 0xc0000374

Thread 0x000002c4 (most recent call first):
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\threading.py", line 359 in wait
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\threading.py", line 655 in wait
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\tqdm\_monitor.py", line 60 in run
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\threading.py", line 1075 in _bootstrap_inner
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\threading.py", line 1032 in _bootstrap

Thread 0x000008c8 (most recent call first):
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\threading.py", line 359 in wait
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\threading.py", line 655 in wait
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\tqdm\_monitor.py", line 60 in run
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\threading.py", line 1075 in _bootstrap_inner
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\threading.py", line 1032 in _bootstrap

Current thread 0x00000c88 (most recent call first):
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\torch\autograd\graph.py", line 768 in _engine_run_backward
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\torch\autograd\__init__.py", line 289 in backward
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\torch\_tensor.py", line 521 in backward
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\lightning\pytorch\core\module.py", line 1101 in backward
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\lightning\pytorch\plugins\precision\precision.py", line 72 in backward
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\lightning\pytorch\strategies\strategy.py", line 212 in backward
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\_pytest\runner.py", line 132 in runtestprotocol
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\_pytest\runner.py", line 113 in pytest_runtest_protocol
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\pluggy\_callers.py", line 103 in _multicall
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\pluggy\_manager.py", line 120 in _hookexec
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\pluggy\_hooks.py", line 513 in __call__
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\_pytest\main.py", line 362 in pytest_runtestloop
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\pluggy\_callers.py", line 103 in _multicall
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\pluggy\_manager.py", line 120 in _hookexec
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\pluggy\_hooks.py", line 513 in __call__
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\_pytest\main.py", line 337 in _main
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\_pytest\main.py", line 283 in wrap_session
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\_pytest\main.py", line 330 in pytest_cmdline_main
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\pluggy\_callers.py", line 103 in _multicall
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\pluggy\_manager.py", line 120 in _hookexec
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\pluggy\_hooks.py", line 513 in __call__
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\_pytest\config\__init__.py", line 175 in main
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\_pytest\config\__init__.py", line 201 in console_main
  File "C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\pytest\__main__.py", line 9 in <module>
  File "<frozen runpy>", line 88 in _run_code
  File "<frozen runpy>", line 198 in _run_module_as_main
tests/test_models/test_temporal_fusion_transformer.py::test_hyperparameter_optimization_integration[True]

benHeid commented 2 weeks ago

This is also occurring in sktime with TFT: https://github.com/sktime/sktime/actions/runs/10610051364/job/29406877935#step:11:32632

benHeid commented 2 weeks ago

According to several threads on GitHub, this is an error that multiple libraries are facing when using xdist. See for example: https://github.com/coiled/benchmarks/issues/404. There, someone suspected that there is too much load on the workers and that this is why they are failing.

benHeid commented 2 weeks ago

Some people seem to have had success by reducing the number of workers:

benHeid commented 2 weeks ago

I would suggest selecting smaller values in optimize_hyperparameters than the default ones. Hopefully this reduces the load enough that these weird errors no longer occur. :/
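
A rough sketch of what this could look like, assuming `optimize_hyperparameters` accepts `n_trials`, `max_epochs`, `hidden_size_range`, `hidden_continuous_size_range` and `attention_head_size_range` as keyword arguments (worth checking against the current signature). The helper name `run_small_study` and all concrete values are hypothetical and only meant to illustrate the idea:

```python
# Hypothetical reduced settings for the hyperparameter-optimization test.
# Keyword names are assumed to match optimize_hyperparameters; the values
# are illustrative and not tuned.
SMALL_TUNING_KWARGS = {
    "n_trials": 2,
    "max_epochs": 1,
    "hidden_size_range": (8, 16),
    "hidden_continuous_size_range": (4, 8),
    "attention_head_size_range": (1, 2),
}


def run_small_study(train_dataloader, val_dataloader, model_path, **overrides):
    """Call optimize_hyperparameters with a deliberately small search space."""
    # Imported lazily so this module loads without pytorch-forecasting installed.
    from pytorch_forecasting.models.temporal_fusion_transformer.tuning import (
        optimize_hyperparameters,
    )

    # Caller-supplied overrides win over the reduced defaults.
    kwargs = {**SMALL_TUNING_KWARGS, **overrides}
    return optimize_hyperparameters(
        train_dataloaders=train_dataloader,
        val_dataloaders=val_dataloader,
        model_path=model_path,
        **kwargs,
    )
```

Whether this actually avoids the crash is untested; it only lowers the amount of work each test invocation does.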

fkiraly commented 2 weeks ago

We could also try turning off pytest-xdist? At least for diagnostic purposes?
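
For the diagnostic run, something like the following could be used to build the pytest arguments. This is only a sketch: `pytest_args` is a hypothetical helper, while `-n` (worker count) and `-p no:xdist` (disable the plugin) are the standard pytest-xdist and pytest switches. Note that if `-n` is hard-coded in the project's pytest config, disabling the plugin makes pytest reject that option, so the config would need adjusting too.

```python
def pytest_args(workers=None, test_path="tests"):
    """Build a pytest argument list for a diagnostic run.

    workers=None disables the xdist plugin entirely (-p no:xdist);
    an integer passes -n <workers> to pytest-xdist instead.
    """
    args = [test_path, "-v"]
    if workers is None:
        # Run everything in a single process, without xdist loaded.
        args += ["-p", "no:xdist"]
    else:
        # Keep xdist, but cap the number of worker processes.
        args += ["-n", str(workers)]
    return args


# Example: pass the result to pytest.main(...) or join it into a CI command line.
no_xdist = pytest_args()
two_workers = pytest_args(workers=2)
```

If the crash disappears without xdist but persists with a reduced worker count, that would point at the parallel run rather than at the TFT test itself.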