Hello,
I tried to run UniTS_supervised with all default settings as an initial test, but I got the error below. It seems that Torch is missing something, but I didn't see NCCL mentioned anywhere; I just installed everything in requirements.txt. I tried to install NCCL, but it appears to be Linux-only. Do you have any idea how to solve this on Windows 10?
C:\Users\comp\UniTS>bash ./scripts/supervised_learning/UniTS_supervised.sh
NOTE: Redirects are currently not supported in Windows or MacOs.
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [comp]:4223 (system error: 10049 - The requested address is not valid in its context.).
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [comp]:4223 (system error: 10049 - The requested address is not valid in its context.).
C:\Users\comp\AppData\Local\Programs\Python\Python310\lib\site-packages\gluonts\json.py:101: UserWarning: Using json-module for json-handling. Consider installing one of orjson, ujson to speed up serialization and deserialization.
warnings.warn(
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [comp]:4223 (system error: 10049 - The requested address is not valid in its context.).
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [comp]:4223 (system error: 10049 - The requested address is not valid in its context.).
Traceback (most recent call last):
File "C:\Users\comp\UniTS\run.py", line 114, in <module>
init_distributed_mode(args)
File "C:\Users\comp\UniTS\utils\ddp.py", line 31, in init_distributed_mode
dist.init_process_group(
File "C:\Users\comp\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\distributed\distributed_c10d.py", line 907, in init_process_group
default_pg = _new_process_group_helper(
File "C:\Users\comp\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\distributed\distributed_c10d.py", line 1013, in _new_process_group_helper
raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 85344) of binary: C:\Users\comp\AppData\Local\Programs\Python\Python310\python.exe
Traceback (most recent call last):
File "C:\Users\comp\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\comp\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\comp\AppData\Local\Programs\Python\Python310\Scripts\torchrun.exe\__main__.py", line 7, in <module>
File "C:\Users\comp\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\distributed\elastic\multiprocessing\errors\__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "C:\Users\comp\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\distributed\run.py", line 794, in main
run(args)
File "C:\Users\comp\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\distributed\run.py", line 785, in run
elastic_launch(
File "C:\Users\comp\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\distributed\launcher\api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "C:\Users\comp\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\distributed\launcher\api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
run.py FAILED
Failures:
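
A possible workaround, sketched under assumptions rather than taken from the UniTS repository: the traceback shows the `dist.init_process_group(` call in `utils/ddp.py` failing because this PyTorch build has no NCCL, and PyTorch's Gloo backend is generally usable on Windows. The helper names `choose_backend` and `detect_backend` below are hypothetical, and how the chosen backend would be passed into `init_distributed_mode` is an assumption, since that function's arguments are not shown in the log.

```python
def choose_backend(nccl_available: bool, cuda_available: bool) -> str:
    """Pick "nccl" only when it can actually be used; otherwise fall back to "gloo".

    Hypothetical helper for illustration, not part of UniTS or PyTorch.
    """
    if nccl_available and cuda_available:
        return "nccl"
    return "gloo"


def detect_backend() -> str:
    """Ask the installed PyTorch build what it supports.

    torch is imported lazily so that choose_backend() can be used
    (and tested) on a machine without PyTorch installed.
    """
    import torch
    import torch.distributed as dist

    return choose_backend(dist.is_nccl_available(), torch.cuda.is_available())


if __name__ == "__main__":
    # On a build without NCCL this takes the fallback branch.
    print(choose_backend(nccl_available=False, cuda_available=True))
```

If the backend is hard-coded as `"nccl"` in `init_distributed_mode` (an assumption, not confirmed by the log), replacing that value with the result of `detect_backend()` might let the script get past this error on Windows. It would not address the earlier socket warnings, which appear to be a separate issue.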