Hi Intel Team,
I am seeing a "could not create an engine" error when running the demo.py example from the "oneCCL Bindings for PyTorch Getting Started Sample". The code runs on a Sapphire Rapids node with 4 PVCs on a TACC system. Any suggestions for identifying the cause and fixing it?
(base) c551-003pvc$ mpirun -n 2 -l python demo.py -dev xpu
[0] Runing Iteration: 0 on device xpu:0
[0] Runing forward: 0 on device xpu:0
[0] Traceback (most recent call last):
[0] File "/scratch/05231/aruhela/demo.py", line 67, in <module>
[1] Runing Iteration: 0 on device xpu:1
[1] Runing forward: 0 on device xpu:1
[1] Traceback (most recent call last):
[1] File "/scratch/05231/aruhela/demo.py", line 67, in <module>
[0] res = model(input)
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[1] res = model(input)
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[0] return self._call_impl(*args, **kwargs)
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[1] return self._call_impl(*args, **kwargs)
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[0] return forward_call(*args, **kwargs)
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1519, in forward
[1] return forward_call(*args, **kwargs)
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1519, in forward
[0] else self._run_ddp_forward(*inputs, **kwargs)
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1355, in _run_ddp_forward
[1] else self._run_ddp_forward(*inputs, **kwargs)
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1355, in _run_ddp_forward
[0] return self.module(*inputs, **kwargs) # type: ignore[index]
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[1] return self.module(*inputs, **kwargs) # type: ignore[index]
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[0] return self._call_impl(*args, **kwargs)
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[1] return self._call_impl(*args, **kwargs)
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[0] return forward_call(*args, **kwargs)
[0] File "/scratch/05231/aruhela/demo.py", line 26, in forward
[1] return forward_call(*args, **kwargs)
[1] File "/scratch/05231/aruhela/demo.py", line 26, in forward
[0] return self.linear(input)
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[1] return self.linear(input)
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[0] return self._call_impl(*args, **kwargs)
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[1] return self._call_impl(*args, **kwargs)
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[0] return forward_call(*args, **kwargs)
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
[1] return forward_call(*args, **kwargs)
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
[0] return F.linear(input, self.weight, self.bias)
[0] RuntimeError: could not create an engine
[1] return F.linear(input, self.weight, self.bias)
[1] RuntimeError: could not create an engine
(base) c551-003pvc$
Notes:
oneAPI release: 2024.2
Install command (AI Selector Tool):
conda install -c intel -c conda-forge --override-channels intel/label/oneapi::intel-extension-for-pytorch=2.1.20 intel/label/oneapi::pytorch=2.1.0 intel/label/oneapi::oneccl_bind_pt=2.1.200 intel/label/oneapi::torchvision=0.16.0 intel/label/oneapi::torchaudio=2.1.0 conda-forge::deepspeed=0.14.0 python=3.9
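In case it helps with triage, here is a minimal sketch for reporting the versions actually present in the Python environment, so they can be compared with the ones requested in the install command. The distribution names in `PACKAGES` are assumptions derived from the conda package names and may be registered differently in a given environment; a package that cannot be found is reported as not installed rather than raising.

```python
from importlib import metadata

# Assumed distribution names, derived from the conda install command;
# adjust them if the environment registers the packages under other names.
PACKAGES = [
    "torch",
    "intel_extension_for_pytorch",
    "oneccl_bind_pt",
    "torchvision",
    "torchaudio",
    "deepspeed",
]


def installed_versions(names):
    """Return {name: version string, or None if the package is not found}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this with the same interpreter that mpirun launches should show whether the environment matches the versions requested in the install command.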
Thanks,
Amit Ruhela