intel / torch-xpu-ops


DLRM NotImplementedError: Could not run 'aten::_indices' with arguments from the 'SparseXPU' backend. #484

Open mengfei25 opened 2 months ago

mengfei25 commented 2 months ago

🐛 Describe the bug

torchbench_amp_fp16_training xpu train dlrm

Traceback (most recent call last):
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 2294, in validate_model
    self.model_iter_fn(model, example_inputs)
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/torchbench.py", line 458, in forward_and_backward_pass
    self.grad_scaler.scale(loss).backward()
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/_tensor.py", line 522, in backward
    torch.autograd.backward(
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/autograd/__init__.py", line 288, in backward
    _engine_run_backward(
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/autograd/graph.py", line 768, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
NotImplementedError: Could not run 'aten::_indices' with arguments from the 'SparseXPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_indices' is only available for these backends: [XPU, Meta, SparseCPU, SparseMeta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].

XPU: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/build/aten/src/ATen/RegisterXPU.cpp:3613 [backend fallback]
Meta: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/aten/src/ATen/core/MetaFallbackKernel.cpp:23 [backend fallback]
SparseCPU: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/build/aten/src/ATen/RegisterSparseCPU.cpp:1390 [kernel]
SparseMeta: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/build/aten/src/ATen/RegisterSparseMeta.cpp:290 [kernel]
BackendSelect: fallthrough registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:153 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:497 [backend fallback]
Functionalize: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/build/aten/src/ATen/RegisterFunctionalization_2.cpp:22994 [kernel]
Named: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/aten/src/ATen/ConjugateFallback.cpp:17 [backend fallback]
Negative: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/aten/src/ATen/native/NegateFallback.cpp:18 [backend fallback]
ZeroTensor: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/torch/csrc/autograd/generated/ADInplaceOrViewType_0.cpp:4942 [kernel]
AutogradOther: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:16862 [autograd kernel]
AutogradCPU: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:16862 [autograd kernel]
AutogradCUDA: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:16862 [autograd kernel]
AutogradHIP: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:16862 [autograd kernel]
AutogradXLA: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:16862 [autograd kernel]
AutogradMPS: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:16862 [autograd kernel]
AutogradIPU: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:16862 [autograd kernel]
AutogradXPU: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:16862 [autograd kernel]
AutogradHPU: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:16862 [autograd kernel]
AutogradVE: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:16862 [autograd kernel]
AutogradLazy: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:16862 [autograd kernel]
AutogradMTIA: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:16862 [autograd kernel]
AutogradPrivateUse1: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:16862 [autograd kernel]
AutogradPrivateUse2: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:16862 [autograd kernel]
AutogradPrivateUse3: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:16862 [autograd kernel]
AutogradMeta: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:16862 [autograd kernel]
AutogradNestedTensor: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:16862 [autograd kernel]
Tracer: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/torch/csrc/autograd/generated/TraceType_1.cpp:16060 [kernel]
AutocastCPU: fallthrough registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/aten/src/ATen/autocast_mode.cpp:209 [backend fallback]
AutocastXPU: fallthrough registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/aten/src/ATen/autocast_mode.cpp:351 [backend fallback]
AutocastCUDA: fallthrough registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/aten/src/ATen/autocast_mode.cpp:165 [backend fallback]
FuncTorchBatched: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:731 [backend fallback]
BatchedNestedTensor: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:758 [backend fallback]
FuncTorchVmapMode: fallthrough registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/aten/src/ATen/functorch/VmapModeRegistrations.cpp:27 [backend fallback]
Batched: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/aten/src/ATen/LegacyBatchingRegistrations.cpp:1075 [backend fallback]
VmapMode: fallthrough registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/aten/src/ATen/functorch/TensorWrapper.cpp:207 [backend fallback]
PythonTLSSnapshot: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:161 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:493 [backend fallback]
PreDispatch: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:165 [backend fallback]
PythonDispatcher: registered at /home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:157 [backend fallback]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 4177, in run
    ) = runner.load_model(
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/torchbench.py", line 380, in load_model
    self.validate_model(model, example_inputs)
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 2296, in validate_model
    raise RuntimeError("Eager run failed") from e
RuntimeError: Eager run failed

eager_fail_to_run
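
For reference, here is a hypothetical minimal reproducer (an assumption on my part, not taken from the report): DLRM's sparse embedding gradients make the autograd engine touch sparse operators such as `aten::_indices`, so calling `_indices()` on a sparse tensor that lives on an XPU device should hit the same SparseXPU dispatch gap, assuming a build where such a tensor can be created at all.

```python
import torch

# Build a small sparse COO tensor on CPU, then move it to the XPU device.
indices = torch.tensor([[0, 1, 1], [2, 0, 2]])
values = torch.tensor([3.0, 4.0, 5.0])
sparse = torch.sparse_coo_tensor(indices, values, (2, 3)).to("xpu")

# Expected on current builds: NotImplementedError for 'aten::_indices'
# with arguments from the 'SparseXPU' backend.
print(sparse._indices())
```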

Versions

torch-xpu-ops: https://github.com/intel/torch-xpu-ops/commit/31c400195d63064940242220dc9100322d36bac4
pytorch: 0f81473d7b4a1bf09246410712df22541be7caf3 + PRs: 127277, 129120
device: PVC 1100, 803.61, 0.5.1

retonym commented 1 month ago

The SparseXPU backend is not supported yet.
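
Until it is, one possible workaround sketch (my assumption, not an official fix) is to keep the model's gradients dense so the backward pass never routes through SparseXPU kernels, for example by constructing the embedding tables with `sparse=False`. This trades the memory and update-speed benefits of sparse gradients for compatibility with the XPU backend.

```python
import torch.nn as nn

# Dense-gradient embedding table: backward produces ordinary dense grads
# handled by the regular XPU kernels instead of SparseXPU ops.
table = nn.EmbeddingBag(num_embeddings=100_000, embedding_dim=64,
                        mode="sum", sparse=False)
```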