Open Looong01 opened 2 years ago
@Adele101, @smk2007, tagging you since the question is about PyTorch rather than DirectML. Do you see this: "D:\a_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_1.cpp"? That absolute path is baked into the compiled DirectML package. Maybe it is the path from the machine of one of the DirectML developers.
Hi @Looong01, Thanks for the request! We will add aten::index_select to our backlog for operator support and will update this thread when more information is available.
Hi @Looong01, we have support for index_select in the newest version of torch-directml-0.1.13.dev221216. Please take a look and let us know if you have any other issues.
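For anyone landing here, a minimal sketch to confirm that index_select runs on the DML device (the device handle and tensor shapes are illustrative, not taken from this thread):

```python
# Sketch: verify index_select on the DirectML device.
# Assumes torch-directml 0.1.13.dev221216 or newer is installed.
import torch
import torch_directml

dml = torch_directml.device()           # first available DirectML device

x = torch.randn(4, 5).to(dml)           # data tensor on the DML device
idx = torch.tensor([0, 2]).to(dml)      # index tensor on the same device

y = torch.index_select(x, dim=0, index=idx)
print(y.device, y.shape)                # expect a DirectML device and shape (2, 5)
```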
When I run my model with the new version, it returns:
UserWarning: The operator 'aten::scatter_add.out' is not currently supported on the DML backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at D:\a_work\1\s\pytorch-directml-plugin\torch_directml\csrc\dml\dml_cpu_fallback.cpp:16.)
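For context, a minimal sketch that would hit this fallback path by calling scatter_add_ on the DML device (tensor shapes are illustrative, not from the actual seq2seq model):

```python
# Sketch: trigger aten::scatter_add.out on the DirectML device.
import torch
import torch_directml

dml = torch_directml.device()

out = torch.zeros(3, 5).to(dml)
src = torch.ones(2, 5).to(dml)
index = torch.tensor([[0, 1, 2, 0, 0],
                      [2, 0, 0, 1, 2]]).to(dml)

# On builds without DML support for aten::scatter_add.out, this line emits
# the CPU-fallback UserWarning quoted above.
out.scatter_add_(0, index, src)
print(out.cpu())
```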
Hi @Looong01, sorry for the inconvenience.
We have not enabled the seq2seq model as a whole, and as such some operators may be missing; e.g., scatter_add is not enabled on DirectML yet. We are looking to enable this soon.
Re-opening to make sure this issue is tracked, and we will keep it open until the model is unblocked. Thanks!
Yeah, the training speed of the DML backend version is too slow. 'aten::scatter_add.out' has to fall back to the CPU, and training my seq2seq model on DML takes roughly hundreds of times longer than the CUDA version.
Hi @Looong01, thanks for bringing this up. We've added support for scatter_add in the newest release of torch-directml, 0.1.13.1.dev230119.
Are you sure about that? Although it no longer prints the CPU fallback warning, it still uses a lot of CPU to train my BERT model: about 50% CPU usage and only 10% GPU usage.
When I train it with CUDA, however, it uses no more than 20% CPU and about 80% GPU.
This is the screenshot:
In fact, I tested my code with both 0.1.13.dev221216 (which does not support aten::scatter_add, so that operation falls back to the CPU) and 0.1.13.1.dev230119 (which you said supports aten::scatter_add).
Both tests show the same CPU and GPU usage, and the training time per epoch is the same, about 6 s.
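One way to sanity-check whether scatter_add really runs on the GPU is to time it in isolation. The sketch below assumes that copying the result back with .cpu() is enough to wait for pending DML work to finish; sizes and iteration counts are arbitrary, not from the model above:

```python
# Sketch: rough timing of scatter_add_ on CPU vs. DirectML.
import time
import torch
import torch_directml

def bench(device, iters=200):
    out = torch.zeros(1024, 1024).to(device)
    src = torch.rand(1024, 1024).to(device)
    index = torch.randint(0, 1024, (1024, 1024)).to(device)
    t0 = time.perf_counter()
    for _ in range(iters):
        out.scatter_add_(0, index, src)
    out.cpu()                      # copy back to host to wait for pending work
    return time.perf_counter() - t0

print("cpu:", bench(torch.device("cpu")))
print("dml:", bench(torch_directml.device()))
```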
When I run the seq2seq NN model, I hit a fatal error:
Could not run 'aten::index_select' with arguments from the 'UNKNOWN_TENSOR_TYPE_ID' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::index_select' is only available for these backends: [CPU, SparseCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
CPU: registered at D:\a_work\1\s\pytorch-directml\build\aten\src\ATen\RegisterCPU.cpp:5926 [kernel]
SparseCPU: registered at D:\a_work\1\s\pytorch-directml\build\aten\src\ATen\RegisterSparseCPU.cpp:558 [kernel]
BackendSelect: fallthrough registered at D:\a_work\1\s\pytorch-directml\aten\src\ATen\core\BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at D:\a_work\1\s\pytorch-directml\aten\src\ATen\core\NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: registered at D:\a_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_1.cpp:9665 [autograd kernel]
AutogradCPU: registered at D:\a_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_1.cpp:9665 [autograd kernel]
AutogradCUDA: registered at D:\a_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_1.cpp:9665 [autograd kernel]
AutogradXLA: registered at D:\a_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_1.cpp:9665 [autograd kernel]
AutogradNestedTensor: registered at D:\a_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_1.cpp:9665 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at D:\a_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_1.cpp:9665 [autograd kernel]
AutogradPrivateUse1: registered at D:\a_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_1.cpp:9665 [autograd kernel]
AutogradPrivateUse2: registered at D:\a_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_1.cpp:9665 [autograd kernel]
AutogradPrivateUse3: registered at D:\a_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_1.cpp:9665 [autograd kernel]
Tracer: registered at D:\a_work\1\s\pytorch-directml\torch\csrc\autograd\generated\TraceType_1.cpp:11324 [kernel]
Autocast: fallthrough registered at D:\a_work\1\s\pytorch-directml\aten\src\ATen\autocast_mode.cpp:250 [backend fallback]
Batched: registered at D:\a_work\1\s\pytorch-directml\aten\src\ATen\BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at D:\a_work\1\s\pytorch-directml\aten\src\ATen\VmapModeRegistrations.cpp:33 [backend fallback]
Why does your compiled Python package contain an absolute path?