Error with PyTorch backpropagation

microsoft / DirectML

DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers, including all DirectX 12-capable GPUs from vendors such as AMD, Intel, NVIDIA, and Qualcomm.

MIT License


Error with PyTorch backpropagation #668

Open alterrion-git opened 1 week ago

alterrion-git commented 1 week ago

I get this when running loss.backward():

RuntimeError: 0 <= device.index() && device.index() < static_cast(device_readyqueues.size()) INTERNAL ASSERT FAILED at "C:\actions-runner\work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\autograd\engine.cpp":1451, please report a bug to PyTorch.

Is DirectML unusable for PyTorch on Windows with an AMD GPU, or could there be an error somewhere? I just created a very simple net that works fine on CPU.

RayKMAllen commented 6 days ago

I have the same error, also on an AMD Radeon GPU with Windows 11. The model and data transfer to dml successfully and the forward pass is fine, but loss.backward() produces this error.
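
For anyone trying to reproduce this, the sequence described in the reports (move the model and data to the DirectML device, run a forward pass, then call loss.backward()) might look roughly like the sketch below. This is not the code from either report, which was not posted. The layer sizes, the random data, and the helper names pick_device and backward_step are assumptions made for illustration, as is the fallback to CPU when the torch-directml package is not installed.

```python
import torch
import torch.nn as nn


def pick_device():
    """Return the DirectML device if torch-directml is installed, else CPU."""
    try:
        import torch_directml  # shipped by the torch-directml package
        return torch_directml.device()
    except ImportError:
        # Assumption: fall back to CPU so the script still runs elsewhere.
        return torch.device("cpu")


def backward_step(device):
    """Run one forward/backward pass of a tiny net on the given device."""
    torch.manual_seed(0)
    model = nn.Linear(4, 1).to(device)   # hypothetical "super simple net"
    x = torch.randn(8, 4).to(device)     # made-up inputs
    y = torch.randn(8, 1).to(device)     # made-up targets
    loss = nn.functional.mse_loss(model(x), y)  # forward pass
    loss.backward()  # the call that raises the assert in the reports above
    # True when gradients were populated for the layer's weights.
    return model.weight.grad is not None


if __name__ == "__main__":
    device = pick_device()
    print(device, backward_step(device))
```

Running the same function once with torch.device("cpu") and once with the DirectML device should help confirm whether the failure is specific to the dml backend, as both reports suggest.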