Open captroper opened 1 year ago
The following error message seems to be related to the DirectML EP.
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\ExecutionProvider.cpp(896)\onnxruntime_pybind11_state.pyd!00007FFE31C80201: (caller: 00007FFE31C80C2F) Exception(2) tid(3c14) 887A0006 The GPU will not respond to more commands, most likely because of an invalid command passed by the calling application.
@jstoecker, do you have any insight?
This seems similar to https://github.com/microsoft/Olive/issues/510
This is DXGI_ERROR_DEVICE_HUNG during inference/evaluation, which typically happens when some GPU work is taking excessively long. The recent AMD driver optimizations for stable diffusion / multi-head attention target the RDNA 3 architecture (e.g., the 7000 series, like the Radeon RX 7900 XTX) but not RDNA 2 (6000 series). Still, we can try to repro this on an RDNA card to see if anything jumps out.
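For anyone triaging similar logs, the hex code in the message can be mapped back to its DXGI name. The snippet below is a minimal sketch, not part of Olive or ONNX Runtime: the helper name `decode_dxgi_error` is hypothetical, and the table covers only a few device-loss codes from the DXGI error documentation.

```python
import re

# Small subset of DXGI HRESULTs related to device loss.
DXGI_ERRORS = {
    0x887A0005: "DXGI_ERROR_DEVICE_REMOVED",
    0x887A0006: "DXGI_ERROR_DEVICE_HUNG",
    0x887A0007: "DXGI_ERROR_DEVICE_RESET",
    0x887A0020: "DXGI_ERROR_DRIVER_INTERNAL_ERROR",
}


def decode_dxgi_error(message):
    """Return the DXGI name of the first 887Axxxx code found in message.

    Hypothetical helper: returns None when no code is present or when the
    code is not in the small table above.
    """
    match = re.search(r"\b887A[0-9A-F]{4}\b", message, re.IGNORECASE)
    if match is None:
        return None
    return DXGI_ERRORS.get(int(match.group(0), 16))


if __name__ == "__main__":
    log_line = (
        "Exception(2) tid(3c14) 887A0006 The GPU will not respond "
        "to more commands"
    )
    print(decode_dxgi_error(log_line))
```

The match is case-insensitive, so it also handles the lowercase `887a0006` form that appears in the Olive log.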
The 6800 XT has the same error.
Error on my 6900XT as well, on 0.4.0
The same error occurred on an AMD Ryzen 7 7840U w/ Radeon 780M Graphics. I increased the dedicated GPU memory as #510 mentioned, but the error still occurs.
The GPU queue does not disable TDR (see https://github.com/microsoft/onnxruntime/issues/20094). You can manually disable it through the TdrLevel registry value and test again; see https://learn.microsoft.com/en-us/windows-hardware/drivers/display/tdr-registry-keys for how to set it. We have measured that a single command list sometimes contains many jobs, whether converting the file or running the model, on lower-end GPUs or with large models. Also remember to enlarge your virtual memory so you do not run out of memory; 200 GB is better for SDXL.
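As a rough illustration of the TdrLevel suggestion, the sketch below only builds the text of a `.reg` file and does not touch the registry itself. The function name `build_tdr_reg` is hypothetical. The key path and the `TdrLevel` / `TdrDelay` value names are taken from the TDR registry keys page linked above, where `TdrLevel` 0 turns detection off and 3 is the default. Importing the result needs administrator rights and typically a reboot, and with TDR off a hung GPU can freeze the desktop, so treat this as a test-only setting.

```python
# Key documented on the "TDR registry keys" page.
TDR_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def build_tdr_reg(level=0, delay_seconds=None):
    """Return .reg file text that sets TdrLevel (and optionally TdrDelay).

    Hypothetical helper for illustration; it only builds a string.
    level: 0 = off, 1 = bugcheck, 2 = recover to VGA, 3 = recover (default).
    delay_seconds: optional TdrDelay value in seconds.
    """
    if level not in (0, 1, 2, 3):
        raise ValueError("TdrLevel must be 0, 1, 2 or 3")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{TDR_KEY}]",
        f'"TdrLevel"=dword:{level:08x}',
    ]
    if delay_seconds is not None:
        if delay_seconds < 0:
            raise ValueError("TdrDelay must be non-negative")
        lines.append(f'"TdrDelay"=dword:{delay_seconds:08x}')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    # Print the file content; save it as e.g. tdr_off.reg to import manually.
    print(build_tdr_reg(level=0))
```

To restore the default behaviour afterwards, generate a second file with `build_tdr_reg(level=3)`.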
What happened?
This appeared to me to be the same issue as #510 and #301, though I know nothing. I ran the following commands:
I've attached the log as well as a DxDiag report. The run errors out when optimizing the unet, saying "failed to run olive on gpu-dml".... "887a0006 the gpu will not respond to more commands".
DxDiag.txt ErrorLog.txt
Version?
0.3.1