microsoft / DirectX-Graphics-Samples

This repo contains the DirectX Graphics samples that demonstrate how to build graphics intensive applications on Windows.
MIT License

[Question] Sample for Device Removed Extended Data (DRED) api usage on c++ #679

Open kirrero opened 3 years ago

kirrero commented 3 years ago

Hi, is it possible to add a C++ sample showing D3D12 Device Removed Extended Data (DRED) usage? I think it would be very useful to show, with a real example, how to debug TDRs and other GPU crashes with it.

Thanks

CodedByATool commented 3 years ago

I'd like to second this request. Much of the documentation about DRED shows "// write out breadcrumbs data here".

I am currently translating the WinDbg extension that interprets the data (especially for 1.2): https://github.com/microsoft/DirectX-Debugging-Tools/blob/8f427278f656c4f27d51c660230bb22ca0841178/D3DDred-WinDbg/D3DDred.js

But I have many questions.

For example, for better or worse, my project has 40+ command lists per frame, and the node list doesn't seem to have any particular ordering. I'd like to use DRED to report breadcrumbs in an engine-independent manner, i.e. without high-level knowledge of the order in which command lists were submitted to each queue. As it stands, if I create a hang in one node (i.e. one command list), describing that failure is fine. But I am then left with 40 other nodes that have not hit their first breadcrumb because they were submitted later on the same queue. Without a breadcrumb at the top of each node, I cannot tell whether a command list started, and I have no information about the ordering on the queue.

Currently I am choosing to ignore any nodes that never hit their first breadcrumb, on the assumption that the first operation didn't fail, which is not a reliable strategy.
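To make the node-walking problem concrete, here is a minimal sketch of the classification described above. It assumes a simplified stand-in struct, `BreadcrumbNode`, that mirrors only the `D3D12_AUTO_BREADCRUMB_NODE` fields it reads (`pCommandListDebugNameA`, `BreadcrumbCount`, `pLastBreadcrumbValue`, `pNext`), so it compiles without `<d3d12.h>`. The names `NodeState`, `NodeReport`, `Classify`, and `CollectReports` are hypothetical and not part of any API. The sketch does not resolve the ambiguity; it only makes the "no ops completed" bucket explicit instead of silently dropping those nodes.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in mirroring the fields of D3D12_AUTO_BREADCRUMB_NODE
// that this sketch reads. With <d3d12.h> available, use the real struct.
struct BreadcrumbNode {
    const char* pCommandListDebugNameA;
    std::uint32_t BreadcrumbCount;              // ops recorded in the command list
    const std::uint32_t* pLastBreadcrumbValue;  // ops the GPU reported as completed
    const BreadcrumbNode* pNext;
};

enum class NodeState {
    NoOpsCompleted,  // never started, OR hung on its very first op (ambiguous)
    InFlight,        // some ops completed, but not all: a likely fault site
    Completed        // every recorded op completed
};

struct NodeReport {
    std::string name;
    NodeState state;
    std::uint32_t completedOps;
    std::uint32_t totalOps;
};

inline NodeState Classify(const BreadcrumbNode& node) {
    const std::uint32_t last =
        node.pLastBreadcrumbValue ? *node.pLastBreadcrumbValue : 0;
    if (last == 0) return NodeState::NoOpsCompleted;
    if (last >= node.BreadcrumbCount) return NodeState::Completed;
    return NodeState::InFlight;
}

// Walks the singly linked node list in the order DRED returns it, which
// (per the discussion above) is not guaranteed to match submission order.
inline std::vector<NodeReport> CollectReports(const BreadcrumbNode* head) {
    std::vector<NodeReport> reports;
    for (const BreadcrumbNode* n = head; n != nullptr; n = n->pNext) {
        const std::uint32_t last =
            n->pLastBreadcrumbValue ? *n->pLastBreadcrumbValue : 0;
        reports.push_back({
            n->pCommandListDebugNameA ? n->pCommandListDebugNameA : "<unnamed>",
            Classify(*n), last, n->BreadcrumbCount});
    }
    return reports;
}
```

With the real API, `head` would be the `pHeadAutoBreadcrumbNode` member of the `D3D12_DRED_AUTO_BREADCRUMBS_OUTPUT` filled in by `GetAutoBreadcrumbsOutput`. A report could list `InFlight` nodes first and then print the `NoOpsCompleted` nodes separately, flagged as "not started or hung on the first op".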

I am also finding that when I get a DEVICE_REMOVED result while creating a placed resource after hanging the GPU and then immediately call GetAutoBreadcrumbsOutput, pHeadAutoBreadcrumbNode can be null unless I insert some kind of wait. In practice, this means I miss DRED reports somewhat randomly, depending on when the device removal is detected.
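The "insert some kind of wait" workaround could be expressed as a bounded poll rather than a fixed sleep. The sketch below is generic C++ with no D3D12 dependency: `PollUntilNonNull` is a hypothetical helper, and the assumption that the breadcrumb data eventually becomes available after a short delay comes only from the observation above, not from documented behavior.

```cpp
#include <chrono>
#include <thread>

// Repeatedly calls `query` until it returns a non-null pointer or `timeout`
// elapses. Returns the pointer on success and nullptr on timeout.
// `query` is always called at least once, even with a zero timeout.
template <typename Query>
auto PollUntilNonNull(Query&& query,
                      std::chrono::milliseconds timeout,
                      std::chrono::milliseconds interval)
    -> decltype(query()) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        if (auto result = query()) {
            return result;
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            return nullptr;
        }
        std::this_thread::sleep_for(interval);
    }
}
```

In a real application, `query` would presumably be a lambda that calls `GetAutoBreadcrumbsOutput` on the `ID3D12DeviceRemovedExtendedData` interface and returns the resulting `pHeadAutoBreadcrumbNode`. The timeout keeps the crash handler from blocking forever if no breadcrumb data ever arrives.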

A reference implementation would be very helpful.