sholloway opened 1 year ago
This video gives an overview of how GPU debugging can be done with Xcode.
There are two tools at play.
You can use Xcode to do a GPU Frame Capture. From the capture, you can break the scene being rendered down into a frame graph, which shows the order and purpose of each render pass. An example of a frame graph is:
A GPU Trace enables seeing all the render passes.
The Instruments tool has profiling templates. Use the Game Performance or Metal System Trace template to trace the GPU. The Metal System Trace template provides frame-level inspection.
Here are the steps to run an Instrument Trace on a Python WebGPU app.
You should now have a trace to inspect.
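Instruments can also record a trace from the command line through `xcrun xctrace record`, which avoids setting up the launch in the GUI when the app runs under a Nix-provided Python. A minimal sketch that only builds the command, assuming Xcode's command-line tools are installed; `build_xctrace_command` is a hypothetical helper, and `main.py`, the output name, and the 10-second limit are placeholders:

```python
import shlex
import sys


def build_xctrace_command(python_exe, script, template="Metal System Trace",
                          output="webgpu_app.trace", time_limit="10s"):
    """Build an `xctrace record` invocation that launches a Python script.

    Everything after `--launch --` is the command that gets started and traced.
    """
    return [
        "xcrun", "xctrace", "record",
        "--template", template,      # or "Game Performance"
        "--output", output,          # .trace bundle to open in Instruments
        "--time-limit", time_limit,  # stop recording automatically
        "--launch", "--", python_exe, script,
    ]


# Print the command for the current interpreter. Paste it into a terminal or
# hand the list to subprocess.run(...). "main.py" is a placeholder script name.
print(shlex.join(build_xctrace_command(sys.executable, "main.py")))
```

Passing `sys.executable` matters here because the interpreter path (e.g. the Nix store Python) is the process Instruments actually launches and traces.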
Here are the steps to configure Xcode to debug a Python project.
A few things to note:
I can launch and start a debugging session of the POC/Obj Loader; however, Xcode will not trace the Metal commands.
The message below is displayed in the terminal output when the target launches.
[Metal Diagnostics Warning] Application Deployment Target Version (11.0) does not match OS Version (13.5.2) - diagnostics may be missing debug
I've gone round and round trying to set the deployment target settings. I think this is actually baked into the Python executable. The output of the shell command below shows that Python was built targeting macOS 11 (minos 11.0) and linked against the macOS 11 SDK (sdk 11.0).
otool -l /nix/store/dbmhpvp80aqxlasa8d6a7b5id1ijsz6g-python3-3.11.2-env/bin/python
Load command 0
      cmd LC_SEGMENT_64
  cmdsize 72
  segname __PAGEZERO
   vmaddr 0x0000000000000000
   vmsize 0x0000000100000000
  fileoff 0
 filesize 0
  maxprot 0x00000000
 initprot 0x00000000
   nsects 0
    flags 0x0
Load command 1
      cmd LC_SEGMENT_64
  cmdsize 472
  segname __TEXT
   vmaddr 0x0000000100000000
   vmsize 0x0000000000004000
  fileoff 0
 filesize 16384
  maxprot 0x00000005
 initprot 0x00000005
   nsects 5
    flags 0x0
Section
  sectname __text
   segname __TEXT
      addr 0x0000000100003a6c
      size 0x000000000000005c
    offset 14956
     align 2^2 (4)
    reloff 0
    nreloc 0
     flags 0x80000400
 reserved1 0
 reserved2 0
Section
  sectname __stubs
   segname __TEXT
      addr 0x0000000100003ac8
      size 0x0000000000000018
    offset 15048
     align 2^2 (4)
    reloff 0
    nreloc 0
     flags 0x80000408
 reserved1 0 (index into indirect symbol table)
 reserved2 12 (size of stubs)
Section
  sectname __stub_helper
   segname __TEXT
      addr 0x0000000100003ae0
      size 0x0000000000000030
    offset 15072
     align 2^2 (4)
    reloff 0
    nreloc 0
     flags 0x80000400
 reserved1 0
 reserved2 0
Section
  sectname __cstring
   segname __TEXT
      addr 0x0000000100003b10
      size 0x00000000000004a5
    offset 15120
     align 2^0 (1)
    reloff 0
    nreloc 0
     flags 0x00000002
 reserved1 0
 reserved2 0
Section
  sectname __unwind_info
   segname __TEXT
      addr 0x0000000100003fb8
      size 0x0000000000000048
    offset 16312
     align 2^2 (4)
    reloff 0
    nreloc 0
     flags 0x00000000
 reserved1 0
 reserved2 0
Load command 2
      cmd LC_SEGMENT_64
  cmdsize 152
  segname __DATA_CONST
   vmaddr 0x0000000100004000
   vmsize 0x0000000000004000
  fileoff 16384
 filesize 16384
  maxprot 0x00000003
 initprot 0x00000003
   nsects 1
    flags 0x10
Section
  sectname __got
   segname __DATA_CONST
      addr 0x0000000100004000
      size 0x0000000000000008
    offset 16384
     align 2^3 (8)
    reloff 0
    nreloc 0
     flags 0x00000006
 reserved1 2 (index into indirect symbol table)
 reserved2 0
Load command 3
      cmd LC_SEGMENT_64
  cmdsize 232
  segname __DATA
   vmaddr 0x0000000100008000
   vmsize 0x0000000000004000
  fileoff 32768
 filesize 16384
  maxprot 0x00000003
 initprot 0x00000003
   nsects 2
    flags 0x0
Section
  sectname __la_symbol_ptr
   segname __DATA
      addr 0x0000000100008000
      size 0x0000000000000010
    offset 32768
     align 2^3 (8)
    reloff 0
    nreloc 0
     flags 0x00000007
 reserved1 3 (index into indirect symbol table)
 reserved2 0
Section
  sectname __data
   segname __DATA
      addr 0x0000000100008010
      size 0x0000000000000010
    offset 32784
     align 2^3 (8)
    reloff 0
    nreloc 0
     flags 0x00000000
 reserved1 0
 reserved2 0
Load command 4
      cmd LC_SEGMENT_64
  cmdsize 72
  segname __LINKEDIT
   vmaddr 0x000000010000c000
   vmsize 0x0000000000004000
  fileoff 49152
 filesize 2032
  maxprot 0x00000001
 initprot 0x00000001
   nsects 0
    flags 0x0
Load command 5
            cmd LC_DYLD_INFO_ONLY
        cmdsize 48
     rebase_off 49152
    rebase_size 8
       bind_off 49160
      bind_size 24
  weak_bind_off 0
 weak_bind_size 0
  lazy_bind_off 49184
 lazy_bind_size 32
     export_off 49216
    export_size 64
Load command 6
     cmd LC_SYMTAB
 cmdsize 24
  symoff 49288
   nsyms 7
  stroff 49424
 strsize 88
Load command 7
            cmd LC_DYSYMTAB
        cmdsize 80
      ilocalsym 0
      nlocalsym 1
     iextdefsym 1
     nextdefsym 3
      iundefsym 4
      nundefsym 3
         tocoff 0
           ntoc 0
      modtaboff 0
        nmodtab 0
   extrefsymoff 0
    nextrefsyms 0
 indirectsymoff 49400
  nindirectsyms 5
      extreloff 0
        nextrel 0
      locreloff 0
        nlocrel 0
Load command 8
          cmd LC_LOAD_DYLINKER
      cmdsize 32
         name /usr/lib/dyld (offset 12)
Load command 9
      cmd LC_BUILD_VERSION
  cmdsize 32
 platform 1
    minos 11.0
      sdk 11.0
   ntools 1
     tool 3
  version 609.0
Load command 10
      cmd LC_SOURCE_VERSION
  cmdsize 16
  version 0.0
Load command 11
       cmd LC_MAIN
   cmdsize 24
  entryoff 14956
 stacksize 0
Load command 12
          cmd LC_LOAD_DYLIB
      cmdsize 56
         name /usr/lib/libSystem.B.dylib (offset 24)
   time stamp 2 Wed Dec 31 18:00:02 1969
      current version 1292.60.1
compatibility version 1.0.0
Load command 13
          cmd LC_RPATH
      cmdsize 120
         path /nix/store/ylxc5aq56jqd19vmbqgpgbyjnjmw9qyd-apple-framework-CoreFoundation-11.0.0/Library/Frameworks (offset 12)
Load command 14
      cmd LC_FUNCTION_STARTS
  cmdsize 16
  dataoff 49280
 datasize 8
Load command 15
      cmd LC_DATA_IN_CODE
  cmdsize 16
  dataoff 49288
 datasize 0
Load command 16
      cmd LC_CODE_SIGNATURE
  cmdsize 16
  dataoff 49520
 datasize 1664
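Only load command 9 (`LC_BUILD_VERSION`) matters for the deployment-target question, so it can be pulled out of the dump programmatically. A minimal sketch, assuming `otool` from the Xcode command-line tools is on the PATH; `parse_build_version` and `build_version_of` are hypothetical helper names:

```python
import subprocess


def parse_build_version(otool_output):
    """Return {"minos": ..., "sdk": ...} from the LC_BUILD_VERSION block of `otool -l` text."""
    result = {}
    in_build_version = False
    for raw in otool_output.splitlines():
        line = raw.strip()
        if line.startswith("Load command"):
            if in_build_version:
                break  # reached the next load command, so the block is done
            continue
        if line == "cmd LC_BUILD_VERSION":
            in_build_version = True
        elif in_build_version:
            key, _, value = line.partition(" ")
            if key in ("minos", "sdk"):
                result[key] = value.strip()
    return result


def build_version_of(binary_path):
    """Run `otool -l` on a binary and extract its minimum OS and SDK versions."""
    out = subprocess.run(["otool", "-l", binary_path],
                         capture_output=True, text=True, check=True).stdout
    return parse_build_version(out)
```

Calling `build_version_of` on the Nix Python binary should surface the same `minos 11.0` / `sdk 11.0` pair shown in the dump. Binaries built for older targets may carry `LC_VERSION_MIN_MACOSX` instead of `LC_BUILD_VERSION`, which this sketch does not handle.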
Xcode's Metal Debugger is not going to work for my needs. The Python executable targets macOS 11, while the Metal diagnostics expect the deployment target to match the running OS version (13.5.2). I cannot find a way around this without compiling Python myself, and I really don't want to have to fool with that.
RenderDoc has a branch that is working towards macOS support. It may be worth compiling that.
Without proper tool support, I'm resorting to building a shader pipeline for rapid debugging.
Define a ShaderDebugger class that provides an API for easily building a render pipeline that allows the option of:
- rendering vertices
- rendering edges
- rendering faces
Ideally, a single pipeline would handle all three use cases, so the debugger could render vertices, edges, and faces in a single frame if it wants to.
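One way the ShaderDebugger API could look is sketched here; the class shape, `DebugView`, and `pipeline_specs` are all hypothetical. Because WebGPU fixes the primitive topology per render pipeline, mixing line-list edges with triangle-list faces needs more than one pipeline unless the edges are drawn inside the face shader:

```python
from enum import Flag


class DebugView(Flag):
    """Which debug visualizations to draw in a frame (hypothetical API)."""
    NONE = 0
    VERTICES = 1
    EDGES = 2
    FACES = 4
    ALL = 7  # VERTICES | EDGES | FACES


class ShaderDebugger:
    """Hypothetical sketch: decides which pipelines a debug frame needs.

    Each spec is (view, primitive_topology, instanced). WebGPU fixes the
    primitive topology per render pipeline, so each view gets its own spec.
    """

    _PIPELINE_SPECS = [
        (DebugView.FACES, "triangle-list", False),
        (DebugView.EDGES, "line-list", False),
        # point-list has no point size, so vertices become instanced quads.
        (DebugView.VERTICES, "triangle-list", True),
    ]

    def __init__(self, views=DebugView.ALL):
        self.views = views

    def pipeline_specs(self):
        """Return the specs for the views that are switched on."""
        return [spec for spec in self._PIPELINE_SPECS if spec[0] in self.views]
```

For example, `ShaderDebugger(DebugView.EDGES | DebugView.FACES)` asks for a triangle-list pipeline and a line-list pipeline. The barycentric technique described later in this issue is one way the FACES and EDGES specs could collapse into a single triangle-list pipeline.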
The point-list primitive topology doesn't allow setting the size of vertices; points are always one pixel wide. That said, perhaps I could dynamically create a quad at each vertex location. Geometry shaders are not available in a WebGPU pipeline. (FYI, they're not supported by Metal and have developed a reputation for being a performance bottleneck.)
The classic workaround is instancing combined with either the CPU or a compute shader. So creating a mesh that represents a vertex and then drawing an instance of it at every vertex position is probably the correct approach. See gpuweb issues 1239 and 332 for discussions about this.
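The CPU side of the instancing approach can be sketched with two vertex buffers: a small quad mesh that advances per vertex, and an instance buffer holding every mesh vertex position that advances per instance. A minimal sketch using only the standard library; `build_point_sprite_buffers` and `POINT_SPRITE_LAYOUTS` are hypothetical names, and the layout dicts assume wgpu-py accepts WebGPU enum values as plain strings:

```python
from array import array

# Unit quad as two triangles (6 corners). Each corner is an (x, y) offset that
# the vertex shader would scale by a point size (hypothetical uniform).
QUAD_CORNERS = [
    (-1.0, -1.0), (1.0, -1.0), (1.0, 1.0),
    (-1.0, -1.0), (1.0, 1.0), (-1.0, 1.0),
]


def build_point_sprite_buffers(positions):
    """Return (quad_bytes, instance_bytes, vertex_count, instance_count).

    positions: iterable of (x, y, z) mesh vertex positions; one quad
    instance is drawn per position.
    """
    quad = array("f", [c for corner in QUAD_CORNERS for c in corner])
    inst = array("f", [c for p in positions for c in p])
    instance_count = len(inst) // 3
    return quad.tobytes(), inst.tobytes(), len(QUAD_CORNERS), instance_count


# Vertex buffer layouts in the dict form wgpu-py takes (assumed string enums):
# slot 0 steps per vertex (quad corner), slot 1 steps per instance (mesh vertex).
POINT_SPRITE_LAYOUTS = [
    {
        "array_stride": 2 * 4,
        "step_mode": "vertex",
        "attributes": [{"format": "float32x2", "offset": 0, "shader_location": 0}],
    },
    {
        "array_stride": 3 * 4,
        "step_mode": "instance",
        "attributes": [{"format": "float32x3", "offset": 0, "shader_location": 1}],
    },
]
```

With both buffers bound, a single `draw(vertex_count, instance_count)` stamps a quad at every mesh vertex, and the vertex shader offsets each transformed center by `corner * point_size`, so point size becomes a shader parameter rather than a fixed one pixel.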
How does one enable using multiple shaders in a single render frame?
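In WebGPU a render pass is not tied to one pipeline: `set_pipeline()` can be called repeatedly inside the same pass, and each pipeline carries its own shader module and primitive topology. A minimal sketch written against the method names of wgpu-py's render pass encoder (`set_pipeline`, `set_vertex_buffer`, `draw`, `end`); `encode_debug_draws` and its `draws` tuple format are hypothetical:

```python
def encode_debug_draws(render_pass, draws):
    """Issue draws from several pipelines inside one render pass.

    draws: list of (pipeline, vertex_buffers, vertex_count, instance_count),
    where vertex_buffers are bound to slots 0, 1, ... in order.
    """
    for pipeline, vertex_buffers, vertex_count, instance_count in draws:
        # Switching pipelines mid-pass swaps the shaders and topology.
        render_pass.set_pipeline(pipeline)
        for slot, buffer in enumerate(vertex_buffers):
            render_pass.set_vertex_buffer(slot, buffer)
        render_pass.draw(vertex_count, instance_count)
    render_pass.end()
```

Every pipeline used in the same pass must declare color target and depth/stencil formats that match the pass attachments; otherwise validation fails when the pipeline is set.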
Related Resources
I've got edge drawing working with line lists. Next, try incorporating barycentric coordinates to control rendering at the pixel level.
The basic concept is to expand the vertex buffer to include a barycentric coordinate for each vertex. The vertex shader passes it to the fragment shader using the @interpolate attribute. The fragment shader then decides whether to draw a line fragment or a face fragment based on its barycentric location.
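The CPU side of the barycentric idea can be sketched in plain Python. One detail the approach depends on: the mesh has to be de-indexed, because a vertex shared by two triangles needs a different barycentric coordinate in each. `classify_fragment` mirrors the per-fragment decision the WGSL fragment shader would make; the function names and the 0.02 threshold are hypothetical:

```python
def expand_with_barycentrics(positions, indices):
    """De-index a triangle mesh and tag each corner with a barycentric coordinate.

    Returns a list of (x, y, z, b0, b1, b2) tuples, one per triangle corner.
    """
    corners = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    if len(indices) % 3:
        raise ValueError("index count must be a multiple of 3")
    out = []
    for i, index in enumerate(indices):
        # The i-th corner of each triangle gets the i-th unit barycentric.
        out.append(tuple(positions[index]) + corners[i % 3])
    return out


def classify_fragment(bary, edge_width=0.02):
    """Return 'edge' if the fragment is near any triangle edge, else 'face'.

    After interpolation, one barycentric component approaches zero as the
    fragment approaches the edge opposite that corner.
    """
    return "edge" if min(bary) < edge_width else "face"
```

A fixed threshold in barycentric units makes lines thicker on large triangles and thinner on small ones; in WGSL the threshold is commonly scaled by `fwidth()` of the interpolated coordinate to keep a roughly constant pixel width.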
Tasks
Related Resources
The Challenge
Establish a methodology for debugging GPU calls.
Summary
On macOS, WGSL shaders are translated into Metal Shading Language and the WebGPU calls run on Metal. I need a way to see the GPU calls and inspect what the shaders are doing.