omlins / ParallelStencil.jl

Package for writing high-level code for parallel high-performance stencil computations that can be deployed on both GPUs and CPUs
BSD 3-Clause "New" or "Revised" License

Debugging and profiling workflow #80

Open: smartalecH opened this issue 1 year ago

smartalecH commented 1 year ago

Is there a recommended workflow when trying to debug and profile @parallel_indices functions?

Often, my functions call several other functions (which typically act as function barriers). When I try to profile these nested functions, or even the parent function, I run into various problems because of the way the macro extracts the code.

For example, it would be great if I could use something like Cthulhu.jl to recursively evaluate the @code_warntype output (to check for dynamic dispatch, etc.), but this currently isn't possible. More importantly, none of the current Julia debuggers can step into any of the kernels. In other words, I can't find a way to inspect the running state of a kernel, which is also important for tracking down dynamic dispatch and specialization errors.

I could copy the body into a "custom" function that implements a manual loop over the indices, but that seems a little too brute-force.
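A minimal sketch of that manual-loop workaround, assuming a hypothetical 2-D diffusion kernel: the per-point body lives in an ordinary function (`diffusion_point!`, hypothetical), and a plain serial loop (`diffusion_serial!`, also hypothetical) stands in for the `@parallel_indices` launch. Because neither function goes through the ParallelStencil macros, the VSCode debugger, `@code_warntype`, and Cthulhu.jl can reach them like any other Julia code.

```julia
using InteractiveUtils  # stdlib; provides @code_warntype outside the REPL

# Hypothetical per-point kernel body (a simple 2-D diffusion update), written as an
# ordinary Julia function of the indices so that standard tools can inspect it.
function diffusion_point!(T2, T, dt, ix, iy)
    nx, ny = size(T)
    if 1 < ix < nx && 1 < iy < ny  # update interior points only
        T2[ix, iy] = T[ix, iy] + dt * (T[ix+1, iy] + T[ix-1, iy] +
                                       T[ix, iy+1] + T[ix, iy-1] - 4 * T[ix, iy])
    end
    return nothing
end

# Serial stand-in for the kernel launch: visit every (ix, iy) pair on one thread,
# playing the role that @parallel_indices (ix, iy) plays in the real kernel.
function diffusion_serial!(T2, T, dt)
    for iy in axes(T, 2), ix in axes(T, 1)  # ix innermost (column-major order)
        diffusion_point!(T2, T, dt, ix, iy)
    end
    return nothing
end

T  = rand(8, 8)
T2 = copy(T)
diffusion_serial!(T2, T, 0.1)

# Type-stability check on the innermost function; a debugger breakpoint placed
# inside diffusion_point! is also reachable through diffusion_serial!.
@code_warntype diffusion_point!(T2, T, 0.1, 2, 2)
```

With Cthulhu.jl installed, `@descend diffusion_serial!(T2, T, 0.1)` should let you walk down into `diffusion_point!` interactively. Once the serial version behaves, the real `@parallel_indices` kernel body could shrink to a call to `diffusion_point!(T2, T, dt, ix, iy)` followed by `return`, so the code you debugged is the code that runs in parallel; on the GPU, the helper must of course itself be GPU-compatible.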

albert-de-montserrat commented 1 year ago

Having Cthulhu.jl working would be ideal, but @profview works and should be able to catch dynamic dispatching.

omlins commented 1 year ago

Have you tried with Visual Studio Code? As far as I know most of the focus for Julia tool development goes in there. If yes, where do we get stuck? Admittedly, debugging GPU kernels is still not easy, independently of ParallelStencil. However, most of the time you can get your code working on a single CPU thread, and it will then just work when you run it on the GPU.

smartalecH commented 1 year ago

> but @profview works and should be able to catch dynamic dispatching.

Yup, you're right. Unfortunately, @profview doesn't give much detail about where in your code the dynamic dispatch happens. For example, it tells you which functions dispatch dynamically, but not where within those functions.

> Have you tried with Visual Studio Code? As far as I know most of the focus for Julia tool development goes in there. If yes, where do we get stuck?

Yes, I primarily work with VSCode, and the tools work rather well! However, the debugger cannot step into any of the kernel functions (or the functions they call), even for single-threaded CPU computations.

This is particularly important when trying to identify the source of the dynamic dispatch mentioned above, or any specialization issues. @profview is kind enough to tell me I have a dispatch problem, but I can't step through the kernel to see why.

Enabling any debugger to work with @parallel_indices would dramatically simplify things!