Open mibu138 opened 1 year ago
My guess is that llvmpipe is using gdb's "JIT interface" to support debugging of the JITted code and gdb's JITted code support does not work with rr. https://llvm.org/docs/DebuggingJITedCode.html
You could try the Pernosco fork of gdb, which patches gdb to disable that JIT support: https://github.com/Pernosco/binutils-gdb/commits/pernosco-gdb
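For readers unfamiliar with the "JIT interface" mentioned above, here is a minimal sketch of what a JIT has to expose for gdb to find its code. The type and symbol declarations follow the GDB manual's description of the interface. The demo_* helpers are hypothetical names invented for illustration and are not taken from llvmpipe, LLVM, or Julia.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Declarations as described in the GDB manual ("JIT Compilation Interface"). */
typedef enum {
    JIT_NOACTION = 0,
    JIT_REGISTER_FN,
    JIT_UNREGISTER_FN
} jit_actions_t;

struct jit_code_entry {
    struct jit_code_entry *next_entry;
    struct jit_code_entry *prev_entry;
    const char *symfile_addr;   /* in-memory object file carrying debug info */
    uint64_t symfile_size;
};

struct jit_descriptor {
    uint32_t version;
    uint32_t action_flag;       /* holds a jit_actions_t value */
    struct jit_code_entry *relevant_entry;
    struct jit_code_entry *first_entry;
};

/* gdb sets a breakpoint on this function, so it must not be inlined. */
void __attribute__((noinline)) __jit_debug_register_code(void) {
    __asm__ volatile("" ::: "memory");
}

/* gdb looks this symbol up by name; the version field must be 1. */
struct jit_descriptor __jit_debug_descriptor = { 1, JIT_NOACTION, NULL, NULL };

/* Hypothetical helper: link a new entry at the list head and notify gdb. */
struct jit_code_entry *demo_register_object(const char *symfile, uint64_t size) {
    struct jit_code_entry *entry = calloc(1, sizeof *entry);
    if (entry == NULL) return NULL;
    entry->symfile_addr = symfile;
    entry->symfile_size = size;
    entry->next_entry = __jit_debug_descriptor.first_entry;
    if (entry->next_entry != NULL) entry->next_entry->prev_entry = entry;
    __jit_debug_descriptor.first_entry = entry;
    __jit_debug_descriptor.relevant_entry = entry;
    __jit_debug_descriptor.action_flag = JIT_REGISTER_FN;
    __jit_debug_register_code();   /* gdb's breakpoint fires here */
    return entry;
}

/* Hypothetical helper: unlink an entry, notify gdb, then free it. */
void demo_unregister_object(struct jit_code_entry *entry) {
    assert(entry != NULL);
    if (entry->prev_entry != NULL) entry->prev_entry->next_entry = entry->next_entry;
    else __jit_debug_descriptor.first_entry = entry->next_entry;
    if (entry->next_entry != NULL) entry->next_entry->prev_entry = entry->prev_entry;
    __jit_debug_descriptor.relevant_entry = entry;
    __jit_debug_descriptor.action_flag = JIT_UNREGISTER_FN;
    __jit_debug_register_code();
    __jit_debug_descriptor.relevant_entry = NULL;
    free(entry);
}

/* Hypothetical helper: count the entries currently registered. */
size_t demo_count_entries(void) {
    size_t n = 0;
    for (struct jit_code_entry *e = __jit_debug_descriptor.first_entry; e != NULL; e = e->next_entry)
        n++;
    return n;
}
```

A real JIT would pass a complete in-memory object file (for example ELF with DWARF) as symfile_addr. Each call to __jit_debug_register_code hits a breakpoint that gdb planted, and this sketch does not show why that mechanism misbehaves under rr in this case.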
Interesting. From the Julia side, I often use rr with GDB's JIT interface, and so far I haven't run into issues.
(One of my issues with Pernosco has been that it discards the JIT information.)
I very much appreciate the fast response. I did an A/B comparison running the same replay that I described above and using the pernosco-gdb version does seem to fix the issue. So that is awesome.
Are there plans to work with gdb's JITted code support eventually? Or is this more of an issue with GDB or LLVM?
@vchuravy that is odd, since based on @rocallahan's response it sounds like it should not work. I'm guessing Julia is also using LLVM for the JIT code generation?
Yes. I often record Julia sessions with ENABLE_GDBLISTENER=1 rr record julia, since that allows for symbolization of backtraces through JITed code. I have also used rr to debug the actual JIT (set a watchpoint on a code page and reverse-execute until the JIT emits that code page).
Do you know which LLVM JIT version llvmpipe uses? MCJIT, OrcV1 or OrcV2 (RTDYLD or JITLink)?
Julia currently uses OrcV2 with RTDYLD, but we are moving on to JITLink.
It appears they are using MCJIT, based on some grepping of their source tree.
First of all I just want to thank you all very much for creating this incredible tool. It has completely changed the way I debug and I'm very much in the "can't ever go back" camp. Big big thanks. Now onto the question...
I need to debug an application that uses both OpenGL and Vulkan. Since RR does not appear to play well with GPU graphics/compute, I run the application using Mesa's llvmpipe as a software-based driver. This works well enough in that it makes RR usable, but oftentimes when I am running a reverse-continue to the next breakpoint I end up stepping over some driver code and things all of a sudden get very slow. This slowness is accompanied by printouts like this:
By slow I mean: I recorded an application running for about a minute and a half, which ended in a crash. I run rr replay -e to get to the end, and then I set a breakpoint at a method that I know to be in the call stack that caused the crash. This call would have occurred within seconds of the crash. I run reverse-continue. I start seeing printouts like above. I get about 50 of those before I see the message
Maybe a few minutes later I see this message
And I'm back at the gdb prompt. The whole process took about 40 minutes. I never got to the breakpoint.
So, I'm just really not sure what is going on here, other than it seems like the JIT code that is being run by the software driver is messing things up. I have been able to successfully debug issues with rr using this software backend at times, but it's hit or miss. I'm not sure, but I recall sometimes being able to step over the JIT stuff and come out the other side, but many times it does seem to take things off the rails.
Any idea what might be going on here or tips for working around this? Most of the time I'm not interested in looking at any of the driver code, and would be happy if that could just be ignored by rr.
System info: rr 5.6.0, gdb 12.1, Linux kernel 6.2.1, CPU Intel i9-9900K (x86_64), Arch Linux, up to date as of about 2 weeks ago.