neon12345 opened this issue 5 years ago.
You can't continue past the execve point. To debug after the execve, get the current event number with the "when" command, add some small number to it, and then try rr replay -g <event> -p <pid>.
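If you do this often, the command line can be built by a small helper. This is a minimal sketch. The function name rr_replay_after_exec_cmd and the default offset of 10 are hypothetical; only the rr replay -g <event> -p <pid> form comes from the workaround above.

```python
from typing import List, Optional


def rr_replay_after_exec_cmd(current_event: int, pid: int, offset: int = 10,
                             trace_dir: Optional[str] = None) -> List[str]:
    """Build an rr command line that starts replay a little past an execve.

    current_event: the number printed by "when" in the rr gdb session.
    pid: the process to debug after it has exec'ed.
    offset: the "small number" to add; 10 is an arbitrary, hypothetical default.
    trace_dir: optional trace directory; without it rr replays the latest trace.
    """
    cmd = ["rr", "replay", "-g", str(current_event + offset), "-p", str(pid)]
    if trace_dir is not None:
        cmd.append(trace_dir)
    return cmd
```

The returned list can be passed directly to subprocess.run. Keeping it as a list avoids shell quoting problems with the <event> and <pid> values.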
I see now that this is done in GdbServer.cc. Would it be possible to do this step automatically from there to get a user experience similar to normal gdb?
Maybe. I'm not sure if the remote agent protocol can handle it.
When I use gdb alone and set a breakpoint in the executable loaded by execve, gdb can continue past the execve and stop at that breakpoint. Could a small change to GdbServer.cc give the same behaviour, or is there some other obstacle? If it is feasible, I would try to implement it.
The problem is that gdb talks to rr using the gdb remote protocol. That works differently from gdb just running by itself.
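For context, every message in the gdb remote protocol is framed as $<data>#<checksum>. The checksum is the modulo-256 sum of the data bytes, written as two hex digits. The sketch below shows only that framing. The function names are made up for illustration. It skips the '}' escaping needed for binary data and the optional run-length encoding.

```python
def rsp_checksum(data: str) -> str:
    """Modulo-256 sum of the packet data bytes, as two lowercase hex digits."""
    return "%02x" % (sum(data.encode("ascii")) % 256)


def rsp_frame(data: str) -> str:
    """Wrap packet data as $<data>#<checksum> (no escaping, no run-length encoding)."""
    return "$" + data + "#" + rsp_checksum(data)


def rsp_unframe(packet: str) -> str:
    """Return the packet data, raising ValueError on bad framing or checksum."""
    if len(packet) < 4 or packet[0] != "$" or packet[-3] != "#":
        raise ValueError("not a framed packet: %r" % packet)
    data, received = packet[1:-3], packet[-2:].lower()
    if rsp_checksum(data) != received:
        raise ValueError("checksum mismatch")
    return data
```

For example, the reply OK is sent as $OK#9a, and the receiver acknowledges each packet with + (or asks for a resend with -).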
Using gdbserver with gdb gives the same behaviour, so I guess it should be possible.
Great!
I have implemented a first version but have to give up on it for now. In theory, one has to implement the exec-events extension, which sends a different stop reply on execve (see gdb/gdbserver/remote-utils.c). The register definitions are in gdb/gdbserver/x86-tdesc.h and gdb/amd64-tdep.c. In addition, the server has to advance execution to the next event after the execve and then wait for the next continue request. This more or less works when running rr replay normally, but not in interpreter mode with Eclipse.
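Here is a sketch of the two protocol pieces the exec-events extension involves, based on the gdb remote protocol documentation. gdb offers exec-events+ in its qSupported packet and only uses the extension if the stub also lists exec-events+ in its reply. The stop reply then carries an exec:<hex-encoded absolute path>; entry. The function names are hypothetical, and so is reporting the stop as signal 05 (SIGTRAP); I believe that is what gdbserver does, but treat it as an assumption. A real stop reply would usually also include thread information, which is omitted here.

```python
GDB_SIGNAL_TRAP = 5  # assumption: exec stops are reported with gdb's SIGTRAP number


def exec_stop_reply(new_exe_path: str, signo: int = GDB_SIGNAL_TRAP) -> str:
    """Build the unframed 'T' stop reply announcing an exec of new_exe_path."""
    # 'T' + two hex digits for the signal, then "exec:" + the path as hex bytes.
    return "T%02xexec:%s;" % (signo, new_exe_path.encode().hex())


def gdb_offers_exec_events(qsupported: str) -> bool:
    """True if gdb's qSupported packet lists the exec-events+ feature."""
    if not qsupported.startswith("qSupported"):
        return False
    _, _, features = qsupported.partition(":")
    return "exec-events+" in features.split(";")
```

A stub would send exec_stop_reply(...) only when gdb_offers_exec_events(...) was true for the packet gdb sent at connection time and the stub's own qSupported reply advertised exec-events+. Otherwise gdb will not expect the exec entry.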
There are possibly multiple bugs in the gdb communication.
I sometimes get errors from gdb complaining that it received more bytes than expected, for example when sending registers.
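One plausible cause of those register errors is a length mismatch. In the gdb remote protocol, the 'g' reply is every register concatenated as hex in target byte order (little-endian on x86), in the order of the target description. gdb rejects a reply that does not match the layout it expects. The sketch below uses a made-up, heavily truncated layout. EXAMPLE_LAYOUT, encode_g_reply and check_g_reply are hypothetical names. The real amd64 layout has to match what gdb derives from its target description (gdb/amd64-tdep.c, gdb/gdbserver/x86-tdesc.h).

```python
from typing import Dict, List, Tuple

# Hypothetical, truncated layout: (register name, size in bytes).
EXAMPLE_LAYOUT: List[Tuple[str, int]] = [("rax", 8), ("rip", 8), ("eflags", 4)]


def encode_g_reply(values: Dict[str, int], layout: List[Tuple[str, int]]) -> str:
    """Concatenate register values as little-endian hex in layout order."""
    return "".join(values[name].to_bytes(size, "little").hex() for name, size in layout)


def check_g_reply(reply: str, layout: List[Tuple[str, int]]) -> None:
    """Raise ValueError if the reply length disagrees with the layout."""
    expected_hex_chars = 2 * sum(size for _, size in layout)
    if len(reply) != expected_hex_chars:
        raise ValueError("'g' reply has %d hex chars, expected %d"
                         % (len(reply), expected_hex_chars))
```

Running such a check on the server side before sending makes a layout mismatch visible in rr instead of as a gdb error.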
https://github.com/mozilla/rr/issues/2239 also seems to be a problem here; just sending '3' makes gdb happy.
When running in normal rr replay mode, I can set a breakpoint after the execve and continue stepping from there. With Eclipse, I see the stop at the breakpoint, but at the same time the program keeps executing until the final kill signal.
I guess this is because of how stop signals are handled: they should be batched. That is, the sequence
stop stop cont
should probably be translated to
stop cont + stop
but this is just a guess.
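To make that guess concrete, here is a toy model. It is purely hypothetical: batch_stops and the "stop"/"cont" event names are invented for illustration and are not part of rr's GdbServer. The idea is that a stop arriving while gdb is not waiting for a stop reply gets queued. The next continue is then answered with the queued stop instead of actually resuming the tracee.

```python
from collections import deque
from typing import List


def batch_stops(events: List[str]) -> List[str]:
    """Translate a sequence of target stops and gdb continues into what gdb sees.

    "stop" is a stop event on the target side; "cont" is a continue from gdb.
    """
    pending: deque = deque()
    gdb_waiting = True  # gdb is waiting for a stop reply after resuming
    out: List[str] = []
    for ev in events:
        if ev == "stop":
            if gdb_waiting:
                out.append("stop")  # report immediately
                gdb_waiting = False
            else:
                pending.append("stop")  # gdb is not waiting: hold it back
        elif ev == "cont":
            if pending:
                # Answer the continue with a queued stop; do not resume.
                out.append("cont + " + pending.popleft())
            else:
                out.append("cont")  # nothing queued: really resume
                gdb_waiting = True
        else:
            raise ValueError("unknown event: %r" % ev)
    return out
```

With this model, batch_stops(["stop", "stop", "cont"]) gives ["stop", "cont + stop"], which is the translation suggested above.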
Summary: we don't support gdb executing past an execve. You can work around this by digging event numbers out of the trace and running rr replay -g.
gdb might have some feature to debug past execve but I haven't looked into it. We would accept patches if someone figures it out. But even if this can be made to work with gdb somehow, I'm almost certain it won't be able to reverse-execute through an execve, which is one reason I think Pernosco is a much better long-term approach than trying to squeeze a little bit more functionality out of gdb.
If someone does want to work on this, this issue is where we will discuss that.
With our work on shared memory recording, we created a method to record our executables individually and bypass the execve issue.
FTR, in the meantime I added an entry about execve to the FAQ; hopefully it will help anyone who stumbles upon this.
I get a "Program stopped" (Suspended: Signal : 0:Signal 0) on execve calls in rr replays, and continuing results in a stop loop with no progress. I was unable to find out where this stop signal is generated. A normal gdb run of the program without rr works fine.