rr-debugger / rr

Record and Replay Framework
http://rr-project.org/

Support executing past execve with gdb #2381

Open neon12345 opened 5 years ago

neon12345 commented 5 years ago

During rr replay, gdb reports "Program stopped" (Suspended: Signal : 0:Signal 0) at execve calls, and a continue only produces another stop, so the replay loops with no progress. I was unable to find out where this stop signal is generated. Running the same program under plain gdb, without rr, works fine.

rocallahan commented 5 years ago

You can't continue past the execve point. To debug after the execve, get the current event number with the when command, add some small number to it, and then try rr replay -g <event> -p <pid>.
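The workaround above can be sketched as a small helper that builds the replay command line. This is only an illustration: the function name, the default offset of 10, and the example event and pid values are hypothetical. Only the `-g <event>` and `-p <pid>` options come from the comment above, and the right "small number" to add depends on the trace.

```python
import shlex
from typing import List


def replay_command(current_event: int, pid: int, offset: int = 10) -> List[str]:
    """Build argv for `rr replay -g <event> -p <pid>`.

    current_event is the number printed by `when` at the execve stop.
    offset is the "small number" to add; 10 is an arbitrary guess.
    """
    if current_event < 0:
        raise ValueError("event numbers are non-negative")
    if offset <= 0:
        raise ValueError("offset must move past the execve event")
    target_event = current_event + offset
    return ["rr", "replay", "-g", str(target_event), "-p", str(pid)]


if __name__ == "__main__":
    # Hypothetical values: `when` reported event 1200 for process 4321.
    print(shlex.join(replay_command(1200, 4321)))
```

If the chosen event still lands before the new program image is running, increasing the offset and retrying is the obvious next step.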

neon12345 commented 5 years ago

I see now that this is done in GdbServer.cc. Would it be possible to do this step automatically from there to get a user experience similar to normal gdb?

rocallahan commented 5 years ago

Maybe. I'm not sure if the remote agent protocol can handle it.

neon12345 commented 5 years ago

When I use only gdb and set a breakpoint in the executable that is started by the execve, it is possible to step over the execve and halt at the breakpoint. Would a small change to GdbServer.cc be enough to get this behaviour, or is something else involved? If it is, I would try to add it.

rocallahan commented 5 years ago

The problem is that gdb talks to rr using the gdb remote protocol. That works differently from gdb just running by itself.

neon12345 commented 5 years ago

Using gdbserver+gdb gives the same behaviour as plain gdb, so it should be possible over the remote protocol, I guess.

rocallahan commented 5 years ago

Great!

neon12345 commented 5 years ago

I have implemented a first version but have to give up now. In theory, one has to implement the exec-events extension, which sends a different stop reply on execve (this can be found in gdb/gdbserver/remote-utils.c). The register definitions can be found in gdb/gdbserver/x86-tdesc.h and gdb/amd64-tdep.c. In addition, rr has to advance execution to the next event after the execve and wait for the next cont. This kind of works when running rr replay normally, but not in interpreter mode with Eclipse.
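For readers unfamiliar with the exec-events extension, here is a minimal sketch of the stop reply it calls for, based on the gdb remote protocol documentation rather than on rr's code. The function names are hypothetical. The sketch assumes gdb advertised `exec-events+` in its `qSupported` packet, uses signal 05 (SIGTRAP), and leaves out the thread id and register fields a real stub would typically add.

```python
def checksum(payload: str) -> str:
    """Remote-protocol checksum: sum of payload bytes modulo 256, two hex digits."""
    return format(sum(payload.encode("ascii")) % 256, "02x")


def frame(payload: str) -> str:
    """Wrap a payload as $<payload>#<checksum>.

    No escaping is done here, which is safe for the hex-only payloads below.
    """
    return "$" + payload + "#" + checksum(payload)


def exec_stop_reply(pathname: str, signal: int = 5) -> str:
    """Build a 'T' stop reply with the exec stop reason.

    The pathname of the newly executed file is sent hex-encoded.
    """
    hex_path = pathname.encode("utf-8").hex()
    return "T{:02x}exec:{};".format(signal, hex_path)


if __name__ == "__main__":
    # Hypothetical path of the program started by execve.
    print(frame(exec_stop_reply("/bin/ls")))
```

After sending such a reply, the stub still has to serve the new program's registers and memory, which is where the target description files mentioned above come in.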

There are possibly multiple bugs in the gdb communication.

  1. I sometimes get errors from gdb complaining that it received more bytes than expected, for example when sending registers.

  2. https://github.com/mozilla/rr/issues/2239 also seems to be a problem here and just sending '3' makes gdb happy.

  3. While running in normal rr replay mode, I can set a breakpoint after the execve, continue to it, and step from there. With Eclipse, I can see the stop at the breakpoint, but at the same time the program continues to execute until the final kill signal.

I guess this is because of the handling of stop signals which should be batched. Meaning that

stop stop cont

should probably be translated to

stop cont + stop

but this is just a guess.

rocallahan commented 3 years ago

Summary: we don't support gdb executing past execve. You can work around it by digging event numbers out of the trace and doing rr replay -g <event>.

gdb might have some feature to debug past execve but I haven't looked into it. We would accept patches if someone figures it out. But even if this can be made to work with gdb somehow, I'm almost certain it won't be able to reverse-execute through an execve, which is one reason I think Pernosco is a much better long-term approach than trying to squeeze a little bit more functionality out of gdb.

If someone does want to work on this, this issue is where we will discuss that.

neon12345 commented 3 years ago

With our work on shared memory recording, we created a method to record our executables individually and bypass the execve issue.

Hi-Angel commented 1 year ago

FTR, in the meantime I added an entry about execve to the FAQ; hopefully it will help anyone who stumbles upon this.