upenn-acg / ocolos-public

Ocolos is the first online code layout optimization system for unmodified applications written in unmanaged languages.
BSD 2-Clause "Simplified" License

The method of debugging replace_function.so with gdb #4

Closed onroadmuwl closed 1 year ago

onroadmuwl commented 1 year ago

I am trying to understand the detailed process of Ocolos's code replacement, but I cannot reach the insert_machine_code() function in GDB. Could you guide me through the general steps for debugging replace_function.so? Here is what I have tried so far:

  1. I added -DDEBUG to the CPPFLAGS and ran the tracer program. The program ran successfully, and I received the following output:

    [tracer] thread id = 1236504, rip = 7f8dd628b99f
    [tracer] before SINGLESTEP, set RIP = 7f8dd6b0be1c (lib addr)
    [tracer] receive SIGSTOP from tracee (lib code), tracee finished a SINGLESTEP!
    [tracer] after SINGLESTEP, RIP = 7f8dd6b0be1b

    [tracer] thread id = 1236508, rip = 7f8dd629173d
    [tracer] before SINGLESTEP, set RIP = 7f8dd6b0be1c (lib addr)
    [tracer] receive SIGSTOP from tracee (lib code), tracee finished a SINGLESTEP!
    [tracer] after SINGLESTEP, RIP = 7f8dd6b0be20
    [tracer] after a PTRACE_SINGLESTEP, do a PTRACE_CONT
    [tracer] connection from 127.0.0.1
    [tracer] after PTRACE_CONT, tracee delivers a signal Stopped (signal)
    [tracer] RIP = 7f8dd66cd289
    [tracer] machine code insertion finishes!
    [tracer][time] machine code insertion took 2.071340 seconds to execute
    [tracer][OK] code replacement done!

  2. Next, I attached to the thread with "gdb attach 1236508" and set a breakpoint with "b insert_machine_code". After continuing, GDB reported "Program received signal SIGSTOP, Stopped (signal)."

    (gdb) b insert_machine_code
    Breakpoint 1 at 0x7f8dd6b0be1c
    (gdb) continue
    Continuing.

    Program received signal SIGSTOP, Stopped (signal).
    syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
    38      in ../sysdeps/unix/sysv/linux/x86_64/syscall.S

  3. Lastly, I tried the following two approaches to get past the SIGSTOP, but GDB printed nothing further after "Continuing.":

    (gdb) b insert_machine_code
    Breakpoint 1 at 0x7f8dd6b0be1c
    (gdb) handle SIGSTOP nopass
    Signal        Stop      Print   Pass to program Description
    SIGSTOP       Yes       Yes     No              Stopped (signal)
    (gdb) handle SIGSTOP nostop
    Signal        Stop      Print   Pass to program Description
    SIGSTOP       No        Yes     No              Stopped (signal)
    (gdb) continue
    Continuing.

    (gdb) b insert_machine_code
    Breakpoint 1 at 0x7f8dd6b0be1c
    (gdb) shell kill -CONT 1236504
    (gdb) continue
    Continuing.

May I ask how you usually debug and understand the code inside “replace_function.so”?

zyuxuan0115 commented 1 year ago

We don't use gdb to debug replace_function.so. The -DDEBUG option is meant for debugging mysqld after Ocolos has finished the code replacement. We developed this debug feature to check whether Ocolos injected the machine code into mysqld's text section correctly. So the correct steps for -DDEBUG are: after gdb is attached to mysqld, set a breakpoint in mysqld's code rather than in replace_function.so's code, and then let mysqld run.

This is how replace_function.so works:

So it is expected to see "Program received signal SIGSTOP, Stopped (signal)" when you try to debug replace_function.so this way: by the time you attached gdb to that thread, the code in replace_function.so had already finished executing, so the breakpoint on insert_machine_code is never hit.

To debug the code in replace_function.so, my suggestion is to redirect the debug messages to a file and inspect that file after replace_function.so finishes executing. There is already a file called machine_code.txt in your tmp_data_dir. It is the log file of replace_function.so and currently records all the machine code we've inserted into mysqld. You can write other information to that file as well, or create your own file.
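
As a rough illustration of the file-based logging suggested above, here is a minimal C++ sketch. It is not code from Ocolos: the helper name debug_log and the log path in the usage note are hypothetical, and the sketch assumes the library is allowed to open and append to a file from inside the target process.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of Ocolos): append one printf-style
// debug line to the log file at `path`.
// The file is opened in append mode and closed on every call, so lines
// already written stay on disk even if the host process stops or
// crashes right afterwards.
bool debug_log(const std::string& path, const char* fmt, ...) {
    std::FILE* f = std::fopen(path.c_str(), "a");
    if (f == nullptr) {
        return false;  // could not open the log file
    }
    va_list args;
    va_start(args, fmt);
    std::vfprintf(f, fmt, args);  // write the formatted message
    va_end(args);
    std::fputc('\n', f);          // one message per line
    return std::fclose(f) == 0;   // closing also flushes the data
}
```

A call such as debug_log("/tmp/replace_function_debug.log", "inserted %d bytes at %#lx", 16, 0x401000UL) placed inside the function you want to inspect would then leave a trace you can read once the code replacement has finished. The path and the message here are only examples.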

onroadmuwl commented 1 year ago

It’s clear to me now, thank you for your prompt and insightful response.