FaultError in stack_trace()

brenns10 commented 1 year ago

I'm not actually sure what to do with this or even if it's a bug. But we encountered a FaultError inside of the Program.stack_trace() function:

>>> prog.stack_trace(125750)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
_drgn.FaultError: could not read memory from kdump: Cannot get page I/O address: Page not present: pte[427] = 0x0: 0xffffb54ec85abdb8

I checked it out with crash and noticed it's a zombie:

crash7latest> ps 125750
   PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
  125750  13237   2  ffff9bce7d77af80  ZO   0.0       0      0  xagt
crash7latest> struct task_struct ffff9bce7d77af80 | grep 'sp ='
  sas_ss_sp = 0,
    sp = 18446661948706307512,
crash7latest> eval 18446661948706307512
hexadecimal: ffffb54ec85abdb8
    decimal: 18446661948706307512  (-82125003244104)
      octal: 1777775524731026536670
     binary: 1111111111111111101101010100111011001000010110101011110110111000

The sp address shown there is the same one that drgn faults on, so I'm guessing drgn's logic is to read the saved registers from the task (the sp above is the one nested inside the task_struct) and backtrace right from there. But if you look at task_struct.stack, it's NULL, so I'd guess the stack has been freed and unmapped, and the task struct is sitting around waiting to be reaped.

crash7latest> struct task_struct ffff9bce7d77af80 | grep 'stack'
  stack = 0x0,
  stack_canary = 6688123023492870656,
  curr_ret_stack = -1,
  ret_stack = 0x0,
  stack_vm_area = 0x0,
  stack_refcount = {
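
The address match noted above can be double-checked with a little arithmetic. This is only a sketch: the two values are copied from the crash output and the FaultError message, and the page table index calculation assumes 4 KiB pages with 512-entry page tables, which is an assumption about this kernel's paging setup rather than something stated in the report.

```python
# Values copied from the crash output and the FaultError message above.
sp = 18446661948706307512
fault_address = 0xFFFFB54EC85ABDB8

# The saved stack pointer is exactly the address drgn failed to read.
print(hex(sp), sp == fault_address)

# Assumption: 4 KiB pages and 512-entry page tables, so the page table
# entry index is bits 12-20 of the virtual address.
pte_index = (fault_address >> 12) & 0x1FF
print(pte_index)
```

Under that assumption, the computed index lines up with the pte[427] in the error message, which is consistent with the fault coming from a page table walk for the saved stack pointer itself.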

I think a FaultError with a random-looking kernel address isn't the best way to handle this, since it's confusing. I think it could be safely avoided just by testing the task_struct.stack field. (But I'm not an expert in task lifetime...) If we caught the issue before reading the stack pointer, we could handle it better. I guess the options are (a) a stack trace which is just an empty iterable, (b) a different exception like ValueError that explains that the stack is missing (or maybe just a FaultError with an updated message that indicates the possibility of a zombie), or (c) returning an Optional[StackTrace], but that's just gross and ruins the common case.
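
To make the trade-off concrete, here is a rough sketch of what calling code might look like under each option. The three `trace_option_*` functions are hypothetical stand-ins, not drgn APIs, and frames are represented as plain strings purely for illustration.

```python
from typing import List, Optional

# Hypothetical stand-ins for what stack_trace() could do for a task
# whose stack is gone; none of these are real drgn functions.
def trace_option_a() -> List[str]:
    return []  # (a) empty stack trace

def trace_option_b() -> List[str]:
    raise ValueError("task has no stack (possibly a zombie)")  # (b)

def trace_option_c() -> Optional[List[str]]:
    return None  # (c) no trace at all

# (a): ordinary iteration needs no special case.
frames_a = [frame for frame in trace_option_a()]

# (b): the caller must catch the exception, but gets an explanation.
try:
    frames_b = trace_option_b()
except ValueError as e:
    frames_b = []
    reason = str(e)

# (c): every caller must check for None before iterating.
trace = trace_option_c()
frames_c = trace if trace is not None else []

print(frames_a, frames_b, frames_c, reason)
```

Only (c) forces a check on every call site, including the common case where a stack exists.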

I'm leaning towards (b) but probably could be convinced of (a). Just wanted to ask about it before I try my hand at a PR.

osandov commented 1 year ago

I like your option (a) because in general, when stack unwinding hits a FaultError, it stops and returns what it got so far; this would be an extension of that to the first frame. I don't love the idea of checking task.stack directly though, since I'm also not confident about task lifetimes (and it could be racy). I think it'd be better to catch the FaultError/DRGN_ERROR_FAULT at the source (in this case probably linux_kernel_get_initial_registers_x86_64), although that might require some refactoring.
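
A toy model of the "stop and return what it got so far" behavior, just to illustrate the idea. Everything here is hypothetical: the `FaultError` class is a local stand-in rather than drgn's, the `memory` dictionary plays the role of readable stack memory, and the function names in it are made up.

```python
class FaultError(Exception):
    """Local stand-in for a memory read fault (not drgn's class)."""

def unwind(read_frame, start):
    """Collect frames until the chain ends or a read faults."""
    frames = []
    address = start
    while address is not None:
        try:
            name, address = read_frame(address)
        except FaultError:
            break  # keep whatever was unwound so far
        frames.append(name)
    return frames

# Toy "memory": frame address -> (function name, caller's frame address).
memory = {0x30: ("schedule", 0x20), 0x20: ("do_wait", 0x10)}

def read_frame(address):
    if address not in memory:
        raise FaultError(f"could not read memory at {address:#x}")
    return memory[address]

partial = unwind(read_frame, 0x30)  # the frame at 0x10 is unreadable
empty = unwind(read_frame, 0x99)    # the very first frame is unreadable
print(partial, empty)
```

In this model, a fault on the first frame naturally yields an empty trace, which is the extension of the existing behavior described above.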

brenns10 commented 1 year ago

Sounds good to me, I'll take a look at what sort of refactor it would be.

osandov commented 1 year ago

Actually I don't think it'll take much refactoring. Let's just make drgn_get_initial_registers() (via all of the arch-specific callbacks that implement it) return a NULL struct drgn_register_state * if there are no initial registers. Then each callback can decide what constitutes a hard error vs. a lack of a stack. I.e.,

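struct drgn_register_state *regs;
struct drgn_error *err = drgn_get_initial_registers(..., &regs);
if (err) {
    // Hard error: handle it (e.g., propagate it to the caller).
}
if (!regs) {
    // No initial registers (no stack): return an empty stack trace.
}
// Common case: unwind starting from regs.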
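
The same control flow can be modeled in a few lines of Python to see how the three outcomes differ. This is a sketch under stated assumptions, not drgn's implementation: `HardError`, `stack_trace`, and the three callbacks are hypothetical names, registers are a plain dictionary, and a `None` return plays the role of the NULL `struct drgn_register_state *`.

```python
from typing import Callable, Dict, List, Optional

Registers = Dict[str, int]

class HardError(Exception):
    """Hypothetical stand-in for an error that should still be reported."""

def stack_trace(get_initial_registers: Callable[[], Optional[Registers]]) -> List[str]:
    regs = get_initial_registers()  # a hard error propagates as an exception
    if regs is None:
        return []  # no initial registers: empty stack trace
    # Common case: a real unwinder would walk the stack from here.
    return [f"frame at sp={regs['sp']:#x}"]

def zombie_task() -> Optional[Registers]:
    return None  # the callback decided the stack is simply missing

def running_task() -> Optional[Registers]:
    return {"sp": 0x1000}

def broken_task() -> Optional[Registers]:
    raise HardError("unexpected failure")

print(stack_trace(zombie_task))
print(stack_trace(running_task))
```

Keeping the "missing stack vs. hard error" decision inside each callback means the shared code only has to distinguish three cases: an error, no registers, or registers to unwind from.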

osandov / drgn

FaultError in stack_trace() #273