Open brenns10 opened 1 year ago
I like your option (a) because in general, when stack unwinding hits a FaultError, it stops and returns what it got so far; this would be an extension of that to the first frame. I don't love the idea of checking task.stack directly though, since I'm also not confident about task lifetimes (and it could be racy). I think it'd be better to catch the FaultError/DRGN_ERROR_FAULT at the source (in this case probably linux_kernel_get_initial_registers_x86_64), although that might require some refactoring.
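The "stops and returns what it got so far" behavior, and why option (a) extends it naturally to the first frame, can be sketched in plain Python. Everything here is a hypothetical stand-in (`MemoryFault`, `unwind`, `make_reader`), not drgn's actual unwinder or API; it only models the control flow under discussion.

```python
from typing import Callable, List


class MemoryFault(Exception):
    """Hypothetical stand-in for a fault raised when reading unmapped memory."""


def unwind(read_frame: Callable[[int], str], max_frames: int = 64) -> List[str]:
    """Collect frames until a read faults, keeping whatever was read so far."""
    frames: List[str] = []
    for depth in range(max_frames):
        try:
            frames.append(read_frame(depth))
        except MemoryFault:
            # A fault ends the trace instead of propagating. If it happens at
            # depth 0 (e.g. the stack is already freed), the result is empty.
            break
    return frames


def make_reader(readable_frames: int) -> Callable[[int], str]:
    """Build a fake reader that faults once `readable_frames` have been read."""
    def read_frame(depth: int) -> str:
        if depth >= readable_frames:
            raise MemoryFault(f"cannot read frame {depth}")
        return f"frame#{depth}"
    return read_frame


partial = unwind(make_reader(3))  # fault after three readable frames
zombie = unwind(make_reader(0))   # fault on the very first frame
```

In this model a fault on the first frame needs no special case: it falls out of the same loop and produces an empty trace.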
Sounds good to me, I'll take a look at what sort of refactor it would be.
Actually I don't think it'll take much refactoring. Let's just make drgn_get_initial_registers() (via all of the arch-specific callbacks that implement it) return a NULL struct drgn_register_state * if there are no initial registers. Then each callback can decide what constitutes a hard error vs. a lack of a stack. I.e.,
    struct drgn_register_state *regs;
    err = drgn_get_initial_registers(..., &regs);
    if (err)
            // handle hard error.
    if (!regs)
            // return empty stack trace
    // Handle common case
I'm not actually sure what to do with this or even if it's a bug. But we encountered a FaultError inside of the Program.stack_trace() function:
I checked it out with crash and noticed it's a zombie:
The sp address shown there is the same one that drgn faults on, so I'm guessing the logic from drgn is to get the registers from the thread_info and backtrace right from there. But if you look at the task_struct.stack, it's NULL, so I'd guess the stack has been freed and unmapped, and the task struct is sitting around waiting to be reaped.
I think a FaultError with a random-looking kernel address isn't exactly the best way to handle this, since it's confusing. I think that it could be safely avoided just by testing the task_struct.stack field. (But I'm not an expert in task lifetime...) If we caught the issue before reading the stack pointer, then we could handle it better. I guess the options are (a) a stack trace which is just an empty iterable, or (b) a different exception like ValueError that explains that the stack is missing -- or maybe just a FaultError but with an updated message that indicates the possibility of a zombie? Or I guess (c) returning an Optional[StackTrace] but that's just gross and ruins the common case.
I'm leaning towards (b) but probably could be convinced of (a). Just wanted to ask about it before I try my hand at a PR.
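To make the trade-off between the three options concrete, here is a rough sketch of what each one would mean for calling code. The functions are hypothetical stand-ins that take a `stack_missing` flag and return fake frames; none of them is drgn's API, and the error message is made up for illustration.

```python
from typing import List, Optional

FAKE_FRAMES = ["frame#0", "frame#1"]  # placeholder frames for a live task


def trace_option_a(stack_missing: bool) -> List[str]:
    """Option (a): a missing stack yields an empty trace."""
    return [] if stack_missing else list(FAKE_FRAMES)


def trace_option_b(stack_missing: bool) -> List[str]:
    """Option (b): a missing stack raises a descriptive error."""
    if stack_missing:
        raise ValueError("task has no stack (possibly a zombie)")
    return list(FAKE_FRAMES)


def trace_option_c(stack_missing: bool) -> Optional[List[str]]:
    """Option (c): a missing stack yields None."""
    return None if stack_missing else list(FAKE_FRAMES)


def count_frames_a(stack_missing: bool) -> int:
    # (a): the common-case loop also covers the zombie case unchanged.
    return sum(1 for _ in trace_option_a(stack_missing))


def count_frames_b(stack_missing: bool) -> int:
    # (b): callers scanning many tasks need a try/except around each call.
    try:
        return len(trace_option_b(stack_missing))
    except ValueError:
        return 0


def count_frames_c(stack_missing: bool) -> int:
    # (c): every caller must check for None before iterating.
    trace = trace_option_c(stack_missing)
    return 0 if trace is None else len(trace)
```

Under these assumptions, (a) is the only variant where code written for the common case keeps working for a task without a stack, while (b) is the only one that tells the caller why the trace is empty.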