Issue with traversing the stack

andreaspirklbauer commented 4 years ago

TrapBacktrace scans the stack starting from SP until it hits stackOrg. For each position sp it checks whether the word stored there is a valid address in the module space, i.e. whether it is in fact a pushed LNK register. But what if a parameter passed to a subroutine happens to look just like such an address? Doesn't System.Backtrace then falsely believe it is an address of a procedure in the module space, when in fact it is just the value of a parameter?

schierlm commented 4 years ago

Yes, it can happen that you get false stack frames in the middle. But they are usually easy to identify as a user (there is no path from the next stack frame to reach the point where the false stack frame starts). And the chance is reduced as the code checks whether the instruction before the return address is actually a jump.
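To make the heuristic concrete, here is a minimal Python simulation of such a scan. It is a sketch, not the actual System.Backtrace code: the function names, the toy memory layout, and the assumption that a call instruction can be recognised by its top byte (`0xF7` or `0xD7`) are all illustrative.

```python
# Assumption: call (branch-and-link) instructions are recognisable by their top byte.
CALL_TOP_BYTES = {0xF7, 0xD7}

def is_call(word):
    """True if the 32-bit instruction word looks like a call."""
    return (word >> 24) & 0xFF in CALL_TOP_BYTES

def backtrace(mem, sp, stack_org, code_lo, code_hi):
    """Scan the stack words in [sp, stack_org) and return every value that
    lies in the code area and is preceded by a call instruction."""
    frames = []
    for addr in range(sp, stack_org, 4):
        value = mem.get(addr, 0)
        in_code = code_lo <= value < code_hi and value % 4 == 0
        if in_code and is_call(mem.get(value - 4, 0)):
            frames.append(value)
    return frames

# Toy memory: code at 0x1000.., stack at 0x8000..
mem = {
    0x1000: 0xF7000010,  # call; its return address is 0x1004
    0x1020: 0x40000000,  # ordinary instruction, so 0x1024 is not a return address
    0x1040: 0xF7000002,  # call; its return address is 0x1044
    0x8000: 0x1044,      # genuine pushed LNK
    0x8004: 0x1024,      # parameter that looks like a code address: rejected
    0x8008: 0x1004,      # parameter equal to a real return address: false positive
    0x800C: 42,          # plain data
}

frames = backtrace(mem, 0x8000, 0x8010, 0x1000, 0x2000)
print([hex(f) for f in frames])
```

The word at `0x8004` is filtered out by the "instruction before it is a call" check, while the word at `0x8008` passes both checks and shows up as a false frame.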

This is the same when creating kernel backtraces in a well-known proprietary operating system (e.g. from a bluescreen), in the likely case that you do not have debug symbols for all the involved modules.

Still, having a backtrace with sometimes false stack frames (the real stack frames are of course there, too) is better than having no backtrace at all.

If you have any suggestions on how to improve the detection of false positives, feel free to share them. (For example, one could check whether the target address of the previous jump is in the same module as the next jump, or possibly even precedes the jump itself. But I'm not sure whether that would break some backtraces, e.g. when allocating memory via Kernel.New traps.)
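One possible reading of that consistency check, sketched in Python. Everything here is hypothetical: the `Candidate` record, the module table, and the rule that an outer call's target must lie in the same module as, and not after, the inner call site. Calls whose target cannot be decoded statically (e.g. through a register) are given the benefit of the doubt.

```python
from typing import NamedTuple, Optional

class Candidate(NamedTuple):
    call_site: int         # address of the call instruction (return address - 4)
    target: Optional[int]  # decoded call target, None if not statically known

def module_of(addr, modules):
    """Name of the module whose code range contains addr, or None."""
    for name, lo, hi in modules:
        if lo <= addr < hi:
            return name
    return None

def consistent(outer, inner, modules):
    """Could the procedure called by `outer` contain the call site `inner`?"""
    if outer.target is None:
        return True  # cannot tell, so do not reject
    same = module_of(outer.target, modules) == module_of(inner.call_site, modules)
    return same and outer.target <= inner.call_site

def mark_plausible(candidates, modules):
    """candidates are ordered innermost first (as found scanning up from SP).
    Each one is compared with the last candidate judged plausible."""
    result, last_ok = [], None
    for c in candidates:
        ok = last_ok is None or consistent(c, last_ok, modules)
        result.append((c, ok))
        if ok:
            last_ok = c
    return result

modules = [("A", 0x1000, 0x2000), ("B", 0x2000, 0x3000)]
inner = Candidate(call_site=0x2040, target=0x2100)  # call made inside module B
false = Candidate(call_site=0x1000, target=0x1800)  # calls into A, cannot reach 0x2040
outer = Candidate(call_site=0x1100, target=0x2000)  # calls a procedure in B at 0x2000
flags = [ok for _, ok in mark_plausible([inner, false, outer], modules)]
print(flags)
```

The sketch only flags candidates rather than dropping them, because the innermost candidate is trusted unconditionally and a wrong guess there would otherwise hide real frames.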

andreaspirklbauer commented 4 years ago

Yes, the check on the jump instruction before the return address will reduce the likelihood of false positives, but not completely eliminate it. One "safe" way to address the issue would be to add back additional metadata, either on file or in the run-time data structure itself. The easiest way would be to simply re-introduce the dynamic link chain within the stack frames, as in Lilith Modula-2 or Ceres Oberon for example. But the dynamic (and also the static) link chains have been eliminated for good reason in FPGA Oberon 2013, so this would be a step backward and is not recommended. If one does not want to increase the size of the stack frame, another way to (effectively) add back the dynamic link chain would be to "abuse" the higher-order bits of the pushed LNK field in the stack frame to encode the dynamic link offset to the previous frame. But that would be a rather dirty trick, which would make the procedure prolog and epilog more complicated and would only work on systems with small address spaces. Thus, it is not recommended either. Without adding back metadata, one can only apply heuristics.
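For illustration only, the "high-order bits" trick could look roughly like the following Python sketch. The 24/8 bit split, the function names, and the assumption that the pushed LNK word sits at the lowest address of each frame are all made up for the example; nothing like this exists in the actual system.

```python
ADDR_BITS = 24                            # assumed: code addresses fit in 24 bits
ADDR_MASK = (1 << ADDR_BITS) - 1
MAX_WORDS = (1 << (32 - ADDR_BITS)) - 1   # largest encodable frame size, in words

def pack_lnk(ret_addr, frame_words):
    """Combine the return address and the frame size (in words) in one word."""
    if not 0 <= ret_addr <= ADDR_MASK:
        raise ValueError("return address does not fit in the low bits")
    if not 0 <= frame_words <= MAX_WORDS:
        raise ValueError("frame too large to encode")
    return (frame_words << ADDR_BITS) | ret_addr

def unpack_lnk(word):
    """Split a packed word back into (return address, frame size in words)."""
    return word & ADDR_MASK, word >> ADDR_BITS

def walk(mem, sp, stack_org):
    """Follow the encoded offsets from sp up to stack_org; no guessing needed."""
    frames = []
    while sp < stack_org:
        ret, words = unpack_lnk(mem.get(sp, 0))
        if words == 0:        # no valid frame header, stop rather than loop
            break
        frames.append(ret)
        sp += 4 * words
    return frames

mem = {
    0x8000: pack_lnk(0x1044, 2),  # inner frame: LNK word plus one parameter
    0x8004: 0x1004,               # parameter that looks like a return address
    0x8008: pack_lnk(0x1104, 2),  # outer frame
    0x800C: 7,
}
chain = walk(mem, 0x8000, 0x8010)
print([hex(r) for r in chain])
```

The parameter at `0x8004` is never inspected, so it cannot produce a false frame. The `ValueError` cases show the cost: the return address and the frame size both have to fit into a single word, and the epilog would have to mask the high bits off again before returning.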

schierlm / Oberon2013Modifications

Issue with traversing the stack #2