mstange / framehop

Stack unwinding library in Rust
Apache License 2.0

Infinite loop when profiling Firefox #2

Closed mstange closed 2 years ago

mstange commented 2 years ago

I'm hitting an infinite loop when profiling a local Firefox macOS arm64 build with perfrecord + framehop. Needs to be debugged.

mstange commented 2 years ago

This infinite loop was happening when trying to unwind from pthread_exit which was called by +[NSThread exit].

+[NSThread exit] looks like this:


                     +[NSThread exit]:
00000001813141c4         pacibsp
00000001813141c8         stp        fp, lr, [sp, #-0x10]!
00000001813141cc         mov        fp, sp
00000001813141d0         mov        x0, #0x0                                    ; argument "value_ptr" for method imp___auth_stubs__pthread_exit
00000001813141d4         bl         imp___auth_stubs__pthread_exit              ; pthread_exit

                     -[NSOrthographyCheckingResult range]:
00000001813141d8         adrp       x8, #0x1d93d8000                            ; 0x1d93d8e88@PAGE
00000001813141dc         ldrsw      x8, [x8, #0xe88]                            ; 0x1d93d8e88@PAGEOFF, __MergedGlobals_1d93d8e88
00000001813141e0         add        x8, x0, x8
00000001813141e4         ldp        x0, x1, [x8]
00000001813141e8         ret
                        ; endp

+[NSThread exit] calls pthread_exit, which never returns, so the bl is the last instruction of +[NSThread exit]; nothing follows it inside that function. The instruction immediately after the bl belongs to the next function, which happens to be -[NSOrthographyCheckingResult range]. So, inside pthread_exit, the return address (0x1813141d8) points at the start of that next function. That's fine because this return address will never be followed... except by a stack walker.
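To make the address arithmetic concrete, here is a small sketch that looks up which function an address falls into, using the two address ranges from the disassembly above. The `Function` struct, the `FUNCTIONS` table and both helper functions are hypothetical names made up for this illustration; they are not part of framehop.

```rust
/// A function's address range and name (illustrative only).
struct Function {
    start: u64,
    end: u64, // exclusive
    name: &'static str,
}

// Address ranges read off the disassembly above.
static FUNCTIONS: [Function; 2] = [
    Function {
        start: 0x1813141c4,
        end: 0x1813141d8,
        name: "+[NSThread exit]",
    },
    Function {
        start: 0x1813141d8,
        end: 0x1813141ec,
        name: "-[NSOrthographyCheckingResult range]",
    },
];

/// Returns the name of the function whose range contains `address`.
fn function_containing(address: u64) -> Option<&'static str> {
    FUNCTIONS
        .iter()
        .find(|f| f.start <= address && address < f.end)
        .map(|f| f.name)
}

/// On arm64, `bl` sets lr to the address of the next instruction,
/// which is 4 bytes after the `bl` itself.
fn return_address_of_call(call_instruction_address: u64) -> u64 {
    call_instruction_address + 4
}
```

With the bl at 0x1813141d4, `return_address_of_call` gives 0x1813141d8, and `function_containing` attributes that address to -[NSOrthographyCheckingResult range] rather than to +[NSThread exit], which is the mismatch described above.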

During unwinding, we were looking up the unwinding opcode for the return address. This means that, in this case, we were getting the opcode for the wrong function. And the two functions have different unwind information: +[NSThread exit] uses the frame pointer, whereas -[NSOrthographyCheckingResult range] is a frameless function where the return address is stored in lr.

I'm now changing it so that we look up return_address - 1 when trying to find the right unwind info for a return address. That address lies inside the bl instruction itself, so we'll correctly look up the opcode for +[NSThread exit]. I'm also adding a check which makes sure that we don't try to follow lr if we're unwinding from a return address: a function whose return address is still in lr cannot have called another function, because the bl would have overwritten lr. This defends against an infinite loop of return_address = lr.
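The two changes could be sketched roughly as follows. This is only an illustration of the idea under simplifying assumptions: the type and function names (`FrameAddress`, `UnwindRule`, `UnwindError`, `check_rule`) are chosen for this example and may not match framehop's actual code, and the real unwind rules are richer than the two variants shown here.

```rust
/// How the address we are unwinding from was obtained (illustrative).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FrameAddress {
    /// The sampled pc of the innermost frame.
    InstructionPointer(u64),
    /// A return address recovered while unwinding a callee frame.
    ReturnAddress(u64),
}

impl FrameAddress {
    /// The address to use when looking up unwind info.
    /// For a return address, step back by one byte so that the lookup
    /// lands inside the call instruction, i.e. inside the calling function.
    fn address_for_lookup(self) -> u64 {
        match self {
            FrameAddress::InstructionPointer(pc) => pc,
            FrameAddress::ReturnAddress(ra) => ra.saturating_sub(1),
        }
    }

    fn is_return_address(self) -> bool {
        matches!(self, FrameAddress::ReturnAddress(_))
    }
}

/// A heavily simplified set of unwind rules (assumption for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum UnwindRule {
    /// The caller's fp and return address were saved on the stack.
    UseFramePointer,
    /// Frameless function: the return address is still in lr.
    UseLr,
}

#[derive(Debug, PartialEq, Eq)]
enum UnwindError {
    /// An lr-based rule was found for a frame that made a call.
    LrRuleInCallerFrame,
}

/// Rejects lr-based rules for frames reached through a return address.
/// Such a frame executed a `bl`, so lr no longer holds its own return
/// address; following lr would yield the same return address again.
fn check_rule(address: FrameAddress, rule: UnwindRule) -> Result<(), UnwindError> {
    if rule == UnwindRule::UseLr && address.is_return_address() {
        return Err(UnwindError::LrRuleInCallerFrame);
    }
    Ok(())
}
```

For the return address 0x1813141d8, `address_for_lookup` yields 0x1813141d7, which falls inside the bl in +[NSThread exit]. If a lookup still produced an lr-based rule for a return address, `check_rule` would report an error instead of letting the unwinder loop.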