immunant / IA2-Phase2

"Local Exec" TLS model is used sometimes, resulting in different TLS addresses calculated and thus segfaults #457

Open kkysen opened 1 month ago

kkysen commented 1 month ago

After compartmentalizing dav1d and trying to run it, I started hitting a segfault at the very beginning, in init_stacks_and_setup_tls: https://github.com/immunant/IA2-Phase2/blob/94f890bf02fad5bc84c3b0a417fb33b654c7268a/runtime/libia2/include/ia2_internal.h#L433

This segfault wasn't happening initially but, like #455 and #456, it started appearing suddenly.

After much debugging with @rinon, we determined that the main compartment's DSO, which calls INIT_RUNTIME and thus defines init_stacks_and_setup_tls, was calculating a different address for the thread-local ia2_stackptr_0 as compared to other DSOs like liblibia2.a.

As @ahomescu explained to @rinon,

there are multiple TLS models, and we seem to be using conflicting models. We're getting a "Local Exec" model reference in the main binary to a thread local that is globally visible and all other references to it go through the GOT and have a different TL offset (so a different address for the thread local value)

We were able to work around this by moving the assignment in https://github.com/immunant/IA2-Phase2/blob/94f890bf02fad5bc84c3b0a417fb33b654c7268a/runtime/libia2/include/ia2_internal.h#L433 to a separate function, allocate_stack_0, defined in libia2, which prevents the compiler from using the "Local Exec" TLS model for that access. This is only a workaround, though: the root issue is still there, and I wonder if it may also be causing issues like #455 and #456.
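
For readers unfamiliar with the TLS models mentioned above, here is a minimal sketch, not taken from IA2, of how GCC and Clang let a declaration pin a model with the `tls_model` attribute. All names are hypothetical. On x86-64 Linux, "initial-exec" normally loads the thread-pointer offset from a GOT slot, while "local-exec" bakes the offset into the instruction as a constant, which is only valid for TLS defined in the executable itself.

```c
#include <assert.h>

/* Hypothetical variables for illustration; they are not IA2 symbols. */

/* Initial Exec: the thread-pointer offset is read from a GOT slot
 * that the dynamic loader fills in at startup. */
__thread long demo_initial_exec __attribute__((tls_model("initial-exec")));

/* Local Exec: the thread-pointer offset is a link-time constant encoded
 * directly in the instruction (e.g. mov %rax,%fs:-0x10). */
__thread long demo_local_exec __attribute__((tls_model("local-exec")));

/* Store to both variables so each access pattern shows up in the assembly. */
void demo_store(long value) {
    demo_initial_exec = value;
    demo_local_exec = value + 1;
}

/* Both models are correct here because both variables live in this binary. */
void demo_check(void) {
    demo_store(41);
    assert(demo_initial_exec == 41);
    assert(demo_local_exec == 42);
}
```

Compiling this with `-O2 -S` should show the two access sequences side by side, though the linker is allowed to relax the GOT form into the fixed-offset form when producing a final executable. The same choice can be made for a whole translation unit with `-ftls-model=`.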

rinon commented 3 weeks ago

Do you have any way to get this crashing build again? Can I reproduce it? I'm having a hard time getting this to reproduce in a smaller test case. I'm only getting the compiler to generate the following sequence, instead of the hardcoded TLS offset we saw.

    3887:       64 48 8b 04 25 00 00    mov    %fs:0x0,%rax
    388e:       00 00
    3890:       48 03 05 69 5b 00 00    add    0x5b69(%rip),%rax        # 9400 <ia2_stackptr_0@@Base+0x7400>
    3897:       48 89 18                mov    %rbx,(%rax)

If I recall correctly we were seeing basically %fs + fixed offset instead of fetching the global containing the offset.
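
To make the listing above easier to follow: the `add 0x5b69(%rip),%rax` operand is RIP-relative, so its target is the address of the next instruction plus the displacement. The sketch below, using a hypothetical helper name, redoes that arithmetic and lands on the `# 9400` annotation that objdump prints, which is the memory slot the TLS offset is fetched from.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: resolve an x86-64 RIP-relative operand.
 * RIP already points at the NEXT instruction when the displacement is added. */
uint64_t rip_relative_target(uint64_t insn_addr, uint64_t insn_len, int32_t disp) {
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}

/* Check against the listing: the add at 0x3890 is 7 bytes long
 * (48 03 05 69 5b 00 00) and carries disp32 = 0x5b69. */
void check_listing(void) {
    assert(rip_relative_target(0x3890, 7, 0x5b69) == 0x9400);
}
```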

kkysen commented 3 weeks ago

Do you have any way to get this crashing build again? Can I reproduce it?

It still reproduces the same for me when I have ia2 on the latest main (without #458) and build dav1d with ./rewrite.py (only dependency should be uv) in immunant/dav1d/ia2.

I'm having a hard time getting this to reproduce in a smaller test case. I'm only getting the compiler to generate the following sequence, instead of the hardcoded TLS offset we saw.

    3887:       64 48 8b 04 25 00 00    mov    %fs:0x0,%rax
    388e:       00 00
    3890:       48 03 05 69 5b 00 00    add    0x5b69(%rip),%rax        # 9400 <ia2_stackptr_0@@Base+0x7400>
    3897:       48 89 18                mov    %rbx,(%rax)

If I recall correctly we were seeing basically %fs + fixed offset instead of fetching the global containing the offset.

This is the faulting instruction for me:

gdb:

    0x555555558ed7 <init_stacks_and_setup_tls+600>:      mov    %rax,%fs:0xffffffffffffe000

llvm-objdump:

    4ed7: 64 48 89 04 25 00 e0 ff ff    movq    %rax, %fs:-0x2000

The -0x2000 is the fixed offset, right?
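
As a sanity check that gdb and llvm-objdump are showing the same operand, here is a small sketch (helper names are hypothetical) that decodes the last four bytes of the instruction as a little-endian signed 32-bit displacement and sign-extends it to 64 bits. An offset encoded directly in the instruction like this is consistent with the "Local Exec" form described earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decode a little-endian disp32 from instruction bytes. */
int32_t decode_disp32(const uint8_t b[4]) {
    uint32_t u = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
                 ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    return (int32_t)u; /* two's-complement reinterpretation on GCC/Clang */
}

/* Sign-extend to 64 bits, which is how gdb prints the operand. */
uint64_t sign_extend64(int32_t d) {
    return (uint64_t)(int64_t)d;
}

/* Bytes "00 e0 ff ff" are the tail of 64 48 89 04 25 00 e0 ff ff. */
void check_faulting_insn(void) {
    const uint8_t disp[4] = {0x00, 0xe0, 0xff, 0xff};
    assert(decode_disp32(disp) == -0x2000);                   /* llvm-objdump view */
    assert(sign_extend64(-0x2000) == 0xffffffffffffe000ULL);  /* gdb view */
}
```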