Open kkysen opened 1 month ago
Do you have any way to get this crashing build again? Can I reproduce it? I'm having a hard time getting this to reproduce in a smaller test case. I'm only getting the compiler to generate the following sequence, instead of the hardcoded TLS offset we saw.
3887: 64 48 8b 04 25 00 00 mov %fs:0x0,%rax
388e: 00 00
3890: 48 03 05 69 5b 00 00 add 0x5b69(%rip),%rax # 9400 <ia2_stackptr_0@@Base+0x7400>
3897: 48 89 18 mov %rbx,(%rax)
If I recall correctly, we were seeing basically %fs + fixed offset instead of fetching the global containing the offset.
Do you have any way to get this crashing build again? Can I reproduce it?
It still reproduces the same for me when I have ia2 on the latest main (without #458) and build dav1d with ./rewrite.py (only dependency should be uv) in immunant/dav1d/ia2.
I'm having a hard time getting this to reproduce in a smaller test case. I'm only getting the compiler to generate the following sequence, instead of the hardcoded TLS offset we saw.
3887: 64 48 8b 04 25 00 00 mov %fs:0x0,%rax
388e: 00 00
3890: 48 03 05 69 5b 00 00 add 0x5b69(%rip),%rax # 9400 <ia2_stackptr_0@@Base+0x7400>
3897: 48 89 18 mov %rbx,(%rax)
If I recall correctly, we were seeing basically %fs + fixed offset instead of fetching the global containing the offset.
This is the faulting instruction for me:
gdb:
0x555555558ed7 <init_stacks_and_setup_tls+600>: mov %rax,%fs:0xffffffffffffe000
llvm-objdump:
4ed7: 64 48 89 04 25 00 e0 ff ff movq %rax, %fs:-0x2000
The -0x2000 is the fixed offset, right?
After compartmentalizing dav1d and trying to run it, I started hitting a segfault at the very beginning, in init_stacks_and_setup_tls: https://github.com/immunant/IA2-Phase2/blob/94f890bf02fad5bc84c3b0a417fb33b654c7268a/runtime/libia2/include/ia2_internal.h#L433
This segfault wasn't initially happening, but like #455 and #456, started happening suddenly.
After much debugging with @rinon, we determined that the main compartment's DSO, which calls INIT_RUNTIME and thus defines init_stacks_and_setup_tls, was calculating a different address for the thread-local ia2_stackptr_0 as compared to other DSOs like liblibia2.a.
As @ahomescu explained to @rinon,
We were able to work around this by moving the assignment in https://github.com/immunant/IA2-Phase2/blob/94f890bf02fad5bc84c3b0a417fb33b654c7268a/runtime/libia2/include/ia2_internal.h#L433 to a separate function, allocate_stack_0, defined in libia2, thus preventing it from trying to use this "Local Exec" TLS model. This works, but it is only a workaround: the root issue is still there, and I wonder if it may also be causing issues like #455 and #456.