loongson-community / discussions

Cross-community issue tracker & discussions

[ABI] Tracking issue for TLS descriptors (TLSDESC) on LoongArch #20

Open xen0n opened 7 months ago

xen0n commented 7 months ago

cc @MQ-mengqing @heiher @xry111 @MaskRay

xry111 commented 7 months ago

Issues to be resolved (IMO):

xry111 commented 7 months ago
movcf2gr $t0,$fcc0
movcf2gr $t1,$fcc1
bstrins.w $t0,$t1,1,1
movcf2gr $t1,$fcc2
bstrins.w $t0,$t1,2,2
# ...
st.d $t0,$sp,OFFSET_FCC

@xen0n: How did you handle this for in-kernel FPU usage? The situation is very similar to a context switch (as Florian Weimer said).

xen0n commented 7 months ago
movcf2gr $t0,$fcc0
movcf2gr $t1,$fcc1
bstrins.w $t0,$t1,1,1
movcf2gr $t1,$fcc2
bstrins.w $t0,$t1,2,2
# ...
st.d $t0,$sp,OFFSET_FCC

@xen0n: How did you handle this for in-kernel FPU usage? The situation is very similar to a context switch (as Florian Weimer said).

The kernel just does the equivalent of a FP context switch when entering/exiting in-kernel FPU critical sections.

xry111 commented 7 months ago

st.d $t0,$sp,OFFSET_FCC

This should be "st.b" to be optimal.
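
A rough illustration of why a byte store is enough: LoongArch has eight condition flag registers (fcc0 to fcc7), each one bit wide, so the packed value fits in 8 bits. The C sketch below only models the bit packing done by the movcf2gr/bstrins.w sequence; `pack_fcc` and `unpack_fcc` are hypothetical names for illustration, not functions from glibc or the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_FCC 8 /* fcc0..fcc7, one bit each */

/* Hypothetical model of the save sequence: bit i of the result holds fcc[i],
 * like movcf2gr followed by bstrins.w $t0,$t1,i,i for each flag. */
uint8_t pack_fcc(const uint8_t fcc[NUM_FCC])
{
    uint8_t packed = 0;
    for (int i = 0; i < NUM_FCC; i++) {
        assert(fcc[i] <= 1); /* a condition flag is a single bit */
        packed |= (uint8_t)(fcc[i] << i);
    }
    return packed;
}

/* Hypothetical model of the restore sequence: split the byte back into flags. */
void unpack_fcc(uint8_t packed, uint8_t fcc[NUM_FCC])
{
    for (int i = 0; i < NUM_FCC; i++)
        fcc[i] = (uint8_t)((packed >> i) & 1u);
}
```

Since all eight flags fit in a `uint8_t`, a one-byte stack slot written with st.b and read back with ld.b (or ld.bu) can hold the whole saved state.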

MQ-mengqing commented 7 months ago

It seems the second point in [1] contradicts the viewpoint in [2]. I noticed that the mold author Rui said "I'd stick with the usual two-slot design". Those who prefer the 2-slot design have given enough reasons. And I don't know whether the way musl would implement it is acceptable. I'll bring my question up in the coming internal meeting.

[1] https://sourceware.org/pipermail/binutils/2023-December/130916.html
[2] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/373#issuecomment-1668982387

xry111 commented 7 months ago

It seems the second point in [1] contradicts the viewpoint in [2]. I noticed that the mold author Rui said "I'd stick with the usual two-slot design". Those who prefer the 2-slot design have given enough reasons. And I don't know whether the way musl would implement it is acceptable. I'll bring my question up in the coming internal meeting.

[1] https://sourceware.org/pipermail/binutils/2023-December/130916.html [2] riscv-non-isa/riscv-elf-psabi-doc#373 (comment)

Hmm, isn't [1] using the two-slot layout?

Para 2 in [1] says "When using multiple ways to access the same TLS variable, a maximum of 5 GOT slots are used." But only 2 slots are used for DESC, the other slots are used by GD or IE.
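
To make the slot arithmetic concrete, here is a minimal sketch. It assumes the conventional per-model GOT usage (GD: 2 slots for module ID and offset, IE: 1 slot for the TP-relative offset, DESC: 2 slots for the resolver pointer and its argument); the enum and `got_slots_for` are hypothetical names, not taken from any linker.

```c
#include <assert.h>

/* TLS access models that may each need GOT entries for one variable. */
enum tls_model {
    TLS_GD   = 1 << 0, /* general dynamic */
    TLS_IE   = 1 << 1, /* initial exec */
    TLS_DESC = 1 << 2  /* TLS descriptor */
};

/* Hypothetical helper: GOT slots needed for one TLS variable, given the set
 * of access models that reference it. Slot counts are assumptions stated in
 * the text above: GD = 2, IE = 1, DESC = 2. */
unsigned got_slots_for(unsigned models)
{
    assert((models & ~(unsigned)(TLS_GD | TLS_IE | TLS_DESC)) == 0);
    unsigned slots = 0;
    if (models & TLS_GD)
        slots += 2;
    if (models & TLS_IE)
        slots += 1;
    if (models & TLS_DESC)
        slots += 2;
    return slots;
}
```

Under these assumptions, `got_slots_for(TLS_DESC)` is 2 and `got_slots_for(TLS_GD | TLS_IE | TLS_DESC)` is 5, which matches the "maximum of 5 GOT slots" figure while DESC itself stays at two slots.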

MQ-mengqing commented 7 months ago

Para 2 in [1] says "When using multiple ways to access the same TLS variable, a maximum of 5 GOT slots are used." But only 2 slots are used for DESC, the other slots are used by GD or IE.

My misunderstanding was that "4-slot" means the second DESC slot points to the two GD slots (for dynamic TLS), and "2-slot" means only one of GD and DESC can exist, so whichever is used takes 2 slots. I'm confused about TLS. I need to research it.

xry111 commented 7 months ago
  • The slow path of _dl_tlsdesc_dynamic calls __tls_get_addr, which in turn calls malloc (unless statically linked), and malloc may be interposed. An interposed malloc may clobber the fcc registers, so we need to either save and restore all fcc registers in the slow path, or tell the compiler that a TLS descriptor call may clobber the fcc registers. Which is better?

FWIW: AArch64 uses a clobber in the compiler.