stephenrkell / liballocs

Meta-level run-time services for Unix processes... a.k.a. dragging Unix into the 1980s
http://humprog.org/~stephen/research/liballocs

TLS allocator is not indexed/queryable #43

Open stephenrkell opened 4 years ago

stephenrkell commented 4 years ago

We are missing support for thread-local storage.

This allocator is implemented inside the dynamic linker and is much like a static allocator, except that each thread gets its own segment for each library that defines thread-local symbols. We need to create bigallocs for these areas as threads are created, and index them more or less as we do static segments. (There is no point starting this until the deep-static-allocs branch is merged.)
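To make the "each thread gets its own segment" point concrete, here is a minimal sketch, assuming GCC or Clang (for `__thread`) and POSIX threads. The names `tls_counter` and `tls_copies_are_distinct` are made up for illustration and are not part of liballocs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical thread-local variable: every thread gets its own copy. */
static __thread int tls_counter = 0;

/* Thread body: store the address of this thread's copy into *out. */
static void *record_tls_address(void *out)
{
    assert(out != NULL);
    *(uintptr_t *) out = (uintptr_t) &tls_counter;
    return NULL;
}

/* Returns 1 if a newly created thread sees tls_counter at a different
 * address from the calling thread, 0 if the addresses are equal,
 * and -1 if the thread could not be created or joined. */
int tls_copies_are_distinct(void)
{
    pthread_t t;
    uintptr_t other = 0;
    if (pthread_create(&t, NULL, record_tls_address, &other) != 0) return -1;
    if (pthread_join(t, NULL) != 0) return -1;
    return other != (uintptr_t) &tls_counter;
}
```

Both addresses are live at the point of comparison, so they must differ. Each of them lies in a per-thread area that the allocator indexes would need to cover.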

stephenrkell commented 3 years ago

With the latest libsystrap, we can now hook clone() if that's a good way to do this. And we can of course hook arch_prctl(). I'm a bit fuzzy on when new TLS blocks get allocated... could be before or after the clone. But either way, this seems doable.

One quirk is that a TLS block can be logically extended by dlopen(). The block is not reallocated... rather, dynamically loaded modules' regions are discontiguous, and one must indirect through the DTV (a per-thread vector of pointers, one pointer per DSO) to find those thread-local variables. Also remember that the allocation is lazy.
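As a rough illustration of the per-DSO blocks and the lazy allocation, the sketch below walks the loaded objects with `dl_iterate_phdr()` and reports each `PT_TLS` segment together with the calling thread's block for it. It assumes glibc (or another libc that fills in `dlpi_tls_modid` and `dlpi_tls_data`) and a dynamically linked program. `tls_probe` and the function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical probe variable: gives this object a PT_TLS segment. */
static __thread int tls_probe = 42;

struct tls_walk {
    int nsegments;    /* number of loaded objects with a PT_TLS header */
    int probe_found;  /* 1 if &tls_probe lies inside a reported block */
};

static int visit_object(struct dl_phdr_info *info, size_t size, void *arg)
{
    struct tls_walk *w = arg;
    assert(w != NULL);
    /* The TLS fields were added to the struct later; check they exist. */
    if (size < offsetof(struct dl_phdr_info, dlpi_tls_data)
               + sizeof info->dlpi_tls_data)
        return 0;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_TLS) continue;
        w->nsegments++;
        /* NULL means this thread's block is not allocated yet (lazy). */
        uintptr_t block = (uintptr_t) info->dlpi_tls_data;
        uintptr_t probe = (uintptr_t) &tls_probe;
        printf("modid %zu (%s): memsz %zu, block %p\n",
               info->dlpi_tls_modid,
               (info->dlpi_name && info->dlpi_name[0]) ? info->dlpi_name
                                                       : "main",
               (size_t) ph->p_memsz, info->dlpi_tls_data);
        if (block && probe >= block && probe - block < ph->p_memsz)
            w->probe_found = 1;
    }
    return 0; /* keep iterating */
}

static struct tls_walk walk_tls_segments(void)
{
    struct tls_walk w = { 0, 0 };
    dl_iterate_phdr(visit_object, &w);
    return w;
}

int count_tls_segments(void) { return walk_tls_segments().nsegments; }
int probe_is_in_reported_block(void) { return walk_tls_segments().probe_found; }
```

The module ID printed for each object is its index into the DTV. This only inspects the calling thread, so it is a probe for experimenting rather than a way to index other threads' blocks.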

stephenrkell commented 3 years ago

Note that DTVs themselves can be reallocated, and this may happen during dynamic loading.

stephenrkell commented 3 years ago

We can think of a mmap'd chunk participating in TLS as representing a particular range of one or more DTV entries, for a particular thread. Each DTV entry corresponds to the thread-local segment of a single binary. So suppose we compute static metadata for these just as we do for non-thread-locals, but index it by offset from the DTV block base instead of by vaddr. Our per-bigalloc metadata for the chunk will then be mostly concerned with recording this sequence of DTV entries (and probably the thread ID / its TCB base address).

We can perhaps use a structure similar to the one we use for allocation sites -- in particular how we group together multiple DSOs' info into a single coherent identity space, despite each having a local indexing scheme.
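One possible shape for that per-chunk record, purely as a sketch: every type and function name below is hypothetical, and nothing here reflects existing liballocs structures. It reads "offset from the DTV block base" as the offset from the start of the block that a DTV entry points to, which is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical: the part of a chunk occupied by one DTV entry's block. */
struct tls_segment_span {
    size_t modid;   /* DTV index, i.e. which DSO's TLS segment this is */
    size_t start;   /* byte offset of the block within the chunk */
    size_t length;  /* size of that DSO's TLS segment in memory */
};

/* Hypothetical per-bigalloc metadata for one thread's TLS chunk. */
struct tls_chunk_meta {
    pid_t tid;                            /* owning thread */
    uintptr_t tcb_base;                   /* that thread's TCB address */
    uintptr_t chunk_base;                 /* start of the mmap'd chunk */
    size_t nspans;                        /* number of entries in spans */
    const struct tls_segment_span *spans; /* non-overlapping spans */
};

/* Map an address inside the chunk to (modid, offset within that
 * module's TLS block). Returns 0 on success, -1 if the address falls
 * outside every recorded span. The (modid, offset) pair is what a
 * per-DSO static metadata table would then be searched with. */
int tls_chunk_lookup(const struct tls_chunk_meta *m, uintptr_t addr,
                     size_t *modid, size_t *offset)
{
    assert(m != NULL && modid != NULL && offset != NULL);
    if (addr < m->chunk_base) return -1;
    size_t delta = addr - m->chunk_base;
    for (size_t i = 0; i < m->nspans; i++) {
        const struct tls_segment_span *s = &m->spans[i];
        if (delta >= s->start && delta - s->start < s->length) {
            *modid = s->modid;
            *offset = delta - s->start;
            return 0;
        }
    }
    return -1;
}
```

A linear scan is used because the number of DSOs with TLS in one chunk is expected to be small. A sorted array with binary search would be the obvious replacement if that turns out not to hold.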

stephenrkell commented 10 months ago

I wonder if trapping set_thread_area() is a sane way to do this (on Linux, on x86...). It is less hairy than trapping clone().
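For reference, on x86-64 Linux the thread pointer is set with arch_prctl(ARCH_SET_FS) rather than set_thread_area(), and the value installed is the TCB base. The sketch below checks this from inside a running thread, assuming x86-64 Linux with the kernel headers available. The function name is hypothetical, and on any other platform it simply reports "unsupported".

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>
#if defined(__x86_64__) && defined(__linux__)
#include <asm/prctl.h> /* ARCH_GET_FS */
#endif

/* Returns 1 if the FS base reported by the kernel equals the TCB's
 * self-pointer stored at %fs:0, 0 if they differ, and -1 if the check
 * is unsupported here or the system call fails. */
int fs_base_matches_tcb_self_pointer(void)
{
#if defined(__x86_64__) && defined(__linux__)
    unsigned long fs_base = 0;
    assert(sizeof fs_base == sizeof(uintptr_t));
    if (syscall(SYS_arch_prctl, ARCH_GET_FS, &fs_base) != 0) return -1;
    uintptr_t self;
    /* The x86-64 TLS ABI keeps a pointer to the TCB in its first word. */
    __asm__ volatile ("mov %%fs:0, %0" : "=r"(self));
    return self == (uintptr_t) fs_base;
#else
    return -1;
#endif
}
```

If this holds, the argument seen by a trapped thread-pointer-setting call would directly give the TCB base address to record for the thread.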