Support for #[thread_local]?

jsgf commented 3 months ago

It would be useful to be able to use #[thread_local] to have functionally static variables which are per-core on rp2040.

Right now you can add #[thread_local] to a static variable (with the nightly thread_local feature enabled), but the build fails for two reasons:

1. the linker script doesn't handle the thread-local sections properly
2. the compiler generates calls to __aeabi_read_tp, which does not exist

The first is simple to fix, but the second raises some interesting questions. __aeabi_read_tp appears to be the ABI's interface to the platform for getting the per-"thread" base pointer. On other ARM targets it commonly reads a coprocessor control register, which the Cortex-M0+ doesn't have, but picolibc on the rp2040 uses the SIO CPUID register to index into an array of per-core pointers. This seems like a reasonable mechanism.
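A host-runnable sketch of the per-core pointer table approach is below. The names (THREAD_POINTERS, set_thread_pointer, read_tp, tls_var_addr) are hypothetical and not part of rp2040-hal. On the chip itself, the core id would come from a volatile read of the SIO CPUID register rather than a parameter. Also, the ARM run-time ABI treats __aeabi_read_tp as a helper that must preserve nearly all registers (it is not an ordinary AAPCS call), so a real version would most likely need to be written in assembly.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// The RP2040 has two Cortex-M0+ cores.
const NUM_CORES: usize = 2;

/// Per-core thread pointers, indexed by core id. Each core's startup code
/// would store the base of its own TLS block here before touching any
/// #[thread_local] variable.
static THREAD_POINTERS: [AtomicUsize; NUM_CORES] =
    [AtomicUsize::new(0), AtomicUsize::new(0)];

/// Hypothetical helper: record `base` as the thread pointer for `core_id`.
pub fn set_thread_pointer(core_id: u32, base: usize) {
    THREAD_POINTERS[core_id as usize].store(base, Ordering::Release);
}

/// Host-side model of `__aeabi_read_tp`. On hardware `core_id` would not be
/// a parameter: it would be read from SIO CPUID (address 0xd000_0000),
/// which returns 0 on core 0 and 1 on core 1.
pub fn read_tp(core_id: u32) -> usize {
    THREAD_POINTERS[core_id as usize].load(Ordering::Acquire)
}

/// Address of the thread-local variable at `offset` for the given core.
/// Simplified: the real ARM TLS layout adds a small fixed offset between
/// the thread pointer and the first variable.
pub fn tls_var_addr(core_id: u32, offset: usize) -> usize {
    read_tp(core_id) + offset
}
```

The same code and the same offset yield a different address on each core, which is exactly what gives every core its own copy of a #[thread_local] static.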

The Raspberry Pi C/C++ SDK doesn't seem to support this at all, so offers no guidance.

Whatever the mechanism, the rp2040_hal::multicore APIs would need to be extended to expose how much per-thread memory is needed, to allow callers to provide suitable memory when spawning on a new core.
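If callers provide the memory, the spawn path would roughly need to report a size and then initialize the block: copy the .tdata image and zero the .tbss tail. The host-side sketch below models that. TlsTemplate, required_size, init_block and TooSmall are hypothetical names, not existing rp2040_hal::multicore APIs; on target the tdata slice and tbss length would be derived from linker-defined symbols, and block alignment is ignored here for brevity.

```rust
/// Hypothetical description of the TLS template produced by the linker.
pub struct TlsTemplate<'a> {
    /// Initialized thread-local data (.tdata), copied into every block.
    pub tdata: &'a [u8],
    /// Size in bytes of zero-initialized thread-local data (.tbss).
    pub tbss_len: usize,
}

/// Error returned when the caller-provided block is too small.
#[derive(Debug, PartialEq)]
pub struct TooSmall {
    pub needed: usize,
    pub got: usize,
}

impl<'a> TlsTemplate<'a> {
    /// Bytes a caller must provide for one core's TLS block.
    pub fn required_size(&self) -> usize {
        self.tdata.len() + self.tbss_len
    }

    /// Initialize `block`: copy .tdata to the start, then zero .tbss.
    /// Bytes past `required_size()` are left untouched.
    pub fn init_block(&self, block: &mut [u8]) -> Result<(), TooSmall> {
        let needed = self.required_size();
        if block.len() < needed {
            return Err(TooSmall { needed, got: block.len() });
        }
        let (data, rest) = block.split_at_mut(self.tdata.len());
        data.copy_from_slice(self.tdata);
        rest[..self.tbss_len].fill(0);
        Ok(())
    }
}
```

Returning an error rather than panicking lets a spawn API reject an undersized buffer before the second core starts running.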

9names commented 3 months ago

> It would be useful to be able to use #[thread_local] to have functionally static variables which are per-core on rp2040.

I'm having trouble coming up with a use-case where thread_local variables would provide a benefit, what did you have in mind?

jsgf commented 3 months ago

I've been using @cbiffle's lilos executor for my projects. It's strictly single-core, but there's no reason why you couldn't run two instances, one on each rp2040 core. It relies on static vars for some of its state, so I was thinking that thread-local storage would allow two instances to coexist in RAM while sharing code.

Otherwise I'd have to work out how to link in two instances with their own statics which would be annoyingly redundant.

This specific case doesn't seem all that niche, especially since 'static data is so useful for DMA and such.

jsgf commented 3 months ago

> Whatever the mechanism, the rp2040_hal::multicore APIs would need to be extended to expose how much per-thread memory is needed, to allow callers to provide suitable memory when spawning on a new core.

Actually, thinking about it, we could just statically allocate bss space at link time for the per-thread data of each core, and make this completely transparent to users.
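The link-time alternative can be sketched as one statically reserved region holding an equal-sized block per core. Because the blocks are contiguous, __aeabi_read_tp would not need a runtime pointer table at all: the base is just the region address plus cpuid * size. TLS_SIZE, TlsArea, tls_block, store and load are hypothetical; on target the size would come from the linker script (.tdata plus .tbss) and the region would live in .bss. This is a host-runnable model, not rp2040-hal code.

```rust
use std::cell::UnsafeCell;

/// Hypothetical per-core TLS size; on target the linker script would supply it.
const TLS_SIZE: usize = 64;
/// The RP2040 has two cores.
const NUM_CORES: usize = 2;

/// One contiguous region with a TLS block per core, 8-byte aligned.
#[repr(C, align(8))]
struct TlsArea(UnsafeCell<[[u8; TLS_SIZE]; NUM_CORES]>);

// SAFETY: each core only ever touches its own block.
unsafe impl Sync for TlsArea {}

static TLS_AREA: TlsArea = TlsArea(UnsafeCell::new([[0; TLS_SIZE]; NUM_CORES]));

/// Base of `core`'s block: what __aeabi_read_tp would return on that core.
fn tls_block(core: usize) -> *mut u8 {
    assert!(core < NUM_CORES, "the RP2040 has only two cores");
    // Blocks are contiguous, so core N's block starts N * TLS_SIZE bytes in.
    unsafe { (TLS_AREA.0.get() as *mut u8).add(core * TLS_SIZE) }
}

/// Write one byte into `core`'s copy of the thread-local data.
/// SAFETY: the caller must have exclusive access to that core's block.
unsafe fn store(core: usize, offset: usize, value: u8) {
    assert!(offset < TLS_SIZE);
    unsafe { tls_block(core).add(offset).write(value) }
}

/// Read one byte from `core`'s copy of the thread-local data.
/// SAFETY: no concurrent writer to that core's block.
unsafe fn load(core: usize, offset: usize) -> u8 {
    assert!(offset < TLS_SIZE);
    unsafe { tls_block(core).add(offset).read() }
}
```

The trade-off compared with caller-provided memory is that core 1's block is always reserved, even in programs that never start core 1; in exchange, nothing changes in the multicore spawn API.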

rp-rs / rp-hal

Support for #[thread_local]? #793