Open dnadlinger opened 10 years ago
@ibuclaw: This is going to be an issue for you guys too. @MartinNowak: FYI, DMD seems to always call `__tls_get_addr` for each load/store, so it's not an issue there.
That GCC bug is old; I can bump it and cross-reference it back to LLVM if you like.
I'm not so fussed about a compiler flag controlling behavior because you can just default it to 'on'.
I noticed while working on the OS X merge-2.067 work that the `core.thread` Fiber `runShared` unittest still crashes with release builds, because the code path that uses `pthread_getspecific(sm_this)` is not enabled for `version (OSX)`. Can we fix that locally as part of the merge-2.067 activity and then push it upstream later? That would allow us to get a clean test run on OS X.
Confirmed that this affects PPC64 too.
We now have `core.volatile`, so I wonder if the `sm_this` accesses could be done through `volatileLoad`. So far the only workaround that works about three-quarters of the time is to disable inlining of `switchIn`, `switchOut`, and `getThis`. Setting the optimization level of these (and maybe some other) functions to `-O0` has a 100% success rate, but that is a non-standard extension.
`core.thread` fibers are supposed to be safe to migrate across threads, i.e. resume them from a different thread than the one they `yield`ed from. There is also a unit test to verify that this works.

However, there is a problem: LLVM currently (version 3.4/3.5 SVN) always assumes that the address of a thread-local variable doesn't change while on a given stack. This assumption hasn't really caused any problems with non-PIC code so far: TLS accesses are made by addressing off `fs`, `gs` or so in most Windows/Linux x86 ABIs, and as a result there is nothing to gain by caching. However, when using the ELF general-dynamic TLS model (the default for PIC on Linux), the address is determined via a call to `__tls_get_addr`. If there are multiple accesses to a TLS variable in the same function, LLVM indeed caches the result of that call on the stack. This is a problem if some code in between those accesses ends up calling `Fiber.yield`, as the thread the code is running on, and thus the address of the TLS symbol, might have changed in the meantime.

In consequence, migrating fibers between threads is currently unsafe if the code they execute relies on TLS at all. See `core.thread.Fiber.switchOut` for an example of how this plays out – in user code, the context switch can be hidden behind many layers of calls, of course.

This is not a bug in druntime, but it poses a problem for pretty much all userspace fiber/coroutine/green-thread implementations out there. The correct solution is to have an option in the compiler backend to disable caching of TLS addresses across (opaque) function calls. MSVC has a switch for this, while the GCC devs refuse to acknowledge the problem altogether. The LLVM/Clang devs are aware of the issue, although I don't think I agree with the proposed solution. This should likely be handled by just providing an option to force the target-specific lowering code not to keep the address across function calls.
Note that this is not an issue on the IR level: if the variable is read twice, the final IR contains two loads from the TLS variable, just as it is supposed to. I'm not sure yet which parts of the target lowering layer generate the address lookup code, but the problem needs to be addressed there.
In the meantime, a workaround would be to emit all TLS address lookups explicitly via a `noinline` function (like in the above druntime snippet, but emitted by the compiler). This would obviously be problematic in terms of performance, but we could probably reuse that code for implementing TLS emulation on Android as well.