Open vargaz opened 5 years ago
In addition to that it would be nice to get rid of in_critical_region
in SgenClientThreadInfo
that is a set
(+memory barrier) and unset
(+memory barrier) in the allocator for each allocation.
set
:
https://github.com/mono/mono/blob/fd82c6036ec3720d568050c1e538f9729cef984e/mono/metadata/sgen-mono-ilgen.c#L348-L354
unset
on slowpath before calling into runtime code:
https://github.com/mono/mono/blob/fd82c6036ec3720d568050c1e538f9729cef984e/mono/metadata/sgen-mono-ilgen.c#L425-L431
unset
on fastpath merged with publish of allocated object. We probably can't get rid of that completely:
https://github.com/mono/mono/blob/fd82c6036ec3720d568050c1e538f9729cef984e/mono/metadata/sgen-mono-ilgen.c#L502-L514
I was thinking that in_critical_region
is not necessary anymore with hybrid/coop suspend, but I'm not 100% sure on that yet.
Note that the membarrier
syscall may not be available on older systems. This is likely to affect some of the ARM hardware that you are running. Additionally, it is not available on Android devices until Android 10 (https://github.com/aosp-mirror/platform_bionic/commit/6ba6694ec24052de9fdfa8a51c148aebe68a9fd0#diff-dc7191156f6b327ad0d4bae8f0e1a803) and it is explicitly blocked by SELinux on older versions even if the kernel supports it. liburcu has some alternatives for systems where membarrier
is problematic but the performance penalty could be too high.
I'm pretty sure we already make the assumption that when a thread is suspended, the suspending mechanism makes sure to have all the memory writes flushed. Otherwise card table marking would have never worked.
I don't see any memory barriers when storing in_critical_region (MONO_MEMORY_BARRIER_NONE!?). I agree that when using coop it is of no use. The only memory barrier that I'm aware of inside the allocator is the one used to publish the object contents, before returning it, which is probably needed only because outside code might be juggling with these references in a lock free fashion. I think this was the memory model that we used to provide, if you juggled with objects between threads in a lock free fashion we wouldn't internally crash due to invalid vtable, but that was it. Maybe we were doing this with objects also inside the runtime.
Can mono_handle_new run concurrently with a GC ? I'm guessing not and that those membars are not needed. However, if the thread enters GC Safe state, and the GC can just kick in at any time, there should be a memory barrier to flush all writes to the handle stack.
Can mono_handle_new run concurrently with a GC ?
(With preemptive suspend, I assume)
The code was meant to be resilient to being suspended at any point during mono_handle_new
, so that when the GC is scanning the handle stack it either sees valid object references or handles with a NULL, never junk integers. But I assumed that if we're scanning the handle stack the thread is suspended at that point (ie we scan during a STW).
Although now that we switched to scanning the handle stack conservatively, it's less important that there are non-junk values there.
However, if the thread enters GC Safe state, and the GC can just kick in at any time, there should be a memory barrier to flush all writes to the handle stack.
They shouldn't be calling mono_handle_new
from a GC Safe state. They need to switch to unsafe first (which will safepoint).
Well, I assume only registered mutator threads should ever push to the handle stack, and all mutator threads are suspended during GC. So this means it can't run concurrently with a GC. And if the thread is suspended, based on what I said above, the memory of the suspended thread should have been flushed, regardless of those memory barriers.
Ok, that makes sense. In that case the barriers in mono_handle_new are not needed.
the suspending mechanism makes sure to have all the memory writes flushed.
So on mach and win32 we assume that suspend_thread
and SuspendThread/GetThreadContext
do that? And if we're using signals for suspend then signal delivery ought to do it? or should we do something explicitly in the signal handler?
When using signals, we synchronize with some semaphores so we are ok. On mach and win32 we probably just assume, I don't think there is explicit documentation there, but it makes sense to be that way and we would have run into trouble long ago if it weren't the case.
The code still needs at least compiler memory barriers. Also, I think the current barriers in the allocators etc. were added because of failures, I don't remember whenever the failures were due to compiler or cpu reordering.
We could replace membars from handles code with compiler barriers, just to be sure.
For allocators, it's risky to remove membarriers altogether because I'm sure there are places in the runtime where we might pass around such objects without synchronisation. Allocators can't be interrupted by stw so it had to be cpu reordering.
Right now, our gc related code, i.e. allocators/handles are full of memory barriers in order to support preempt mode. Instead of these, we could use compiler memory barriers (volatile), and have the GC thread do a process wide memory barrier: https://lore.kernel.org/patchwork/patch/188708/ https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-flushprocesswritebuffers
This will speed up and simplify the implementation of that code.