Open talex5 opened 1 week ago
This does look like a false positive. Or, rather, like a case where our definition of mixed model data races breaks down (see the last paragraph of this section of the manual). @fabbing and I made it so that a non-atomic access in OCaml code can race with an atomic access from the runtime (in this case, a relaxed atomic / volatile read from the GC).
It looks like you just uncovered a legitimate case where such accesses can happen concurrently.
Now, this should not have any real consequences, because in our mental model (for lack of a formalized framework), non-atomic accesses in OCaml can be considered equivalent to relaxed atomics in C. TSan support is a compromise and I’m not sure we can avoid this report without generating false negatives. So yes, my advice would be to probably suppress it.
It's not clear to me why there is a race: line 895 in major_gc.c is
https://github.com/ocaml/ocaml/blob/74a76cc2f2588eb18a443a60918f040838292ac7/runtime/major_gc.c#L895
so it reads `block` using the `Field` macro, but `Field` casts its argument into a `volatile value *` before its dereference:
I thought that this `volatile value *` cast should suffice to avoid having this read participate in data races. What am I missing?
> I thought that this `volatile value *` cast should suffice to avoid having this read participate in data races. What am I missing?
It will not race with atomic writes but can still race with non-atomic ones, which is the case of the other access here. It’s a non-atomic write from OCaml, which is TSan-instrumented as a plain store.
Running an Eio program with 5.2.0-tsan, I get:
My application code is just decrementing a counter in a record. It seems reasonable to me that the GC in another domain might be scanning that field at the same time. I assume this should be suppressed somehow.
The test-case doesn't trigger very reliably for me, but for reference here is the code:
This is using the "mock" backend to simplify things (the mock backend avoids most C stubs and OS interactions).
I did capture an `rr` recording of tsan reporting the warning if that's useful.
(original report from @avsm in https://github.com/ocaml-multicore/eio/issues/751)