Closed: OlivierNicole closed this issue 1 year ago.
The first race is between https://github.com/ocaml-multicore/domainslib/blob/3a74279e2463eb6d7389c92ebd5cef7807274b72/lib/task.ml#L214 and the initializing write at https://github.com/ocaml-multicore/domainslib/blob/3a74279e2463eb6d7389c92ebd5cef7807274b72/lib/task.ml#L210. According to the memory model, this is not a real race: the PLDI'18 paper specifies that there is supposed to be a synchronization between an initializing write and the sharing of the value. I don't understand yet why TSan didn't see this synchronization, though.
The second race is between two writes in the prefix_s array, at https://github.com/ocaml-multicore/domainslib/blob/3a74279e2463eb6d7389c92ebd5cef7807274b72/lib/task.ml#L231-L235 and https://github.com/ocaml-multicore/domainslib/blob/3a74279e2463eb6d7389c92ebd5cef7807274b72/lib/task.ml#L238-L242. The thing is that the parallel_for actually waits for all operations to complete (via await), so this is probably another false alarm. Good for Domainslib!
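The role that await plays here can be illustrated with a minimal stdlib-only sketch (illustrative code, not Domainslib's actual implementation): two domains write disjoint halves of an array, and Domain.join provides the happens-before edge that makes the subsequent read race-free.

```ocaml
(* Illustrative sketch, not Domainslib's code: two domains write
   disjoint halves of an array.  Domain.join establishes a
   happens-before edge, so reading the array after both joins is
   race-free -- the same role [await] plays in [parallel_scan]. *)
let a = Array.make 8 0

let () =
  let worker lo hi =
    Domain.spawn (fun () -> for i = lo to hi - 1 do a.(i) <- i done)
  in
  let d1 = worker 0 4 in
  let d2 = worker 4 8 in
  Domain.join d1;
  Domain.join d2;  (* all writes are now visible to this domain *)
  Printf.printf "sum = %d\n" (Array.fold_left (+) 0 a)
```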
> The first race is between: […] and the initializing write at https://github.com/ocaml-multicore/domainslib/blob/3a74279e2463eb6d7389c92ebd5cef7807274b72/lib/task.ml#L210, so, according to the memory model, not a real race: there is supposed to be a synchronization between an initializing write and the sharing of a value, according to the PLDI'18 paper. I don't understand yet why TSan didn't see this synchronization, though.
It might be worthwhile to check whether the Array.copy primitive has the necessary synchronisation, namely a release fence.
The copying is performed in caml_array_gather; in the case of small arrays (which is the case in this test), the array is allocated in the minor heap and a plain memcpy is used.
https://github.com/ocaml/ocaml/blob/50b09531f658c1ebe502cb566dee52c3e32d3c61/runtime/array.c#L448-L460
Maybe some code paths do not go through the fence when that array is promoted to the major heap. (Is it always promoted?) I don't know where the code that handles this is in the runtime.
There is in fact no question of promotion: it was pointed out to me that promotion is no longer necessary in the stop-the-world minor collection design that was retained. We still need a fence, though, and indeed there isn't one in caml_array_gather (used by Array.copy, append and concat).
I'm taking the liberty of pasting here an explanation by @kayceesrk from a discussion on Slack:
Generally, there are only two ways an object can be shared with another domain.
- Through the closure that is used in Domain.spawn
- Assigning to a shared reference. This goes through caml_modify, which has the release fence.
Since a caml_initialize will precede either a caml_modify or a Domain.spawn, both of which have the necessary fence, caml_initialize doesn’t need a fence of its own.
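The two sharing paths from the quote can be sketched as follows (illustrative names; note that the "shared reference" case below uses Atomic.t so the sketch itself is race-free at the source level, whereas a plain mutable field would be the caml_modify case the quote describes):

```ocaml
(* Path 1: the freshly copied array is shared through the closure
   passed to Domain.spawn. *)
let sum_via_spawn =
  let a = Array.copy [| 1; 2; 3 |] in
  Domain.join (Domain.spawn (fun () -> Array.fold_left (+) 0 a))

(* Path 2: the array is published through a shared location.  An
   Atomic.t keeps this sketch race-free; a plain mutable field would
   go through caml_modify as described in the quote above. *)
let slot : int array option Atomic.t = Atomic.make None

let sum_via_ref =
  let d =
    Domain.spawn (fun () ->
      let rec wait () =
        match Atomic.get slot with
        | None -> Domain.cpu_relax (); wait ()
        | Some a -> a
      in
      Array.fold_left (+) 0 (wait ()))
  in
  Atomic.set slot (Some (Array.copy [| 4; 5; 6 |]));
  Domain.join d

let () = Printf.printf "%d %d\n" sum_via_spawn sum_via_ref
```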
Looking closer, the race here is between the initializing write (a plain store) in Array.copy and a read in caml_modify. Indeed, before doing its store, caml_modify performs a plain load (of the argument *fp):
https://github.com/ocaml/ocaml/blob/357b42accc160c699219575ab8b952be9594e1d9/runtime/memory.c#L151
and that load happens before the fence.
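At the source level, the pattern in question looks roughly like the following hedged sketch (a hypothetical illustration of the pattern, not a reliable reproducer): one domain publishes the result of Array.copy through a mutable field (a caml_modify store) while another domain reads that field.

```ocaml
(* Sketch of the racy pattern only: without a release fence in
   caml_array_gather, the reader could in principle observe the array
   before its initializing writes.  The read of [b.contents] is itself
   racy at the source level; per the OCaml memory model, the observed
   length is either 0 (the old array) or 3 (the new one). *)
type box = { mutable contents : int array }

let b = { contents = [||] }

let observed =
  let reader = Domain.spawn (fun () -> Array.length b.contents) in
  b.contents <- Array.copy [| 1; 2; 3 |];  (* store goes through caml_modify *)
  Domain.join reader

let () = Printf.printf "reader saw length %d\n" observed
```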
From my (not very experienced) point of view this one is technically a real race… maybe @sadiqj, @maranget or @damiendoligez have an opinion about it?
Since this race is no longer reported by the latest versions of ocaml-tsan, I’ll blame it on insufficient instrumentation of the runtime back then, and close this.
Once again I report a ThreadSanitizer data race warning. I haven't had time to flesh it out yet, but I intend to look at it more closely soon.
How to reproduce
Pin Domainslib to revision b8de1f71. Pin qcheck to revision f09f42d0. Switch to the ThreadSanitizer-instrumenting compiler, then build and execute the following test (from https://github.com/jmid/multicoretests/). You get the following two reports, both in Task.parallel_scan, involving the prefix_s array.