ocaml-multicore / domainslib

Parallel Programming over Domains
ISC License
171 stars 30 forks source link

Possible data races in Task.parallel_scan #74

Closed OlivierNicole closed 1 year ago

OlivierNicole commented 2 years ago

Once again I report a ThreadSanitizer data race warning. I haven't had time to flesh it out yet, but I intend to look at it more closely soon.

How to reproduce

Pin Domainslib to revision b8de1f71. Pin qcheck to revision f09f42d0. Switch to the ThreadSanitizer-instrumenting compiler and build and execute the following test (from https://github.com/jmid/multicoretests/):

open QCheck
open Domainslib

let test_parallel_scan =
  Test.make ~name:"test Task.parallel_scan" ~count:1000
    (pair (int_bound 10) small_nat)
    (fun (num_additional_domains,array_size) ->
       let pool = Task.setup_pool ~num_additional_domains () in
       let a = Task.run pool (fun () -> Task.parallel_scan pool (+) (Array.make array_size 1)) in
       Task.teardown_pool pool;
       a = Array.init array_size (fun i -> i + 1))
;;
QCheck_runner.run_tests_main [ test_parallel_scan ]

You get the following two reports, both in Task.parallel_scan:

OlivierNicole commented 2 years ago

The first race is between: https://github.com/ocaml-multicore/domainslib/blob/3a74279e2463eb6d7389c92ebd5cef7807274b72/lib/task.ml#L214 and the initializing write at https://github.com/ocaml-multicore/domainslib/blob/3a74279e2463eb6d7389c92ebd5cef7807274b72/lib/task.ml#L210 so, according to the memory model, not a real race, as there is supposed to be a synchronization between an initializing write and the sharing of a value, according to the PLDI'18 paper. I don't understand yet why TSan didn't see this synchronization, though.

OlivierNicole commented 2 years ago

The second race is between two writes in the prefix_s array, at https://github.com/ocaml-multicore/domainslib/blob/3a74279e2463eb6d7389c92ebd5cef7807274b72/lib/task.ml#L231-L235 and https://github.com/ocaml-multicore/domainslib/blob/3a74279e2463eb6d7389c92ebd5cef7807274b72/lib/task.ml#L238-L242

The thing is that the parallel_for actually waits for all operations to complete (via await), so this is probably another false alarm. Good for Domainslib!

kayceesrk commented 2 years ago

The first race is between:

https://github.com/ocaml-multicore/domainslib/blob/3a74279e2463eb6d7389c92ebd5cef7807274b72/lib/task.ml#L214

and the initializing write at https://github.com/ocaml-multicore/domainslib/blob/3a74279e2463eb6d7389c92ebd5cef7807274b72/lib/task.ml#L210

so, according to the memory model, not a real race, as there is supposed to be a synchronization between an initializing write and the sharing of a value, according to the PLDI'18 paper. I don't understand yet why TSan didn't see this synchronization, though.

It might be worthwhile to check whether the Array.copy primitive has the necessary synchronisation. A release fence.

OlivierNicole commented 2 years ago

The copying is performed in caml_array_gather; in the case of small arrays (which is the case in this test), the array is allocated in the minor heap and a plain memcpy is used. https://github.com/ocaml/ocaml/blob/50b09531f658c1ebe502cb566dee52c3e32d3c61/runtime/array.c#L448-L460

Maybe some code paths do not go through the fence when that array is promoted to the major heap. (Is it always promoted?) I don't know where the code that handles this is in the runtime.

OlivierNicole commented 2 years ago

There is in fact no question of promotion, as it was pointed out to me that it is no longer necessary in the stop-the-world minor collection design that was retained. We still need a fence, though, and indeed there isn't one in caml_array_gather (used by Array.copy, append and concat).

OlivierNicole commented 2 years ago

I'm taking the liberty of pasting here an explanation by @kayceesrk from a discussion on Slack:

Generally, there are only two ways an object can be shared with another domain.

  1. Through the closure that is used in Domain.spawn
  2. Assigning to a shared reference. This goes through caml_modify which has the release fence.

Since caml_initialize will precede a caml_modify or Domain.spawn, which has the necessary fence, caml_initialize doesn’t need the fence.

Looking closer, the race here is between the initializing write (a plain store) in Array.copy and a read in caml_modify. Indeed, before doing a store, caml_modify does a plain load (argument *fp): https://github.com/ocaml/ocaml/blob/357b42accc160c699219575ab8b952be9594e1d9/runtime/memory.c#L151 and that is done before the fence.

OlivierNicole commented 2 years ago

From my (not very experienced) point of view this one is technically a real race… maybe @sadiqj, @maranget or @damiendoligez have an opinion about it?

OlivierNicole commented 1 year ago

No longer having this race report from the latest versions of ocaml-tsan, I’ll blame it on an insufficient instrumentation of the runtime back then, and close this.