ocaml-multicore / domainslib

Parallel Programming over Domains
ISC License

Non-termination of some Task.async/await runs #57

Closed jmid closed 2 years ago

jmid commented 2 years ago

I'm experiencing an issue where simple Task.async/await usage causes non-termination in some runs.

dune:

(executable
 (name task_issue_414)
 (modes native byte)
 (modules task_issue_414)
 (libraries domainslib))

task_issue_414.ml:

open Domainslib

(* a simple work item, from ocaml/testsuite/tests/misc/takc.ml *)
let rec tak x y z =
  if x > y then tak (tak (x-1) y z) (tak (y-1) z x) (tak (z-1) x y)
           else z

let work () =
  for _ = 1 to 200 do
    assert (7 = tak 18 12 6);
  done

;;
for i=0 to 10 do
  let () = Printf.printf "%i %!" i in
  let pool = Task.setup_pool ~num_additional_domains:2 () in
  let p0 = Task.async pool work in
  let p1 = Task.async pool (fun () -> work (); Task.await pool p0) in
  let p2 = Task.async pool (fun () -> work (); Task.await pool p1) in
  let p3 = Task.async pool (fun () -> work (); Task.await pool p0) in
  let () = List.iter (fun p -> Task.await pool p) [p0;p1;p2;p3] in
  let () = Task.teardown_pool pool in ()
done

When I run it, the first 1-5 iterations go well (the numbers are printed), until some iteration where 2 of the 4 CPUs on my laptop take turns spiking to 100% without the program getting any further. The CPU activity suggests that the underlying tasks are trying (but failing) to make progress (a livelock?). I have yet to see the program complete all of the outer iterations.

I observe this on both

and with domainslib 0.3.2 installed through opam.

This is on an old Intel dual-core x86_64 Thinkpad (w/4 CPU threads) running Linux 5.4.0-91-generic.

jmid commented 2 years ago

Sorry, my bad. These tests had not been updated to use Task.run as is now required 🤦‍♂️
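
For anyone landing here with the same symptom, below is a minimal sketch of how the reproducer might look once it is updated to use Task.run. It assumes a domainslib release that provides Task.run (0.4.0 or later, as far as I can tell) and keeps the ~num_additional_domains argument from the original program; newer releases appear to have renamed it to ~num_domains, so adjust accordingly. Treat it as an illustration rather than the exact version of the test that ended up in the test suite.

```ocaml
(* Sketch only: assumes a domainslib version that provides Task.run. *)
open Domainslib

(* Same work item as in the original report. *)
let rec tak x y z =
  if x > y then tak (tak (x-1) y z) (tak (y-1) z x) (tak (z-1) x y)
  else z

let work () =
  for _ = 1 to 200 do
    assert (7 = tak 18 12 6)
  done

let () =
  for i = 0 to 10 do
    Printf.printf "%i %!" i;
    (* Assumption: this label matches the installed release;
       newer releases use ~num_domains instead. *)
    let pool = Task.setup_pool ~num_additional_domains:2 () in
    (* All async/await calls are made from inside Task.run, so that
       awaiting a promise happens in a context the pool can suspend
       and resume. *)
    Task.run pool (fun () ->
      let p0 = Task.async pool work in
      let p1 = Task.async pool (fun () -> work (); Task.await pool p0) in
      let p2 = Task.async pool (fun () -> work (); Task.await pool p1) in
      let p3 = Task.async pool (fun () -> work (); Task.await pool p0) in
      List.iter (fun p -> Task.await pool p) [p0; p1; p2; p3]);
    Task.teardown_pool pool
  done
```

The only structural change from the original is the Task.run wrapper around the body of each iteration; pool setup and teardown stay outside it.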