Closed nilsbecker closed 3 years ago
setting chunk_size to 1 further shows the issue, giving the result 50.
maybe the semantics should be such that the initializing value is always body start
where start
is the beginning index of the chunk? right now the signature seems to be more like a fold
operation, but then it's unclear why init
must be of the same type.
something like :
let parallel_for_reduce' ?(chunk_size=0) ~start ~finish ~body pool reduce_fun =
let chunk_size = if chunk_size > 0 then chunk_size
else begin
let n_domains = (Array.length pool.domains) + 1 in
let n_tasks = finish - start + 1 in
if n_domains = 1 then n_tasks
else max 1 (n_tasks / (8 * n_domains))
end
in
let rec work s e =
if e - s < chunk_size then
let rec loop i acc =
if i >= e then acc
else loop (i+1) (reduce_fun acc (body (i+1)))
in
loop s (body s)
else begin
let d = s + (e - s) / 2 in
let p = async pool (fun _ -> work s d) in
let right = work (d+1) e in
let left = await pool p in
reduce_fun left right
end
in
work start finish
gives me expected results
also, it seems that the reduce function is required to be associative, as chunks are reduced and combined in undefined order; indeed using the above parallel_for_reduce'
with the non-associative let arithmetic_mean a b = (a + b) / 2
instead of (+)
again gives results that depend on chunk size. this requirement should be documented clearly.
for parallel_scan
i am assuming that it should behave like a cumulative Array.fold_left
initialized with the first array element. so i tried this:
let ar = Array.init 10 Fun.id
let sequential op =
let open Array in
init (length ar) (fun i ->
fold_left op ar2.(0) (sub ar2 1 i)) ;;
let scanned op = Task.parallel_scan pool op ar2
[sequential arithmetic_mean; scanned arithmetic_mean; sequential (+); scanned (+)]
the result are the same only for (+)
.
it also seems that to be able to do the fold-like operation in parallel, parallel_scan
needs to essentially do all operations twice; the second time for adding the correct offset to chunks; i assume this is the best one could do to parallelize the sequential fold-list but is a performance concern that should be documented?
it seems a parallel_map
over arrays would a simpler, quite natural abstraction?
Hi @nilsbecker, thanks for the bug report! I have submitted a patch #34 attempting to fix the issue in parallel_for_reduce
.
also, it seems that the reduce function is required to be associative, as chunks are reduced and combined in undefined order; indeed using the above parallel_for_reduce' with the non-associative let arithmetic_mean a b = (a + b) / 2 instead of (+) again gives results that depend on chunk size. this requirement should be documented clearly.
That's fair. #34 includes a doc update to mention this is the case for parallel_for_reduce
and parallel_scan
.
it seems a parallel_map over arrays would a simpler, quite natural abstraction?
Indeed, and I believe it would be fairly simple to implement a parallel_map
with the parallel_for
abstraction. In my head, a primitive version would look something like this:
let parallel_map f l pool =
let arr = Array.of_list l in
let res = Array.make (Array.length arr) (f arr.(0)) in
Domainslib.Task.parallel_for ~start:0 ~finish:(Array.length arr - 1)
~body:(fun i -> res.(i) <- f arr.(i)) pool;
Array.to_list res
hi, the fix looks reasonable. one could still consider if it should be stated that parallel_scan
does twice the work that a sequential fold_left
would do.
some thoughts about the proposed parallel_map
:
parallel_map pool f arr
f
evaluationThe bug in parallel_for_reduce
has been fixed by #34, so closing this issue.
Agree with your points about parallel map, an interface like the one below could be more suitable:
val parallel_map : Domainslib.Task.pool -> ('a -> 'b) -> 'a array -> 'b array
If you think addition of parallel map will be useful on domainslib, please feel free to open a new issue for the same.
i might just be dense here:
i get
[10;30]
but i believe it should be[10;20]
. i'm guessing it's in line 79 in task.ml where the globalinit
is used to initialize chunks that start somewhere in the middle? https://github.com/ocaml-multicore/domainslib/blob/0a3b4f3bb64bd2f14c53518c31282ef421b7858c/lib/task.ml#L79domainslib 0.3.0 on ocaml 4.12.0+domains