This patch attempts to fix the bug reported in #33. The initial value passed to `parallel_for_reduce` was counted multiple times, with the number of times depending on the chunk size. It is now reduced only once, after the other computations have finished.
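To illustrate the class of bug described above, here is a minimal sequential sketch in Python. It is not the library's code: `chunked`, `reduce_buggy` and `reduce_fixed` are hypothetical names, and seeding every chunk with the initial value is only one assumed way such a bug can arise.

```python
from functools import reduce
import operator


def chunked(items, chunk_size):
    """Split items into consecutive chunks of at most chunk_size elements."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def reduce_buggy(items, init, op, chunk_size):
    # Assumed bug pattern: every chunk is seeded with init,
    # so init is folded into the result once per chunk.
    partials = [reduce(op, chunk, init) for chunk in chunked(items, chunk_size)]
    return reduce(op, partials) if partials else init


def reduce_fixed(items, init, op, chunk_size):
    # Each chunk is reduced without init; init is applied exactly once,
    # after all partial results have been combined.
    partials = [reduce(op, chunk) for chunk in chunked(items, chunk_size)]
    return reduce(op, partials, init)


data = list(range(1, 9))  # 1 + 2 + ... + 8 == 36

buggy_results = [reduce_buggy(data, 10, operator.add, c) for c in (2, 4, 8)]
fixed_results = [reduce_fixed(data, 10, operator.add, c) for c in (2, 4, 8)]
```

In this sketch the buggy variant returns a different value for each chunk size, while the fixed variant returns the same value regardless of how the input is split.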
This patch also updates the docs to warn users that the order of execution in the parallel functions is non-deterministic, and adds correctness tests for `parallel_for`, `parallel_for_reduce` and `parallel_scan`.
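As a rough illustration of what non-deterministic execution order means for callers, the sketch below uses Python's standard thread pool. `run_parallel_for` and `record` are hypothetical stand-ins, not this library's API.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def run_parallel_for(n, body, workers=4):
    """Hypothetical stand-in: call body(i) for every i in range(n) on a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces all tasks to finish and re-raises any exception.
        list(pool.map(body, range(n)))


visited = []
lock = threading.Lock()


def record(i):
    # Append under a lock so the list is updated safely from several threads.
    with lock:
        visited.append(i)


run_parallel_for(16, record)
```

Every index is visited exactly once, but the order of `visited` may differ from run to run, so a test in this style should compare the sorted contents rather than the raw sequence.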