rdicosmo / parmap

Parmap is a minimalistic library allowing to exploit multicore architecture for OCaml programs with minimal modifications.
http://rdicosmo.github.io/parmap/
Other
92 stars 20 forks source link

Support a parallel fold variant for the order preservation of final data processing results #113

Open elfring opened 9 months ago

elfring commented 9 months ago

The following information is provided in the section “Preservation of output order in Parmap”.

rdicosmo commented 9 months ago

Hi @elfring, I fear reading this issue that the documentation you are pointing at is confusing, so it is that documentation that should be changed, and not the code.

Let me try to explain and check if you agree: the original implementation of the map functions did not preserve the order, to maximise efficiency, so when using the chunksize parameter something like parmap (fun x -> x+1) [1;2;3;4;5] may return something like [6;2;3;4;5] instead of the expected [2;3;4;5;6]. This is why there is now a keeporder parameter that forces the result of the map function to be in the same order as the input (at some extra price in terms of execution time) even when chunksize is used.

Fold operations are different: the output of the function is the result of aggregating the input values into a single element, so, "preserving the input order" does not mean the same as for a map function. The only "order preservation" would be the order in which one accumulates the elements of the input during the computation. Now, when one implements something like List.fold_right (+) l 0 in parallel, the order in which the different elements are "accumulated" to produce the final result is necessarily not sequential (otherwise there is nothing you can gain going in parallel), so it is well known that something like a parfold only make sense if the folding operator (here, the +) is commutative and associative. This is what the documentation tries to say.

Unfortunately, it seems that as it is written, it may give the impression that one gives up parallelism. This is not the case: the implementation of the fold operations is indeed parallel and uses load balancing, as it is implemented via the general mapper function (see https://github.com/rdicosmo/parmap/blob/master/src/parmap.ml#L525), so performance is not an issue here, you already get the best ;-)

Maybe the best way forward is just to remove from the documentation the paragraph "No reordering logic is implemented for parmapfold, parfold and their variants, as performing these operations in parallel only make sense if the order is irrelevant." altogether?

elfring commented 9 months ago

Fold operations are different: the output of the function is the result of aggregating the input values into a single element, …

:thought_balloon: It seems that we stumble on general communication difficulties. :crystal_ball: Can the “preservation of an output order” be occasionally also desirable for parallel fold operations? Which algorithm would be chosen for the combination of final data processing results? :thinking: