mlr-org / parallelMap

R package to interface some popular parallelization backends with a unified interface
https://parallelmap.mlr-org.com

Reproducibility of `mlr` quickstart example fails when using parallelization #57

Closed GegznaV closed 6 years ago

GegznaV commented 6 years ago

I took the quickstart example from the mlr cheatsheet and tried to run it with parallelization on Windows 10, but the results were not reproducible. The example and details are in this GitHub issue. I tried seeding with both `set.seed(123456, "L'Ecuyer-CMRG")` and `parallel::clusterSetRNGStream(iseed = 123456)`. I have no problem reproducing simple examples such as `parallelMap(runif, rep(3, 2))` in parallel, but the quickstart example fails.

My questions are: why did reproducibility fail, is the seeding improper, and how should I adjust the code so that it is reproducible in parallel on Windows?

I could not find any comprehensive resource on how to use mlr with parallelization and still get reproducible results; for example, #43 is still open.

(I was not sure whether to post this question here or in the mlr repository; I chose this one.)

jakob-r commented 6 years ago

I will close this, as your Stack Overflow question should now be answered. I will, however, give #43 a higher priority.

GegznaV commented 6 years ago

@jakob-r, would you consider adding a short section on parallel processing to the mlr tutorial, explaining that reproducible results require both `set.seed()` and `parallel::clusterSetRNGStream()`? You could include your Stack Overflow answer as an example.
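A minimal sketch of why both calls are needed, using the base `parallel` package directly rather than parallelMap. The helper `run_seeded()` and the random fold assignment are hypothetical, for illustration only. `set.seed()` seeds only the master R session, while `clusterSetRNGStream()` gives each socket worker its own L'Ecuyer-CMRG stream, so both the master-side and the worker-side draws repeat across runs. This sketch does not show how to point `clusterSetRNGStream()` at the cluster that parallelMap itself starts.

```r
library(parallel)

# Hypothetical helper: seeds the master session and a fresh 2-worker socket
# cluster, then returns master-side and worker-side random draws.
run_seeded <- function(seed) {
  set.seed(seed)                          # seeds the master R session only
  # Stand-in for master-side randomness (e.g. a random fold assignment)
  folds <- sample(rep(1:2, length.out = 6))

  cl <- makeCluster(2)                    # PSOCK cluster, also works on Windows
  on.exit(stopCluster(cl))                # shut the workers down on exit
  clusterSetRNGStream(cl, iseed = seed)   # one L'Ecuyer-CMRG stream per worker
  # Same work as parallelMap(runif, rep(3, 2)): each worker draws runif(3)
  draws <- parLapply(cl, rep(3, 2), runif)

  list(folds = folds, draws = draws)
}

r1 <- run_seeded(123456)
r2 <- run_seeded(123456)
identical(r1, r2)   # TRUE: both master and worker draws repeat
```

`makeCluster()` defaults to a PSOCK cluster, the only cluster type available on Windows, so the workers are fresh R processes whose RNG state is not controlled by the master's `set.seed()` call alone.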

jakob-r commented 6 years ago

I guess I would prefer to call `parallel::clusterSetRNGStream()` directly inside parallelMap. Otherwise, it should be documented; you're right.