The code at https://github.com/marjoleinF/pre/blob/e156fbe9969df778781081fc69c4da23bf368dab/R/pre.R#L949-L963
is a bad idea, as it scales very poorly. E.g., suppose there are 200000 rows, we want to sub-sample to half the size, and we want 1000 trees. Then it requires 10^5 × 10^3 × 4 bytes = 4 × 10^8 bytes, i.e. about 400 MB of RAM (10^5 row indices per tree, 1000 trees, 4 bytes per integer index). I do see that the reason is the `foreach` call later: https://github.com/marjoleinF/pre/blob/e156fbe9969df778781081fc69c4da23bf368dab/R/pre.R#L998

However, one may still be able to get reproducible results if `clusterSetRNGStream` is used with `foreach`. Though, I have not used the `foreach` package much, and this approach requires that the loop iterations are split evenly across the workers. An alternative is to replace the `foreach` with `parSapply`, which I know will work.
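A minimal sketch of the `parSapply` route, assuming only the `parallel` package that ships with R: each worker draws its own subsample inside the loop, so no object holding all subsamples is ever built, and `clusterSetRNGStream` gives every worker its own L'Ecuyer-CMRG stream so that a fixed seed reproduces the same results. `fit_one_tree` and `run_trees` are hypothetical names, and the actual tree fit is replaced by a placeholder (the mean sampled index); this is not how `pre` is currently structured.

```r
library(parallel)

# Hypothetical per-tree worker: draws this tree's subsample of row indices
# on the fly. A real implementation would fit the tree on data[idx, ] here;
# as a placeholder we just return the mean sampled index.
fit_one_tree <- function(i, n, sampfrac) {
  idx <- sample.int(n, size = floor(n * sampfrac))
  mean(idx)
}

# Hypothetical driver: one L'Ecuyer-CMRG stream per worker, derived from
# `seed`. The full set of subsamples is never materialised up front.
run_trees <- function(ntrees, n, sampfrac, seed, ncores = 2L) {
  cl <- makeCluster(ncores)
  on.exit(stopCluster(cl))
  clusterSetRNGStream(cl, iseed = seed)
  parSapply(cl, seq_len(ntrees), fit_one_tree, n = n, sampfrac = sampfrac)
}

res1 <- run_trees(ntrees = 8, n = 1000, sampfrac = 0.5, seed = 1)
res2 <- run_trees(ntrees = 8, n = 1000, sampfrac = 0.5, seed = 1)
identical(res1, res2)  # TRUE
```

Reproducibility here relies on `parSapply` splitting `X` into the same contiguous chunks for a given cluster size, so each iteration always draws from the same worker's stream. Changing `ncores` changes that mapping, so results are only reproducible for a fixed number of workers.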