gvegayon / parallel

PARALLEL: Stata module for parallel computing
https://rawgit.com/gvegayon/parallel/master/ado/parallel.html
MIT License

Error when number of tasks is less than the number of clusters #44

Closed: bquistorff closed this issue 8 years ago

bquistorff commented 8 years ago

In the default setup (where the dataset is divided among the clusters), there is an error when there are fewer observations than clusters. Ideally, we'd temporarily lower the number of clusters used. Other ways of using parallel might suffer from similar problems; I haven't checked.
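To make the proposal concrete, here is a minimal sketch of the idea in Python rather than Stata. It is an illustration only, not the code in parallel.ado: the function names `effective_clusters` and `split_ranges` are hypothetical, and the even-split rule is an assumption about how observations would be divided.

```python
def effective_clusters(n_obs, n_clusters):
    """Number of clusters to actually use (hypothetical helper).

    Never use more clusters than there are observations, so that
    every cluster receives at least one observation.
    """
    if n_obs < 1 or n_clusters < 1:
        raise ValueError("need at least one observation and one cluster")
    return min(n_obs, n_clusters)


def split_ranges(n_obs, n_clusters):
    """Split observations 1..n_obs into contiguous, near-equal ranges.

    Returns a list of (first, last) pairs, one per cluster actually used.
    """
    k = effective_clusters(n_obs, n_clusters)
    base, extra = divmod(n_obs, k)
    ranges = []
    start = 1
    for i in range(k):
        # The first `extra` clusters take one additional observation.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges
```

With 3 observations and 8 requested clusters, `split_ranges(3, 8)` uses only 3 clusters, one observation each, instead of failing.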

gvegayon commented 8 years ago

That makes sense, although it should come with a warning. I do see some reasons why using parallel with fewer observations than clusters could be useful. The change should be made directly in the parallel function, on the following lines:

https://github.com/gvegayon/parallel/blob/master/ado/parallel.ado#L284
https://github.com/gvegayon/parallel/blob/master/ado/parallel.ado#L313
https://github.com/gvegayon/parallel/blob/master/ado/parallel.ado#L340
https://github.com/gvegayon/parallel/blob/master/ado/parallel.ado#L358

Further, if it is being used with by(), it might be more complicated. What if N > $PLL_CLUSTERS, but after using by(var) it turns out that N[by(var)] < $PLL_CLUSTERS? That seems complicated.
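A rough sketch of the by() case, again in Python as an illustration and not taken from parallel.ado. It assumes that N[by(var)] means the number of distinct by-groups and that groups must stay whole within a cluster, so the group count caps the usable clusters. The function name `clusters_for_groups` is hypothetical.

```python
import warnings


def clusters_for_groups(by_values, n_clusters):
    """Cap the cluster count at the number of distinct by-groups.

    Assumption: a by-group is never split across clusters, so the
    number of groups bounds how many clusters can receive work.
    """
    n_groups = len(set(by_values))
    if n_groups < 1 or n_clusters < 1:
        raise ValueError("need at least one group and one cluster")
    if n_groups < n_clusters:
        # Warn instead of failing, then lower the cluster count.
        warnings.warn(
            f"only {n_groups} by-groups for {n_clusters} clusters; "
            f"using {n_groups} clusters"
        )
        return n_groups
    return n_clusters
```

For example, six observations in three groups with four requested clusters give three clusters: the observation count exceeds the cluster count, but the group count does not.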

bquistorff commented 8 years ago

Done in 86fc862.