gvegayon / parallel

PARALLEL: Stata module for parallel computing
https://rawgit.com/gvegayon/parallel/master/ado/parallel.html
MIT License

wrong N after combining: parallel.ado:225: gen _`parallelid'cut #68

Closed BjarteAAGNES closed 6 years ago

BjarteAAGNES commented 6 years ago

. which parallel
R:\STATA\ado\site\p\parallel.ado
*! version 1.19.0 26jul2017

Expected behavior and actual behavior

parallel.ado:225:

gen _`parallelid'cut = ceil(_n*${PLL_CLUSTERS}/_N) //each of size _N/$PLL_CLUSTERS

can return values greater than $PLL_CLUSTERS,

so when the $PLL_CLUSTERS subsets are combined, the combined data have fewer observations than the original data.

The example below shows what is returned for the last observation (_n = _N) with 7, 11, and 14 clusters and N=2771050. Below each value assigned to _`parallelid'cut, an alternative that caps the result at the number of clusters is shown.

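local _N 2771050
// evaluate the expression at the last observation, where _n == _N
local _n = `_N'

qui foreach c of numlist 7 11 14 {
    noi di _dup(72) "-"
    global PLL_CLUSTERS = `c'
    // current expression from parallel.ado line 225
    noi di "parallel.ado:225:" ceil(`_n'*${PLL_CLUSTERS}/`_N')
    // alternative: cap the result at the number of clusters
    noi di "alternative     :" min(`c', ceil(`_n'*${PLL_CLUSTERS}/`_N'))
}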

------------------------------------------------------------------------
parallel.ado:225:8
alternative     :7
------------------------------------------------------------------------
parallel.ado:225:12
alternative     :11
------------------------------------------------------------------------
parallel.ado:225:15
alternative     :14

bquistorff commented 6 years ago

Thanks for finding the error and even finding the line! I've just fixed this in master with your suggested edit.
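
For readers following along, the suggested edit applied to line 225 might look roughly like the sketch below. It is not a copy of the actual commit in master. The dataset size and cluster count are taken from the example above, while the `demo` value for `parallelid' and the final sanity check are assumptions added for illustration.

```stata
* Sketch only, not the code committed to master.
* Build an empty dataset of the size used in the report.
clear
set obs 2771050

* Assumed for illustration: 7 clusters and a hypothetical run id "demo".
global PLL_CLUSTERS = 7
local parallelid demo

* Suggested edit: cap the chunk id at the number of clusters.
gen _`parallelid'cut = min(${PLL_CLUSTERS}, ceil(_n*${PLL_CLUSTERS}/_N))

* Sanity check (an addition, not part of parallel.ado):
* every observation must fall in a chunk between 1 and $PLL_CLUSTERS.
assert inrange(_`parallelid'cut, 1, ${PLL_CLUSTERS})
```

If the assert passes, no observation is assigned to a chunk beyond $PLL_CLUSTERS, so combining the $PLL_CLUSTERS subsets should return every observation.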