gvegayon / parallel

PARALLEL: Stata module for parallel computing
https://rawgit.com/gvegayon/parallel/master/ado/parallel.html
MIT License

Incorrect documentation on parallelizing for loops #107

Open karthiktadepalli1 opened 11 months ago

karthiktadepalli1 commented 11 months ago

In the manual for parallel, Section 3.2 covers how to parallelize a loop and gives the following example code:

local n_proc = <number set by user>
save currdata.dta, replace
drop _all
set obs `num_total'
generate long i = _n
if `n_proc'>1 {
    parallel initialize `n_proc'
    parallel: parfor_task
}
else {
    parfor_task
}

program parfor_task
    local num_task = _N
    mkmat i, matrix(tasks_i)
    use currdata.dta, clear
    forvalues j=1/`=_N' {
        local i = tasks_i[`j',1]
        // work for i
    }
    // put output into main data
end

I believe this code is wrong. In parfor_task, the loop runs from 1 to `=_N', which is supposed to equal the number of tasks we are looping over. However, because currdata.dta has just been loaded with `use currdata.dta, clear`, `=_N' actually evaluates to the number of rows in currdata, which is not at all the same thing. I believe this is a typo, and the line should instead read

forvalues j=1/`num_task' {

I suspect that's why num_task was defined; it is not used anywhere else in the example code. I spent a few days with a headache because of this, so hopefully this change helps someone :)
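To check the proposed fix without launching child processes, the corrected parfor_task can be run serially (the `n_proc' = 1 branch of the example). This is a minimal sketch, assuming a hypothetical 10-row currdata.dta built from Stata's bundled auto dataset and three tasks. The n_done counter and the $parfor_n_done global are illustration-only additions, not part of the manual's example; they just record how many times the loop body ran so it can be compared against the row count of currdata.dta.

```stata
clear all

// Hypothetical stand-in for currdata.dta: the first 10 rows of the auto data
sysuse auto, clear
keep in 1/10
save currdata.dta, replace

// Corrected task program: the loop bound is captured before -use- replaces the data
program parfor_task
    local num_task = _N                  // rows of the task list = number of tasks
    mkmat i, matrix(tasks_i)             // copy the task ids into a matrix
    use currdata.dta, clear              // from here on, _N counts rows of currdata.dta
    local n_done = 0                     // hypothetical counter, for illustration only
    forvalues j = 1/`num_task' {
        local i = tasks_i[`j', 1]
        // work for i
        local ++n_done
    }
    // hypothetical: expose the counter so it can be checked after the call
    global parfor_n_done = `n_done'
end

// Serial run with 3 tasks
drop _all
set obs 3
generate long i = _n
parfor_task

display "tasks run: $parfor_n_done, rows in currdata.dta: " _N
```

With the original `=_N' bound, the same run would iterate 10 times (once per row of currdata.dta) and read past the end of the 3-row tasks_i matrix; with `num_task' it iterates exactly 3 times.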

bquistorff commented 11 months ago

I think you're right. Thanks for spotting it. I don't think there is much we can do about the article, and we don't currently have that example in our documentation.