gvegayon / parallel

PARALLEL: Stata module for parallel computing

MIT License

118 stars 26 forks

I am looking to run a few distinct commands in parallel, each on all observations of a single data set, but I do not know how to disable parallel's splitting of the data set.

Expected behavior and actual behavior

clear
set obs 1000000
parallel setclusters 3

forvalues v = 1(1)3 {
gen x`v' = rnormal()
gen y`v' = x`v' + rnormal()
}

program define testprog
forvalues v = 1(1)3 {
reg y`v' x`v'
di e(N)
predict fv`v'
}
end

parallel, prog(testprog): testprog

What I want is to run each command (reg y1 x1, reg y2 x2, etc.) in parallel on the full data set of 1,000,000 observations and have each one create a fitted-value variable. Instead, parallel splits the data into three chunks and runs every regression on roughly 333,333 observations.

Any easy workarounds?
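
One possible direction, offered as an untested sketch rather than a confirmed answer: as far as I can tell from the package help file, parallel accepts a nodata option that skips the split/append step, and each child process gets a global macro $pll_instance holding its instance number. Assuming both behave that way, each child could load the full data set from disk itself and run only its own regression. The file names fulldata and fitted`v' and the key variable obs_id are made up for illustration, and the sketch assumes the child processes share the parent's working directory (use absolute paths if unsure).

```stata
* Sketch only: relies on parallel's nodata option and the $pll_instance
* global as described in the package help; not verified on 1.19.0.
clear
set obs 1000000
parallel setclusters 3

forvalues v = 1/3 {
    gen x`v' = rnormal()
    gen y`v' = x`v' + rnormal()
}

* Hypothetical merge key and file name, used to collect results afterwards
gen long obs_id = _n
save fulldata, replace

capture program drop testprog
program define testprog
    * Each child loads the complete data set instead of receiving a chunk
    use fulldata, clear
    * Child 1 handles y1/x1, child 2 handles y2/x2, and so on
    local v = $pll_instance
    reg y`v' x`v'
    di e(N)
    predict fv`v'
    * Keep only the key and the fitted values, then write them to disk
    keep obs_id fv`v'
    save fitted`v', replace
end

* nodata: do not split the data in memory or append results
parallel, nodata prog(testprog): testprog

* Bring the fitted values back into the full data set
use fulldata, clear
forvalues v = 1/3 {
    merge 1:1 obs_id using fitted`v', nogenerate
}
```

If this works as assumed, di e(N) in each child's log should report 1,000,000. The trade-off is that every child reads the whole file, so memory use scales with the number of clusters.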

Thanks for your time. Love the code/idea.

System information

Some relevant information

Stata version and flavor (e.g. v14 MP): 14.2MP
OS: RHEL
Parallel version: 1.19.0

Run multiple commands in parallel without splitting data #67
