kkranker opened this issue 6 years ago
In the following example we create a program called myloop that copies data from the variable ellist to vname (a new variable) by looping through the elements of ellist. The second half does the same thing with a Mata function, myfunction.
// Setup
clear all
set trace off
set more off
parallel setclusters 4
// Test data. You can specify the elements you want to loop in parallel here:
set obs 6
quietly {
    gen ellist = "A" if _n == 1
    replace ellist = "B" if _n == 2
    replace ellist = "C" if _n == 3
    replace ellist = "D" if _n == 4
    replace ellist = "E" if _n == 5
    replace ellist = "F" if _n == 6
}
// This program copies ellist into vname
program def myloop
    args vname
    // Creating the variable
    gen `vname' = ""
    // Looping through the data
    forval i = 1/`=_N' {
        qui replace `vname' = ellist[`i'] if _n == `i'
    }
end
// Calling the program in serial fashion
myloop ellist2
// Calling the program using parallel: the program must be passed with the prog() option
parallel, prog(myloop): myloop ellist2_pll
// Do we get the same output?
list
// Same example but using mata --------------------------------------------------
mata
void myfunction(string scalar vname) {
    // Creating the new string variable
    (void) st_addvar("str10", vname);
    string matrix D, A;
    D = st_sdata(., "ellist");
    A = st_sdata(., vname);
    numeric scalar i;
    // Copying ellist into vname element by element
    for (i = 1; i <= rows(A); i++)
        A[i] = D[i];
    st_sstore(., vname, A);
    return;
}
end
// Serial and parallel fashion
m : myfunction("ellist_mata")
parallel, mata: m: myfunction("ellist_mata_pll")
// Do we get the same?
list
##
## . // Setup
## . clear all
##
## . set trace off
##
## . set more off
##
## .
## . parallel setclusters 4
## N Clusters: 4
## Stata dir: /usr/local/stata12/stata
##
## .
## . // Test data. You can specify the elements you want to loop in parallel here:
## . set obs 6
## obs was 0, now 6
##
## . quietly {
##
## .
## . // This program copies ellist into vname
## . program def myloop
## 1. args vname
## 2.
## . // Creating the variable
## . gen `vname' = ""
## 3.
## . // Looping through the data
## . forval i = 1/`=_N' {
## 4. qui replace `vname' = ellist[`i'] if _n == `i'
## 5. }
## 6. end
##
## .
## . // Calling the program in serial fashion
## . myloop ellist2
## (6 missing values generated)
##
## .
## . // Calling the program using parallel, we need to pass the program in prog
## . parallel, prog(myloop): myloop ellist2_pll
## -------------------------------------------------------------------------------
## > -
## Exporting the following program(s): myloop
##
## myloop:
## 1. args vname
## 2. gen `vname' = ""
## 3. forval i = 1/`=_N' {
## 4. qui replace `vname' = ellist[`i'] if _n == `i'
## 5. }
## -------------------------------------------------------------------------------
## > -
## -------------------------------------------------------------------------------
## Parallel Computing with Stata
## Clusters : 4
## pll_id : 3nc1i8tzl1
## Running at : /home/george/Documents/parallel/playground
## Randtype : datetime
##
## Waiting for the clusters to finish...
## cluster 0001 has exited without error...
## cluster 0002 has exited without error...
## cluster 0003 has exited without error...
## cluster 0004 has exited without error...
## -------------------------------------------------------------------------------
## Enter -parallel printlog #- to checkout logfiles.
## -------------------------------------------------------------------------------
##
## .
## . // Do we get the same output?
## . list
##
## +-----------------------------+
## | ellist ellist2 ellist~l |
## |-----------------------------|
## 1. | A A A |
## 2. | B B B |
## 3. | C C C |
## 4. | D D D |
## 5. | E E E |
## |-----------------------------|
## 6. | F F F |
## +-----------------------------+
##
## .
## .
## . // Same example but using mata ----------------------------------------------
## > ----
## .
## . mata
## ------------------------------------------------- mata (type end to exit) -----
## : void myfunction(string scalar vname) {
## >
## > // Creating the data
## > (void) st_addvar("str10", vname);
## >
## > string matrix D, A;
## > D = st_sdata(., "ellist");
## > A = st_sdata(.,vname);
## >
## > numeric scalar i;
## > for (i = 1; i <= rows(A); i++)
## > A[i] = D[i];
## >
## > st_sstore(.,vname, A);
## > return;
## > }
##
## : end
## -------------------------------------------------------------------------------
##
## .
## . // Serial and parallel fashion
## . m : myfunction("ellist_mata")
##
## . parallel, mata: m: myfunction("ellist_mata_pll")
## -------------------------------------------------------------------------------
## Parallel Computing with Stata
## Clusters : 4
## pll_id : 3nc1i8tzl3
## Running at : /home/george/Documents/parallel/playground
## Randtype : datetime
##
## Waiting for the clusters to finish...
## cluster 0001 has exited without error...
## cluster 0002 has exited without error...
## cluster 0003 has exited without error...
## cluster 0004 has exited without error...
## -------------------------------------------------------------------------------
## Enter -parallel printlog #- to checkout logfiles.
## -------------------------------------------------------------------------------
##
## .
## .
## . // Do we get the same?
## . list
##
## +---------------------------------------------------+
## | ellist ellist2 el~2_pll ellist~a el~a_pll |
## |---------------------------------------------------|
## 1. | A A A A A |
## 2. | B B B B B |
## 3. | C C C C C |
## 4. | D D D D D |
## 5. | E E E E E |
## |---------------------------------------------------|
## 6. | F F F F F |
## +---------------------------------------------------+
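To see how the six elements are shared among the four clusters, and to check the result without reading the -list- output by eye, the sketch below is one possible approach. It is a standalone variant, assuming the same data and cluster setup as the example; myloop_report is a hypothetical name, and it differs from myloop only by printing _N, which inside a cluster is the number of observations that instance received. The printed line ends up in each cluster's log, which -parallel printlog #- displays, as the message in the output above suggests.

```stata
// Standalone setup mirroring the example: six elements, four clusters
clear all
parallel setclusters 4
set obs 6
gen ellist = char(64 + _n)    // "A", "B", ..., "F"

// Hypothetical variant of myloop that also prints how many
// observations (elements) the current Stata instance received
program def myloop_report
    args vname
    display "Observations in this instance: " _N
    gen `vname' = ""
    forval i = 1/`=_N' {
        qui replace `vname' = ellist[`i'] if _n == `i'
    }
end

parallel, prog(myloop_report): myloop_report ellist_pll

// Compare programmatically instead of by eye with -list-
assert ellist_pll == ellist

// Show the log written by cluster 1; use 2, 3 or 4 for the others
parallel printlog 1
```

Comparing the counts printed in the four logs shows how many elements each cluster handled in a single run.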
Thank you. Storing the element list and the output in variables is a neat trick that I hadn't thought of. I'll play around with this idea and see if I can make it work.
I had this working a while ago, but something seems to have broken since. Perhaps there is a bug in the latest version of your code? When I run the code above, I get the error:
cluster #### Exited with error -3499- while running the command/dofile (view log)...
The log files have:
/* Checking for break */
. mata: parallel_break()
. m: myfunction("ellist_mata_pll")
<istmt>: 3499 myfunction() not found
r(3499);
. }
It appears the Mata function (myfunction) is not being passed to the child clusters. Any suggestions?
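One possible workaround, sketched here under assumptions rather than as a fix inside parallel itself, is to avoid relying on the mata option to export the function. Mata can autoload a function from a compiled fcnname.mo file found on the ado-path, and -mata mosave- writes such a file (to the current directory by default, and "." is on Stata's default ado-path). The sketch assumes the child instances start in the same working directory as the parent session; if they do not, saving with dir() to a directory both sessions search would be needed. It also rewrites myfunction without the explicit loop, which should produce the same copy of ellist.

```stata
// Standalone setup: same data as the example
clear all
parallel setclusters 4
set obs 6
gen ellist = char(64 + _n)

mata:
// Same task as myfunction in the example, written without the loop:
// create a str10 variable and store a copy of ellist in it
void myfunction(string scalar vname)
{
    (void) st_addvar("str10", vname)
    st_sstore(., vname, st_sdata(., "ellist"))
}
// Write the compiled code to myfunction.mo in the current directory
mata mosave myfunction(), replace
end

// Assumption: the clusters run in this directory, so Mata can
// autoload myfunction() there without the mata option
parallel: mata: myfunction("ellist_mata_pll")
assert ellist_mata_pll == ellist
```

Because the .mo file stays on disk, it should be deleted (or re-saved with replace) whenever myfunction changes, otherwise the clusters may load a stale version.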
I'm using the latest version of parallel from SSC. I can't figure out how to install directly from GitHub.
. which parallel
C:\Users\kkranker\Documents\Stata\Ado\plus\p\parallel.ado
*! version 1.15.8.19 19agol2015
*! PARALLEL: Stata module for parallel computing
*! by George G. Vega [cre,aut], Brian Quistorff [ctb]
Perhaps you updated Stata? As you can see, the SSC version is pretty old. Try following the instructions for installing the development version, and let us know how it goes: https://github.com/gvegayon/parallel#development-version-latestmaster
I've never been able to install Stata packages from GitHub. I always get some type of error. Here's what I got today:
. net install parallel, from( https://raw.github.com/gvegayon/parallel/master/) replace
sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
https://raw.github.com/gvegayon/parallel/master/ either
  1) is not a valid URL, or
  2) could not be contacted, or
  3) is not a Stata download site (has no stata.toc file).
r(5100);
Is it because the stata.toc file looks incomplete?
You should try downloading a version directly as a zip file and following the instructions here, then let us know how it goes: https://github.com/gvegayon/parallel/tree/sj-review#development-version-latestmaster
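Installing from a downloaded zip avoids the HTTPS download that fails with the certificate error above, because -net install- also accepts a local directory in from(). A minimal sketch, assuming the zip has been extracted to a folder (the path below is hypothetical) that has stata.toc and parallel.pkg at its top level, the same layout the raw.github.com URL above points to:

```stata
// Hypothetical location of the folder extracted from the GitHub zip
local pkgdir "C:/temp/parallel-master"

// net install reads parallel.pkg from the local directory
net install parallel, from("`pkgdir'") replace

// Confirm which parallel.ado Stata now finds, and its version line
which parallel
```

If an older parallel was already loaded in the running session, typing -discard- (or restarting Stata) makes sure the newly installed files are the ones used.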
I just reconfirmed that the net install from GitHub works for me on Stata 14 on Windows (and I've done it previously on Linux). It might be something with Stata 15, or something with your local setup. Hard to tell.
Okay, thanks. I'll write Stata tech support and see if they have an idea.
I just put in a pull request with edits to the stata.toc file that I thought might help. But I can't get the installation to work on my fork either, so that might not be the issue.
I do not understand how to implement a for loop using the parallel command. I have something like this:... and I want to execute each instance of cmd in parallel rather than in sequence. That is, for each time through the loop, I want to fire up a new instance of Stata to run cmd. If the length of ellist is greater than the number of clusters, I'd want parallel to manage the workload so that (1) the number of loops running at one time is equal to the number of clusters and (2) the next loop starts when a cluster becomes available.
Can the parallel command do this kind of thing? How? Does it make a difference if I'm working in Stata versus Mata? (I'm working in Mata.)
Thanks, Keith
P.S. I see a parallel_for Mata program is in development, but I don't know how to use it.
P.P.S. I considered using pll_id inside the definition of cmd, but the problem is that the number of processes run will equal the number of clusters, not the number of elements in ellist.
P.P.P.S. If you can do this with for loops, can you also do it with while and do loops?
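Applied to a generic loop of the form "run cmd once per element of ellist", the answer's trick could look like the sketch below. It stores the elements in a variable and wraps the loop in a program that runs over whatever observations a Stata instance receives. Note that, as the P.P.S. observes, this starts one instance per cluster rather than one per element: each cluster works through its share of the elements in sequence. The names runcmd and runcmd_all are hypothetical; runcmd stands in for the elided cmd and here only records which element it processed. The sketch also assumes each element is a single word, since it is passed to runcmd as an argument.

```stata
// Same six elements as the example, stored in a variable
clear all
parallel setclusters 4
set obs 6
gen ellist = char(64 + _n)

// Hypothetical stand-in for cmd: records the element it was given
program def runcmd
    args i el
    qui replace done = "`el'" in `i'
end

// Loop over the observations this Stata instance received
program def runcmd_all
    gen done = ""
    forval i = 1/`=_N' {
        local el = ellist[`i']
        runcmd `i' `el'
    }
end

// Both programs have to be exported to the clusters via prog()
parallel, prog(runcmd_all runcmd): runcmd_all
```

Running runcmd_all without the parallel prefix gives the serial version, which is a convenient way to check that both produce the same done variable.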