gvegayon / parallel

PARALLEL: Stata module for parallel computing
https://rawgit.com/gvegayon/parallel/master/ado/parallel.html
MIT License

for loops #63

Open kkranker opened 6 years ago

kkranker commented 6 years ago

I do not understand how to implement a for loop using the parallel command. I have something like this:

local ellist A B C D E F
foreach el of local ellist {
  cmd, option(`el')
}

... and I want to execute each instance of cmd in parallel rather than in sequence. That is, for each pass through the loop, I want to fire up a new instance of Stata to run cmd. If the length of ellist is greater than the number of clusters, I'd want parallel to manage the workload so that (1) the number of loops running at one time is equal to the number of clusters and (2) the next loop starts when a cluster becomes available.

Can the parallel command do this kind of thing? How? Does it make a difference if I'm working in Stata versus Mata? (I'm working in Mata.)

Thanks, Keith

P.S. I see a parallel_for Mata program is in development, but I don't know how to use it.
P.P.S. I considered using pll_id inside the definition of cmd, but the problem is that the number of processes run will equal the number of clusters, not the number of elements in ellist.
P.P.P.S. If you can do this with for loops, can you also do it with while and do loops?

gvegayon commented 6 years ago

In the following example we create a program called myloop that copies data from the variable ellist into a new variable, vname, by looping over the observations of ellist. The same example is then repeated in Mata with a function called myfunction.

// Setup
clear all
set trace off
set more off

parallel setclusters 4

// Test data. You can specify the elements you want to loop in parallel here:
set obs 6
quietly {
  gen ellist = "A" if _n == 1
  replace ellist = "B" if _n == 2
  replace ellist = "C" if _n == 3
  replace ellist = "D" if _n == 4
  replace ellist = "E" if _n == 5
  replace ellist = "F" if _n == 6
}

// This program copies ellist into vname
program def myloop
    args vname

    // Creating the variable
    gen `vname' = ""

    // Looping through the data
    forval i = 1/`=_N' {
        qui replace `vname' = ellist[`i'] if _n == `i'
    }
end

// Calling the program in serial fashion
myloop ellist2

// Calling the program using parallel, we need to pass the program in prog
parallel, prog(myloop): myloop ellist2_pll

// Do we get the same output?
list 

// Same example but using mata --------------------------------------------------

mata
void myfunction(string scalar vname) {

    // Creating the data
    (void) st_addvar("str10", vname);

    string matrix D, A;
    D = st_sdata(., "ellist");
    A = st_sdata(.,vname);

    numeric scalar i;
    for (i = 1; i <= rows(A); i++)
        A[i] = D[i];

    st_sstore(.,vname, A);
    return;
}
end

// Serial and parallel fashion
m : myfunction("ellist_mata")
parallel, mata: m: myfunction("ellist_mata_pll")

// Do we get the same?
list
## 
## . // Setup
## . clear all
## 
## . set trace off
## 
## . set more off
## 
## . 
## . parallel setclusters 4
## N Clusters: 4
## Stata dir:  /usr/local/stata12/stata
## 
## . 
## . // Test data. You can specify the elements you want to loop in parallel here:
## . set obs 6
## obs was 0, now 6
## 
## . quietly {
## 
## . 
## . // This program copies ellist into vname
## . program def myloop
##   1.         args vname
##   2.         
## .         // Creating the variable
## .         gen `vname' = ""
##   3. 
## .   // Looping through the data
## .         forval i = 1/`=_N' {
##   4.                 qui replace `vname' = ellist[`i'] if _n == `i'
##   5.         }
##   6. end
## 
## . 
## . // Calling the program in serial fashion
## . myloop ellist2
## (6 missing values generated)
## 
## . 
## . // Calling the program using parallel, we need to pass the program in prog
## . parallel, prog(myloop): myloop ellist2_pll
## -------------------------------------------------------------------------------
## > -
## Exporting the following program(s): myloop
## 
## myloop:
##   1.         args vname
##   2.         gen `vname' = ""
##   3.         forval i = 1/`=_N' {
##   4.                 qui replace `vname' = ellist[`i'] if _n == `i'
##   5.         }
## -------------------------------------------------------------------------------
## > -
## -------------------------------------------------------------------------------
## Parallel Computing with Stata
## Clusters   : 4
## pll_id     : 3nc1i8tzl1
## Running at : /home/george/Documents/parallel/playground
## Randtype   : datetime
## 
## Waiting for the clusters to finish...
## cluster 0001 has exited without error...
## cluster 0002 has exited without error...
## cluster 0003 has exited without error...
## cluster 0004 has exited without error...
## -------------------------------------------------------------------------------
## Enter -parallel printlog #- to checkout logfiles.
## -------------------------------------------------------------------------------
## 
## . 
## . // Do we get the same output?
## . list 
## 
##      +-----------------------------+
##      | ellist   ellist2   ellist~l |
##      |-----------------------------|
##   1. |      A         A          A |
##   2. |      B         B          B |
##   3. |      C         C          C |
##   4. |      D         D          D |
##   5. |      E         E          E |
##      |-----------------------------|
##   6. |      F         F          F |
##      +-----------------------------+
## 
## . 
## . 
## . // Same example but using mata ----------------------------------------------
## > ----
## . 
## . mata
## ------------------------------------------------- mata (type end to exit) -----
## : void myfunction(string scalar vname) {
## >         
## >         // Creating the data
## >         (void) st_addvar("str10", vname);
## >         
## >         string matrix D, A;
## >         D = st_sdata(., "ellist");
## >         A = st_sdata(.,vname);
## >         
## >         numeric scalar i;
## >         for (i = 1; i <= rows(A); i++)
## >                 A[i] = D[i];
## >         
## >         st_sstore(.,vname, A);
## >         return;
## > }
## 
## : end
## -------------------------------------------------------------------------------
## 
## . 
## . // Serial and parallel fashion
## . m : myfunction("ellist_mata")
## 
## . parallel, mata: m: myfunction("ellist_mata_pll")
## -------------------------------------------------------------------------------
## Parallel Computing with Stata
## Clusters   : 4
## pll_id     : 3nc1i8tzl3
## Running at : /home/george/Documents/parallel/playground
## Randtype   : datetime
## 
## Waiting for the clusters to finish...
## cluster 0001 has exited without error...
## cluster 0002 has exited without error...
## cluster 0003 has exited without error...
## cluster 0004 has exited without error...
## -------------------------------------------------------------------------------
## Enter -parallel printlog #- to checkout logfiles.
## -------------------------------------------------------------------------------
## 
## . 
## . 
## . // Do we get the same?
## . list
## 
##      +---------------------------------------------------+
##      | ellist   ellist2   el~2_pll   ellist~a   el~a_pll |
##      |---------------------------------------------------|
##   1. |      A         A          A          A          A |
##   2. |      B         B          B          B          B |
##   3. |      C         C          C          C          C |
##   4. |      D         D          D          D          D |
##   5. |      E         E          E          E          E |
##      |---------------------------------------------------|
##   6. |      F         F          F          F          F |
##      +---------------------------------------------------+

kkranker commented 6 years ago

Thank you -- storing the element list and the output inside of variables is a neat trick that I hadn't thought of. I'll play around with this idea and see if I can make it work.
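
The pattern above can be adapted to the loop from the original question. The following is only a sketch under stated assumptions: mycmd is a hypothetical stand-in for the real command, runall and the done variable are illustrative names, and the elements are stored one per observation so that parallel can split them across the child instances. As far as I can tell, parallel divides the observations among the clusters up front, so each cluster gets a fixed share of the elements rather than picking up the next element as soon as it becomes free.

```stata
// Hypothetical stand-in for the real command: it only reports its argument.
program define mycmd
    syntax , option(string)
    display "running mycmd for element `option'"
end

// Runs mycmd once per observation of whatever chunk of data this instance holds.
program define runall
    gen byte done = 0
    forvalues i = 1/`=_N' {
        local el = ellist[`i']
        mycmd, option(`el')
        quietly replace done = 1 in `i'
    }
end

// One element per observation, as in the example above.
clear
set obs 6
gen str1 ellist = word("A B C D E F", _n)

parallel setclusters 4

// Both programs must be exported to the child instances.
parallel, prog(mycmd runall): runall

// done should equal 1 for every element that was processed.
list
```

The display output of mycmd ends up in the log files of the child instances rather than in the main session; parallel printlog followed by a cluster number shows them.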

kkranker commented 5 years ago

I had this working a while ago, but I'm finding that something broke. Perhaps there is a bug in the latest version of your code? When I run the code above, I get the error

cluster #### Exited with error -3499- while running the command/dofile (view log)...

The log files have:

 /* Checking for break */
. mata: parallel_break()
.     m: myfunction("ellist_mata_pll")
                 <istmt>:  3499  myfunction() not found
r(3499);
.   }

It appears the Mata function (myfunction) is not being passed to the child clusters. Any suggestions?
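
One possible workaround, which is an assumption and not something confirmed in this thread, is to save the compiled function as a .mo file so that any Stata instance can find it along the ado-path without relying on parallel to transfer it. The sketch below assumes the child instances use the same PERSONAL directory as the parent session (sysdir shows where it is).

```stata
// Test data, as in the example above.
clear
set obs 6
gen str1 ellist = word("A B C D E F", _n)

mata:
void myfunction(string scalar vname) {
    string matrix D

    // Create the new variable and copy ellist into it
    (void) st_addvar("str10", vname)
    D = st_sdata(., "ellist")
    st_sstore(., vname, D)
}
end

// Make sure the PERSONAL directory exists (no harm if it already does).
capture mkdir "`c(sysdir_personal)'"

// Save the compiled function as myfunction.mo in the PERSONAL directory.
mata: mata mosave myfunction(), dir(PERSONAL) replace

// Same call as before; the children should now be able to locate myfunction().
parallel setclusters 4
parallel, mata: mata: myfunction("ellist_mata_pll")
list
```

If this runs cleanly, the problem is in how the function is exported to the children rather than in the function itself.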

I'm using the latest version of parallel from SSC. I can't figure out how to install directly from GitHub.

. which parallel
C:\Users\kkranker\Documents\Stata\Ado\plus\p\parallel.ado
*! version 1.15.8.19  19agol2015
*! PARALLEL: Stata module for parallel computing
*! by George G. Vega [cre,aut], Brian Quistorff [ctb]

gvegayon commented 5 years ago

Perhaps you updated Stata? As you can see, the SSC version is pretty old. Instructions to install the dev version are here: https://github.com/gvegayon/parallel#development-version-latestmaster. Try following those and let us know.

kkranker commented 5 years ago

I've never been able to install Stata packages from GitHub. I always get some type of error. Here's what I got today:

. net install parallel, from( https://raw.github.com/gvegayon/parallel/master/) replace
sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
https://raw.github.com/gvegayon/parallel/master/ either
  1) is not a valid URL, or
  2) could not be contacted, or
  3) is not a Stata download site (has no stata.toc file).
r(5100);

Is it because the stata.toc file looks incomplete?

gvegayon commented 5 years ago

You should try downloading another version directly as a zip file, as explained here: https://github.com/gvegayon/parallel/tree/sj-review#development-version-latestmaster. Follow those instructions and let us know how it goes.
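
A minimal sketch of what a local install could look like once the zip file has been downloaded and extracted. The folder path is hypothetical, and the sketch assumes the extracted folder contains the package's stata.toc and parallel.pkg files.

```stata
// Hypothetical path: wherever the downloaded zip file was extracted.
local src "C:/Users/yourname/Downloads/parallel-master"

// Remove the previously installed copy if present, then install from the local folder.
capture ado uninstall parallel
net install parallel, from("`src'") replace

// Rebuild the index of Mata libraries so the package's library is found.
mata mata mlib index

// Confirm which version is now on the ado-path.
which parallel
```

Because this reads from the local disk, it avoids the HTTPS certificate check that made the net install from the GitHub URL fail.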

bquistorff commented 5 years ago

I just reconfirmed that net install from GitHub works for me on Stata 14 on Windows (and I've done it previously on Linux). It might be something with Stata v15, or something with your local setup. Hard to tell.

kkranker commented 5 years ago

Okay, thanks. I'll write Stata tech support and see if they have an idea.

I just put in a pull request with edits to the stata.toc file that I thought might help. But I can't get the installation to work on my fork either, so that might not be the issue.