gvegayon / parallel

PARALLEL: Stata module for parallel computing
https://rawgit.com/gvegayon/parallel/master/ado/parallel.html
MIT License

Problem with PARALLEL for running a program with a loop inside #89

Closed: cfbalcazar closed this issue 3 years ago

cfbalcazar commented 3 years ago

Expected behavior and actual behavior

I have been working with PARALLEL to compute regressions with dyadic clustering. I have been developing a simple package for this (link). Specifically, I use PARALLEL to run vcimatrix.ado within dyadclust.ado. vcimatrix.ado contains a loop that computes many variance-covariance matrices which are then added. Both files are in the dyadclust folder.

I expected PARALLEL to split the loop across processes to speed up the computation of these matrices. Instead, it splits the dataset used in the loop rather than the loop itself. I tried adding the nodata option in line 58, but that returns an error: "3 child processes encountered errors. Throwing last error." The log file is included at the end.

Following Dr. Vega's recommendation by email, I checked the parfor examples. However, I wonder whether I need to rewrite my loop as a forvalues loop, and whether the program must receive its arguments through args or can also be an eclass program that uses syntax.

Perhaps I'm using PARALLEL in the wrong way. I would appreciate it if someone can point me in the right direction.

Steps to reproduce the problem

* Draft of dyadclust package
net install dyadclust, from (https://raw.githubusercontent.com/cfbalcazar/dyadclust/main/dyadclust/) replace force all
* Load data
u "https://raw.githubusercontent.com/cfbalcazar/dyadclust/main/dyadclust/datasim.dta", clear

* First we compute the standard option (computes correct standard errors) 
dyadclust: reg dY dX, ego(dyad1) alter(dyad2)

* Then we add the parallel computing option (computes the wrong standard errors)
dyadclust: reg dY dX, ego(dyad1) alter(dyad2) par

System information

Log output showing the error when the nodata option is added in line 58:

-------------------------------------------------------------------------------
> -
beginning of file -C:\Users\fb\AppData\Local\Temp/__pll4qp93nix31_do0001.
> log-
-------------------------------------------------------------------------------
> -

running c:\ado\personal\profile.do ...

. capture {
-------------------------------------------------------------------------------
> -
Parallel computing with Stata
-------------------------------------------------------------------------------
> -
cmd/dofile   : "vcimatrix: reg dY dX , id1(__000000) id2(__000001) weightss(__0
> 00003) parallel"
pll_id       : 4qp93nix31
pll_instance : 1/3
tmpdir       : C:\Users\fb\AppData\Local\Temp/__pll4qp93nix31_tmpdir0001/
date-time    : 08:31:38 24 Mar 2021
seed         : XAA00000000000095580f4a4d255668e879dd04b6c67631e4476af420479296d [cut short because it is too long]
-------------------------------------------------------------------------------
> -

. local result = _rc

. if (c(rc)) {
. cd "C:\Users\fb\Downloads\dyadclust-main\dyadclust/"
. mata: parallel_write_diagnosis(strofreal(c("rc")),"C:\Users\fb\Download
> s\dyadclust-main\dyadclust/__pll4qp93nix31_finito0001","while setting memory"
> )
. clear
. exit
. }

. 
. * Loading Programs *
. capture {

. local result = _rc

. if (c(rc)) {
. cd "C:\Users\fb\Downloads\dyadclust-main\dyadclust/"
. mata: parallel_write_diagnosis(strofreal(c("rc")),"C:\Users\fb\Download
> s\dyadclust-main\dyadclust/__pll4qp93nix31_finito0001","while loading program
> s")
. clear
. exit
. }

. 
. * Checking for break *
. mata: parallel_break()

. 
. * Loading Globals *
. capture {

. if (c(rc)) {
.   cd "C:\Users\fb\Downloads\dyadclust-main\dyadclust/"
.   mata: parallel_write_diagnosis(strofreal(c("rc")),"C:\Users\fb\Downlo
> ads\dyadclust-main\dyadclust/__pll4qp93nix31_finito0001","while loading globa
> ls")
.   clear
.   exit
. }

. 
. * Checking for break *
. mata: parallel_break()

. capture {
. 
. * Checking for break *
. mata: parallel_break()
.     vcimatrix: reg dY dX , id1(__000000) id2(__000001) weightss(__000003) par
> allel 
no variables defined
(error in option id1())
r(111);
.   }

. if (c(rc)) {
.   cd "C:\Users\fb\Downloads\dyadclust-main\dyadclust/"
C:\Users\fb\Downloads\dyadclust-main\dyadclust
.   mata: parallel_write_diagnosis(strofreal(c("rc")),"C:\Users\fb\Downlo
> ads\dyadclust-main\dyadclust/__pll4qp93nix31_finito0001","while running the c
> ommand/dofile")
.   clear
.   exit

end of do-file
-------------------------------------------------------------------------------
> -
 end of file -C:\Users\fb\AppData\Local\Temp/__pll4qp93nix31_do0001.log- 
>  
-------------------------------------------------------------------------------
> -
gvegayon commented 3 years ago

The nodata option tells parallel not to export the data to the child processes, which is why vcimatrix fails with "no variables defined". That said, if you need the entire dataset for your loop, you can always write a wrapper program that loads the dataset you plan to use. For example (pardon my rusty Stata 😜):

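program define vpar

    args fileloc ...more arguments...

    // Load the full dataset in each child, since nodata means parallel does not export it
    use `"`fileloc'"', clear

    // Each child sees its own $pll_instance, so this runs vcimatrix only in child 1
    if ($pll_instance == 1) {
       vcimatrix ...more arguments...
    }

    // Save this child's results under a name that depends on $pll_instance
    ... save results ...
    save ...some path.../$pll_instance.dta

end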

Then you can call parallel:

tempfile fulldata
save `fulldata', replace
parallel, nodata prog(...): vpar `fulldata' ...more arguments...
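
Once the call above returns, the parent process still has to collect whatever each child saved. The following is only a minimal sketch of that step, under stated assumptions: every child saved a dataset named after its instance number (mirroring the save line in the wrapper), the folder `outdir` is a hypothetical placeholder, and the global $PLL_CHILDREN (the name used later in this thread) still holds the number of children in the parent session.

```stata
* Sketch only: combine the per-child result files in the parent session.
* `outdir' is a hypothetical folder; use the path the wrapper saved to.
local outdir "results"

clear
forvalues i = 1/$PLL_CHILDREN {
    * Each child is assumed to have written `outdir'/<instance number>.dta
    append using "`outdir'/`i'.dta"
}
```

If the children produce matrices rather than datasets, as vcimatrix does, the same loop structure applies, but the partial results would need to be stored in a form the parent can read back and add up.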

For a more detailed example, you can also look at the implementation of the bootstrap here.

cfbalcazar commented 3 years ago

After reading the suggestion and the bootstrap implementation, I think I understand: one way to proceed is to use the global macro $PLL_CHILDREN to split the loop into chunks, and then assign each chunk to a different instance (parallel process) with a conditional statement on $pll_instance. For example,

if ($pll_instance == 1) {
       vcimatrix ...more arguments...
 }

tells parallel to run vcimatrix only in the first child process. Am I correct?
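
The chunking idea described above can be sketched as follows. This is an illustration, not the code from dyadclust: the program name `chunkloop` and its argument `N` (the total number of loop iterations) are hypothetical, and it assumes that both $PLL_CHILDREN and $pll_instance are visible inside each child, as the usage in this thread suggests.

```stata
* Sketch only: give each child a contiguous block of the loop iterations.
program define chunkloop
    args N    // hypothetical argument: total number of iterations

    * Block size, rounded up so that every iteration is covered
    local size  = ceil(`N' / $PLL_CHILDREN)
    local first = ($pll_instance - 1) * `size' + 1
    local last  = min($pll_instance * `size', `N')

    * forvalues runs zero times if `first' exceeds `last'
    forvalues i = `first'/`last' {
        display "child $pll_instance handles iteration `i'"
    }
end
```

Following the call pattern shown earlier in the thread, this could be run as `parallel, nodata prog(chunkloop): chunkloop 10`. With three children, that would give iterations 1 to 4 to the first child, 5 to 8 to the second, and 9 to 10 to the third. Replacing the display line with the actual work, and saving each child's partial result, is left to the real program.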

cfbalcazar commented 3 years ago

My understanding in the last comment turned out to be right, and I managed to parallelize the command. Thank you so much for the guidance, Dr. Vega!

gvegayon commented 3 years ago

Just to complement, here are two examples of parallel for loops:

HIH