Closed cfbalcazar closed 3 years ago
The nodata option tells parallel that the data should not be exported to the child processes. That said, if you need the entire dataset for your loop, you can always write a wrapper program that loads the dataset you plan to use. For example (pardon my rusty Stata 😜):
program def vpar
    args fileloc ...more arguments...
    // Loading the full dataset
    use `"`fileloc'"', clear
    // You could try doing something like this
    if ($pll_instance == 1) {
        vcimatrix ...more arguments...
    }
    ... save results ...
    save ...some path.../$pll_instance.dta
end
Then you can call parallel like this:
tempfile fulldata
save `fulldata', replace
parallel, nodata prog(...): vpar `fulldata' ...more arguments...
You can also take a look at a detailed example: the implementation of the bootstrap here.
After reading the suggestion and the bootstrap implementation, if I understand correctly, one way to proceed is to use the macro $PLL_CHILDREN to split the loop into chunks and run each chunk in a different instance (parallel process), selected by a conditional statement on $pll_instance. Thus, for example,
if ($pll_instance == 1) {
vcimatrix ...more arguments...
}
tells parallel to run vcimatrix only in the first child process. Am I correct?
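For what it's worth, here is a minimal sketch of that chunking idea. It assumes that parallel sets $pll_instance to 1, 2, ... in each child and $PLL_CHILDREN to the number of children, as described in this thread. The program name chunked_loop, the argument ntasks, and the display line are hypothetical placeholders, not part of parallel or dyadclust.

```stata
// Hypothetical wrapper: each child runs only its own slice of 1..ntasks
program define chunked_loop
    args fileloc ntasks

    // Every child loads the full dataset itself (nodata skips the export)
    use `"`fileloc'"', clear

    // Assumption: $pll_instance and $PLL_CHILDREN are set by parallel
    local per   = ceil(`ntasks' / $PLL_CHILDREN)
    local first = ($pll_instance - 1) * `per' + 1
    local last  = min($pll_instance * `per', `ntasks')

    // If first > last, forvalues simply runs zero times
    forvalues i = `first'/`last' {
        // Placeholder for the real work of iteration `i'
        display "child $pll_instance handles iteration `i'"
    }
end
```

Following the call pattern shown earlier in the thread, and assuming parallel has already been set up with the desired number of children, it could be invoked with something like:

```stata
tempfile fulldata
save `fulldata', replace
parallel, nodata prog(chunked_loop): chunked_loop `fulldata' 100
```

Each child would still need to save its partial results to its own file so that the parent can combine them afterwards.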
I noticed that I was right with my last comment; I managed to parallelize the command. Thank you so much for the guidance, Dr. Vega!!!
Just to complement, here are two examples of parallel for loops:
HIH
Expected behavior and actual behavior
I have been working with PARALLEL to compute regressions with dyadic clustering. I have been developing a simple package for this (link). Specifically, I use PARALLEL to run vcimatrix.ado within dyadclust.ado. vcimatrix.ado contains a loop that computes many variance-covariance matrices which are then added. Both files are in the dyadclust folder.
I was expecting PARALLEL to split the loop in order to speed up the computation of these matrices, but instead of splitting the loop it splits the dataset that is used in the loop. I tried adding the nodata option in line 58, but it returns an error: 3 child processes encountered errors. Throwing last error. I have added the log file at the end.

As recommended by Dr. Vega in an email, I checked the parfor examples. However, I wonder whether I need to make my loop a forvalues loop, and whether the program needs to use args, meaning it does not work with programs defined as eclass that use syntax.
Perhaps I'm using PARALLEL in the wrong way. I would appreciate it if someone can point me in the right direction.
Steps to reproduce the problem
System information
Output from the error log when the nodata option is added in line 58: