gvegayon / parallel

PARALLEL: Stata module for parallel computing
https://rawgit.com/gvegayon/parallel/master/ado/parallel.html
MIT License

When is data returned to the calling workspace? #61

Closed by feenberg 5 years ago

feenberg commented 6 years ago

In my application there won't be any data in memory when -parallel do- is called, but the do file will create data that should be appended and returned. I am confused about how -parallel do- returns data to the calling workspace. I have a do file, -makex.do-:

set obs 1000
gen x=$pll_instance

If I run:

parallel setclusters=2,f
parallel do makex.do

It runs without error but returns nothing to the calling workspace (where I expected to see 2,000 rows of x). If I add

save $pll_instance

to makex.do, then 1.dta and 2.dta are saved to disk and can be retrieved, but I thought returning the data would be automatic unless -nodata- was specified.
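Until the automatic return is sorted out, a minimal workaround sketch built on the `save $pll_instance` approach is to have each instance write its own shard and append the shards by hand in the parent session. This assumes the child instances run in the parent's working directory (as the 1.dta/2.dta result suggests) and that, with two clusters, `$pll_instance` takes the values 1 and 2. The file names `shard_1.dta` and `shard_2.dta` are hypothetical, and `parallel setclusters 2, force` is my assumption of the standard spelling of the cluster setup used above.

```stata
* makex.do (run once per child instance by -parallel do-)
* Each instance builds its own 1,000-row shard and writes it to disk.
* The file name shard_#.dta is hypothetical; # is the instance number.
clear
set obs 1000
gen x = $pll_instance
save "shard_$pll_instance", replace
```

```stata
* Parent session: run the instances, then append the shards by hand.
parallel setclusters 2, force
parallel do makex.do

* Start from an empty dataset and stack the shards written by the children.
clear
forvalues i = 1/2 {
    append using "shard_`i'"
}
```

If the shards are large, appending them one at a time in the parent is exactly the serial I/O step that -parallel- is meant to avoid, so this sketch only confirms where the data ends up.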

We have very large files in shards, and my eventual goal is to process them with -parallel-. -Parallel- will allow us to parallelize input/output, which StataMP doesn't really do. I am trying to put together instructions for other users, and I want those instructions to be very clear.

Perhaps -parallel append- is more appropriate for this application, but I haven't been able to figure out how to use it.

Versions: parallel 1.15.8.19 (19agol2015); Linux kernel 2.6.32-696.18.7.el6.x86_64; Stata 14.2

gvegayon commented 6 years ago

Have you tried with the development version?

https://github.com/gvegayon/parallel#development-version-latestmaster

I don't have Stata right now, so I can't check, but I expect you should get the data back.