Closed bquistorff closed 8 years ago
I think additionally it would be good to have a help section about how to parallelize
an existing program (do basically the above but in a program context). This might help adoption.
Here is an example: https://github.com/gvegayon/parallel#parfor (though it doesn't use RNG). You can add your example there (sounds nice). I can add a couple of other examples (from the emails that I've cc-ed you on). I was also thinking that we could go further and create a website (could be a wiki) titled "Stata parallel gallery" (or something like that). I actually use http://gallery.rcpp.org/ a lot, and if we get something similar we can promote it in the module manual and online... what do you think?
Sounds good. Let's do the wiki because I imagine that's easiest for now.
OK, I've created a gallery. Should we move some of the examples from https://github.com/gvegayon/parallel to the gallery to clean up the main page?
I think it would be good to show a code example of a routine where each repetition involves randomness (simulate or bs) and where the outcome doesn't change depending on the number of clusters. The existing ones vary by number of clusters as they set the seed per cluster rather than by repetition.
The way I do this is to generate a list of RNG states, one for each repetition. Then make a program, say `gen_bs`, that takes the filename of the original data as an argument and expects the data in memory to be a list of RNG state strings (seeds). It loops over the RNG states and, for each one, sets the current RNG state to it, loads the data, draws the new bs sample, computes the statistic, and saves the result (either in a post file or a matrix). At the end you replace the dataset with the results. Then when you run `parallel : gen_bs using orig_data.dta`, it doesn't matter how many ways the full RNG state list was split; the results will always be the same. Maybe in the future we could make this kind of technique optional for the bs or simulate subcommands.
Edit: Stata 14 now uses `strL`s to store the RNG state, which is a pain to work with (lots of space, can't `merge` on them, etc.). So I think it's easier to generate a list of `long`s, one for each rep, and do `set seed <long>`.
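For the gallery, the approach described above could be sketched roughly as follows. This is only an illustration under assumptions, not a tested gallery entry: the variable names `seed` and `price`, the choice of the mean as the statistic, and the master seed are all made up for the example, and `orig_data.dta` is assumed to be readable from the working directory of each child process.

```stata
* Hypothetical sketch. The data in memory is a list of seeds (one long per
* repetition); the original data is passed in via -using-.
capture program drop gen_bs
program define gen_bs
    syntax using/

    * Copy the seeds into locals, because -use- below replaces the data
    local nreps = _N
    forvalues i = 1/`nreps' {
        local seed`i' = seed[`i']
    }

    tempname results
    tempfile resfile
    postfile `results' long seed double stat using `"`resfile'"'

    forvalues i = 1/`nreps' {
        set seed `seed`i''            // seed set per repetition, not per cluster
        use `"`using'"', clear
        bsample                       // bootstrap sample of size _N
        summarize price, meanonly     // "price" is an assumed variable name
        post `results' (`seed`i'') (r(mean))
    }
    postclose `results'

    * Replace the seed list with the results
    use `"`resfile'"', clear
end

* Build the seed list once from a single master seed (runiformint needs Stata 14+)
clear
set seed 12345
set obs 200
generate long seed = runiformint(1, 2000000000)

* Split the seed list across child processes; invocation as written in this thread
parallel setclusters 4
parallel : gen_bs using orig_data.dta
```

Because each repetition carries its own seed, changing the number in `parallel setclusters` should only change how the rows are split, not the values in `stat`. Depending on the version of parallel, the program may also need to be passed to the child processes explicitly (I believe there is a `programs()` option for this, but check `help parallel` rather than taking my word for it).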