Show code example where results do not depend on the number of clusters

bquistorff commented 9 years ago

I think it would be good to show a code example of a routine where each repetition involves randomness (simulate or bs) and where the outcome doesn't change with the number of clusters. The existing examples give different results for different numbers of clusters because they set the seed per cluster rather than per repetition.

The way I do this is to generate a list of RNG states, one for each repetition. Then I write a program, say gen_bs, that takes the filename of the original data as an argument and expects the data in memory to be a list of RNG state strings (seeds). It loops over the RNG states and, for each one, sets the current RNG state to it, loads the original data, draws a new bs sample, computes the statistic, and saves the result (in either a post file or a matrix). At the end it replaces the dataset with the results. Then when you run parallel : gen_bs using orig_data.dta, it doesn't matter how many ways the full RNG state list was split; the results will always be the same.
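A minimal sketch of the idea, written in Python rather than Stata only so that it runs standalone. Everything in it is an illustrative assumption: the names `make_seeds` and `run_split`, the toy data, and the use of one integer seed per repetition in place of a full RNG state. `gen_bs` here is a stand-in for the program described above, not the actual implementation, and the chunks run sequentially to mimic how the seed list would be split across clusters.

```python
import random
from statistics import mean

def make_seeds(master_seed, reps):
    """Derive one integer seed per repetition from a single master seed."""
    rng = random.Random(master_seed)
    return [rng.randrange(2**31 - 1) for _ in range(reps)]

def gen_bs(data, seeds):
    """Run one bootstrap repetition per seed and return the statistics."""
    results = []
    for seed in seeds:
        rng = random.Random(seed)                # reseed per repetition, not per cluster
        sample = rng.choices(data, k=len(data))  # resample with replacement
        results.append(mean(sample))
    return results

def run_split(data, seeds, n_clusters):
    """Split the seed list into contiguous chunks (one per hypothetical
    cluster), process each chunk, and concatenate the results in order."""
    size = -(-len(seeds) // n_clusters)          # ceiling division
    chunks = [seeds[i:i + size] for i in range(0, len(seeds), size)]
    out = []
    for chunk in chunks:
        out.extend(gen_bs(data, chunk))
    return out

data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8]      # toy data, assumed for illustration
seeds = make_seeds(12345, 20)

# The results are identical however the seed list is split.
assert run_split(data, seeds, 1) == run_split(data, seeds, 2) == run_split(data, seeds, 4)
```

Because each repetition's randomness depends only on its own seed, the split affects which cluster does the work but not what any repetition produces.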

Maybe in the future we could make this kind of technique optional for the bs or simulate subcommands.

Edit: Stata 14 now uses strLs to store the RNG state, which are a pain to work with (they take lots of space, you can't merge on them, etc.). So I think it's easier to generate a list of longs, one for each rep, and do set seed <long>.

bquistorff commented 8 years ago

Additionally, I think it would be good to have a help section about how to parallelize an existing program (basically do the above, but in a program context). This might help adoption.

gvegayon commented 8 years ago

Here is an example: https://github.com/gvegayon/parallel#parfor (though it doesn't use the RNG). You can add your example there (sounds nice). I can add a couple of other examples (from the emails that I've cc'ed you on). Perhaps we can go further and create a website (could be a wiki) titled "Stata parallel gallery" (or something like that). I actually use http://gallery.rcpp.org/ a lot, and if we get something similar we can promote it in the module manual and online... what do you think?

bquistorff commented 8 years ago

Sounds good. Let's do the wiki because I imagine that's easiest for now.

bquistorff commented 8 years ago

OK, I've created a gallery. Should we move some of the examples from https://github.com/gvegayon/parallel to the gallery to clean up the main page?

gvegayon / parallel

Show code example where results do not depend on the number of clusters #21