stan-dev / stan

Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License

threaded gq facility #2911

Closed: wds15 closed this issue 8 months ago

wds15 commented 4 years ago

Summary:

When we run the generated quantities facility, we should allow threaded evaluation to speed things up.

Description:

Because the draws from the posterior are independent, it should be easy to parallelise the generated quantities facility with threads. The only difficulty is keeping the output order consistent: it must not change even though the simulations are generated out of order.
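A minimal sketch of the ordering idea, in Python rather than Stan's C++ and ignoring random numbers entirely: each task writes into the slot reserved for its own draw index, so the output order is fixed no matter which thread finishes first. `generate_quantities` and `run_gq_threaded` are hypothetical names used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_quantities(draw):
    # Hypothetical stand-in for evaluating generated quantities on one draw.
    return {"draw_sum": sum(draw)}


def run_gq_threaded(draws, num_threads):
    # One pre-allocated slot per draw: completion order cannot change
    # the order of the output.
    results = [None] * len(draws)

    def work(i):
        results[i] = generate_quantities(draws[i])

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # list() waits for all tasks and re-raises any worker exception.
        list(pool.map(work, range(len(draws))))
    return results


draws = [[0.1 * i, 1.0] for i in range(8)]
serial = [generate_quantities(d) for d in draws]
threaded = run_gq_threaded(draws, num_threads=4)
```

Here `threaded` should match `serial` element by element, because every draw is processed by the same deterministic function and stored at its own index.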

Reproducible Steps:

NA

Current Output:

Things run serially only, but it should be possible to change that.

Expected Output:

Same output as we have now, just faster when more threads are being used.

Additional Information:

NA

Current Version:

v2.22.0

bob-carpenter commented 4 years ago

The generated quantities can be computed asynchronously given the parameters. The only reason to synchronize them is for low-level reproducibility, which is certainly nice to have. If we need that low-level reproducibility, then we need to know how many base RNGs are necessary and we need to be able to reset their state.

wds15 commented 4 years ago

Yack... random numbers are a consideration to sort out here.

My preferred solution would be to always have the same results, no matter what.

A variation of that would be to always have the same output when the number of threads is the same. That would allow us to use one RNG per thread. Anything else is probably difficult, as we cannot make assumptions such as each draw always consuming a fixed amount of random numbers.
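A rough sketch of the one-RNG-per-thread variation, in Python for brevity. It assumes each thread handles a contiguous block of draws in increasing order and that a per-thread seed can be derived as `seed * 1000 + t`; both are illustrative choices, and a real implementation would need properly independent streams. `simulate` and `run_per_thread_rng` are hypothetical names.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate(draw_index, rng):
    # Hypothetical generated quantity that consumes a varying number of
    # random numbers per draw (1 to 3), so no fixed stride can be assumed.
    n = 1 + draw_index % 3
    return [rng.random() for _ in range(n)]


def run_per_thread_rng(num_draws, num_threads, seed):
    results = [None] * num_draws

    def work(t):
        # One RNG per thread; the seed derivation is an assumption made
        # for this sketch only.
        rng = random.Random(seed * 1000 + t)
        # Thread t processes its contiguous block of draws in order.
        lo = t * num_draws // num_threads
        hi = (t + 1) * num_draws // num_threads
        for i in range(lo, hi):
            results[i] = simulate(i, rng)

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(work, range(num_threads)))
    return results
```

Two runs with the same `seed` and the same `num_threads` give identical output. Changing `num_threads` moves the block boundaries, so the output is generally not reproducible across thread counts.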

wds15 commented 4 years ago

@bob-carpenter To solve the random number generator problem: right now we use one random number generator for all draws, which can't be threaded at all. How about we always use, say, 128 random number generators (even when using just one thread)? Then we can split those 128 random number generators across the threads actually being used. All that would be left is to define a non-changing assignment of iteration number to random number generator.

Would that be an option?
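One way the fixed-pool proposal could look, sketched in Python under stated assumptions: draw `i` always uses stream `i % 128`, each stream is owned by exactly one thread, and that thread processes the stream's draws in increasing order. The seed derivation `seed * NUM_STREAMS + s` is illustrative only and does not guarantee statistically independent streams. `simulate` and `run_fixed_streams` are hypothetical names.

```python
import random
from concurrent.futures import ThreadPoolExecutor

NUM_STREAMS = 128  # fixed pool size, independent of the thread count


def simulate(draw_index, rng):
    # Hypothetical generated quantity using 1 to 3 random numbers per draw.
    n = 1 + draw_index % 3
    return [rng.random() for _ in range(n)]


def run_fixed_streams(num_draws, num_threads, seed):
    results = [None] * num_draws

    def work(t):
        # Thread t owns streams t, t + num_threads, t + 2 * num_threads, ...
        for s in range(t, NUM_STREAMS, num_threads):
            rng = random.Random(seed * NUM_STREAMS + s)
            # Stream s serves draws s, s + 128, s + 256, ... in that order,
            # so its consumption does not depend on which thread owns it.
            for i in range(s, num_draws, NUM_STREAMS):
                results[i] = simulate(i, rng)

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(work, range(num_threads)))
    return results
```

Because the draw-to-stream assignment never changes and no stream is shared between threads, `run_fixed_streams(300, 1, 7)` and `run_fixed_streams(300, 4, 7)` should return the same results. The trade-off is that at most 128 threads can be kept busy.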

WardBrian commented 8 months ago

https://github.com/stan-dev/stan/pull/3212 added this for running over multiple chains, which seems like the best way to parallelize while avoiding the RNG issues above.