hyunjimoon / SBC

https://hyunjimoon.github.io/SBC

SBC and within-chain parallelisation #103

Open mhollanders opened 6 days ago

mhollanders commented 6 days ago

Hi,

Sorry if this is a naive question; I'm not familiar with the future package.

The model I'm using to specify the backend has within-chain parallelisation enabled. When creating the backend, I therefore specify the num_threads option in SBC_backend_cmdstan_sample(), and grainsize is specified in SBC_generator_function(). Is this going to play well with how SBC uses the future package? Ideally I'd just be fitting the datasets generated with generate_datasets() sequentially and let Stan handle the threads.

Thanks,

Matt

martinmodrak commented 6 days ago

Good question. The short answer is that if you never call future::plan() or after an explicit call of future::plan(future::sequential), SBC will process the datasets sequentially.
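A minimal sketch of the sequential setup, assuming `datasets` and `backend` are objects you have already created (both names are placeholders, not taken from your code):

```r
# Explicitly reset the plan so datasets are processed one at a time,
# even if an earlier call to future::plan() set something else.
future::plan(future::sequential)

# Placeholder objects: `datasets` from generate_datasets(), `backend` from
# SBC_backend_cmdstan_sample() with the threading options set there.
# results <- SBC::compute_SBC(datasets, backend)
```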

However, if you are limited by CPU (which is the case for most Stan models), you are likely to get noticeably faster computation when parallelising across fits/chains than when parallelising within chains (there's just much less overhead). The exception is when the parallel fits/chains would not fit in RAM at the same time; then running fewer fits/chains in parallel and parallelising within chains becomes useful.

It should be possible to have a hybrid setting by calling future::plan(future::multisession, workers = max_parallel_chains). This will limit the number of parallel fits, and you should still be able to parallelise within chains (I didn't test this, but it should work). There is also the cores_per_fit argument to compute_SBC, which (for Stan) controls the number of chains run in parallel for the same dataset.
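An untested sketch of the hybrid setting. The core budget, `threads_per_fit`, and the way `max_parallel_chains` is derived from them are my assumptions for illustration; `datasets` and `backend` are placeholders for your own objects:

```r
# Assumed budget: all cores that the machine reports as available.
total_cores <- parallelly::availableCores()

# Assumed number of threads each fit may use for within-chain parallelisation;
# this should match whatever you configured when creating the backend.
threads_per_fit <- 2

# Cap the number of concurrent fits so that fits x threads stays within budget.
max_parallel_chains <- max(1, floor(total_cores / threads_per_fit))

future::plan(future::multisession, workers = max_parallel_chains)

# Placeholder call; cores_per_fit = 1 is an assumed choice, not a recommendation.
# results <- SBC::compute_SBC(datasets, backend, cores_per_fit = 1)
```

Dividing the core count by the threads per fit is only a rough heuristic to avoid oversubscribing the CPU. If RAM is the limiting factor, a smaller number of workers may be needed.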

Does that make sense?

mhollanders commented 2 days ago

Hey Martin, thanks for getting back so quickly. I think it makes sense; just to make sure, is workers = max_parallel_chains saying that at most max_parallel_chains fits should be run in parallel, with each of those fits potentially utilising threading?

The main reason it's relevant for me is that I want the standalone, "production" Stan program to include within-chain parallelisation because it'll always be used like that with real data, and this is the program I'd like to use SBC with.