OnlineStats for Bayesian modeling?

cscherrer commented 5 years ago

Hi Josh,

I've been moving toward MCMC results being in the form of an iterator instead of an array, and encouraging others in this direction as well. This brings convenience and flexibility in a lot of different ways.
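As a rough illustration of the iterator idea, here is a Python sketch of a sampler that yields draws lazily rather than returning an array. The packages in this thread are Julia, and this is not their code. It assumes a random-walk Metropolis sampler on a one-dimensional standard normal target; `rw_metropolis` and `std_normal_logpdf` are hypothetical names.

```python
import itertools
import math
import random
from typing import Callable, Iterator


def rw_metropolis(
    log_density: Callable[[float], float],
    x0: float = 0.0,
    step: float = 1.0,
    seed: int = 0,
) -> Iterator[float]:
    """Hypothetical sampler: yield an endless stream of random-walk Metropolis draws.

    Nothing is stored here; the caller decides how many draws to take.
    """
    rng = random.Random(seed)
    x, logp = x0, log_density(x0)
    while True:
        proposal = x + rng.gauss(0.0, step)
        logp_prop = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < logp_prop - logp:
            x, logp = proposal, logp_prop
        yield x


def std_normal_logpdf(x: float) -> float:
    # Unnormalized log density of N(0, 1); the constant cancels in the ratio.
    return -0.5 * x * x


# Take only as many draws as needed; the sampler itself is unbounded.
draws = list(itertools.islice(rw_metropolis(std_normal_logpdf), 20_000))
```

Because the sampler is just an iterator, the caller (or a downstream stopping rule) decides how many draws to take, and summaries can be accumulated without ever storing the chain.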

There seems to be some interest in this approach from the Turing team: https://github.com/TuringLang/AdvancedHMC.jl/issues/101#issuecomment-531494672

And Tamas Papp is also trying this out for DynamicHMC: https://github.com/tpapp/DynamicHMC.jl/pull/94

Have you done or seen anything in this direction for OnlineStats?

The general idea is to specify a stopping criterion, for example a target standard error for the estimated posterior mean of some function of the sample. I think it would also be nice to have a way to deal with intermediate results.
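A minimal sketch of such a stopping rule, assuming draws arrive one at a time from any iterator. Welford's algorithm keeps a running mean and variance of f(draw), and consumption stops once the standard error falls below a tolerance. For simplicity the standard error here treats draws as independent. With correlated MCMC draws, the sample size in it would need to be replaced by an effective sample size. `RunningMoments`, `run_until`, and `iid_normal` are hypothetical names.

```python
import math
import random
from typing import Callable, Iterable, Iterator, Tuple


class RunningMoments:
    """Welford's online mean and variance (one pass, O(1) memory)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; undefined for fewer than two observations.
        return self._m2 / (self.n - 1) if self.n > 1 else math.nan

    @property
    def std_error(self) -> float:
        # Assumes independent draws; for MCMC, n should be replaced by an ESS.
        return math.sqrt(self.variance / self.n) if self.n > 1 else math.inf


def run_until(
    draws: Iterable[float],
    f: Callable[[float], float],
    tol: float,
    min_draws: int = 100,
) -> Tuple[RunningMoments, bool]:
    """Consume draws until the standard error of mean(f(draw)) drops below tol."""
    stats = RunningMoments()
    for x in draws:
        stats.update(f(x))
        if stats.n >= min_draws and stats.std_error < tol:
            return stats, True
    return stats, False  # the stream ran out before the criterion was met


def iid_normal(seed: int = 1) -> Iterator[float]:
    # Stand-in draw source for the example: independent N(0, 1) values.
    rng = random.Random(seed)
    while True:
        yield rng.gauss(0.0, 1.0)


# Estimate E[x^2] under N(0, 1) (true value 1) to a standard error of 0.01.
stats, converged = run_until(iid_normal(), f=lambda x: x * x, tol=0.01)
```

Returning the accumulator together with the flag means intermediate results are still available when the stream ends before the tolerance is met.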

A few things are needed for this approach, most of which are already available:

Mean and variance
Standard error of mean estimate
Effective sample size, both on its own and for use in the standard error. Depending on the context, this is computed from autocorrelations or from sample weights.
Rank-normalized R-hat
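For the weighted case, one standard choice is Kish's effective sample size, ESS = (Σw)² / Σw². It is assumed here because the thread does not name an estimator. It needs only two running sums, so it fits the streaming setting. The Python sketch below also keeps a self-normalized weighted mean. `WeightedESS` and `importance_sample` are hypothetical names. The usage example assumes importance sampling a N(1, 1) target with a N(0, 1) proposal, for which the weight is exp(x - 1/2).

```python
import math
import random


class WeightedESS:
    """Hypothetical streaming Kish effective sample size for weighted draws.

    ESS = (sum w)^2 / sum w^2, so only two running sums are needed.
    """

    def __init__(self) -> None:
        self.sum_w = 0.0
        self.sum_w2 = 0.0
        self.sum_wx = 0.0

    def update(self, x: float, w: float) -> None:
        if w < 0:
            raise ValueError("weights must be non-negative")
        self.sum_w += w
        self.sum_w2 += w * w
        self.sum_wx += w * x

    @property
    def ess(self) -> float:
        return self.sum_w ** 2 / self.sum_w2 if self.sum_w2 > 0 else 0.0

    @property
    def mean(self) -> float:
        # Self-normalized weighted mean.
        return self.sum_wx / self.sum_w if self.sum_w > 0 else math.nan


def importance_sample(n: int, seed: int = 2) -> WeightedESS:
    """Example only: target N(1, 1), proposal N(0, 1), weight exp(x - 1/2)."""
    rng = random.Random(seed)
    acc = WeightedESS()
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        acc.update(x, math.exp(x - 0.5))
    return acc


acc = importance_sample(50_000)
```

For this setup, E[w²] = e under the proposal, so acc.ess / n should land near 1/e ≈ 0.37, and the weighted mean should be close to the target mean of 1.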

Any thoughts on this?

joshday commented 5 years ago

I haven't done anything with MCMC in a while, but I think OnlineStats has all the pieces you need (means, variances, and autocorrelations).

The implementations of Mean and Variance live in OnlineStatsBase, so if you want to keep dependencies minimal you can go that route. I should probably move AutoCov over there as well.
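This Python sketch shows how autocorrelations can be tracked in a stream and fed into an effective sample size. It is not OnlineStats' AutoCov, whose internals may differ. It keeps running cross-products up to a fixed maximum lag, plus the first and most recent `max_lag` values, which is enough to center each lag exactly around the overall mean. The ESS uses the simple rule ESS = n / (1 + 2 Σ ρ_k), truncated at the first negative autocorrelation. This rule is an assumption chosen for brevity, not Geyer's or Stan's estimator. `StreamingAutoCov` is a hypothetical name.

```python
import math
import random
from collections import deque
from typing import Deque, List


class StreamingAutoCov:
    """Hypothetical one-pass autocovariances up to lag max_lag, O(max_lag) memory."""

    def __init__(self, max_lag: int) -> None:
        self.max_lag = max_lag
        self.n = 0
        self.total = 0.0
        self.cross = [0.0] * (max_lag + 1)  # cross[k] = sum over t of x_t * x_{t-k}
        self.head: List[float] = []  # first max_lag values seen
        self.recent: Deque[float] = deque(maxlen=max_lag)  # last max_lag values

    def update(self, x: float) -> None:
        self.n += 1
        self.total += x
        self.cross[0] += x * x
        # recent[-1] is x_{t-1}, recent[-2] is x_{t-2}, and so on.
        for k in range(1, len(self.recent) + 1):
            self.cross[k] += x * self.recent[-k]
        if len(self.head) < self.max_lag:
            self.head.append(x)
        self.recent.append(x)

    def autocov(self, k: int) -> float:
        """Lag-k autocovariance, divided by n (the usual biased estimator)."""
        if not 0 <= k <= self.max_lag or k >= self.n:
            raise ValueError("lag out of range")
        m = self.total / self.n
        if k == 0:
            lead = lag = self.total
        else:
            lead = self.total - sum(self.head[:k])  # sum of x_{k+1..n}
            lag = self.total - sum(list(self.recent)[-k:])  # sum of x_{1..n-k}
        centered = self.cross[k] - m * (lead + lag) + (self.n - k) * m * m
        return centered / self.n

    def ess(self) -> float:
        """n / (1 + 2 * sum rho_k), stopping at the first negative rho_k."""
        var = self.autocov(0)
        if var == 0.0:
            return math.nan
        tau = 1.0
        for k in range(1, min(self.max_lag, self.n - 1) + 1):
            rho = self.autocov(k) / var
            if rho < 0:
                break
            tau += 2.0 * rho
        return self.n / tau


# Usage: an AR(1) chain x_t = 0.5 * x_{t-1} + noise, standing in for MCMC output.
rng = random.Random(3)
ac_chain = StreamingAutoCov(max_lag=20)
x = 0.0
for _ in range(20_000):
    x = 0.5 * x + rng.gauss(0.0, 1.0)
    ac_chain.update(x)
rho1 = ac_chain.autocov(1) / ac_chain.autocov(0)
ess = ac_chain.ess()
```

For an AR(1) chain with coefficient 0.5, the integrated autocorrelation time is (1 + 0.5) / (1 - 0.5) = 3, so the ESS should come out near n / 3.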

sethaxen commented 4 years ago

@cscherrer, if you're thinking of making a BayesianOnlineStats package, I'd be happy to contribute. It'll be a good excuse to spend more time thinking about how to work with streaming samples and to learn OnlineStats.

I think BFMI could be supported as well, since it only requires Mean and Variance.
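For HMC, the energy BFMI (E-BFMI) is commonly computed as the mean squared difference between successive energies divided by the variance of the energies. It therefore reduces to a Mean and a Variance. This Python sketch computes it in one pass. It assumes energies arrive one per iteration and uses the population variance in the denominator. `StreamingBFMI` is a hypothetical name.

```python
import math
import random
from typing import Optional


class StreamingBFMI:
    """Hypothetical one-pass E-BFMI: mean((E_n - E_{n-1})^2) / var(E)."""

    def __init__(self) -> None:
        self.prev: Optional[float] = None  # previous energy, None before the first one
        self.n_diff = 0
        self.mean_sq_diff = 0.0  # running Mean of squared successive differences
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # Welford sum of squared deviations (for the Variance)

    def update(self, energy: float) -> None:
        if self.prev is not None:
            d2 = (energy - self.prev) ** 2
            self.n_diff += 1
            self.mean_sq_diff += (d2 - self.mean_sq_diff) / self.n_diff
        self.prev = energy
        self.n += 1
        delta = energy - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (energy - self.mean)

    def value(self) -> float:
        if self.n < 2 or self._m2 == 0.0:
            return math.nan
        return self.mean_sq_diff / (self._m2 / self.n)


rng = random.Random(4)
iid = StreamingBFMI()
sticky = StreamingBFMI()
e = 0.0
for _ in range(20_000):
    iid.update(rng.gauss(0.0, 1.0))  # independent energies
    e = 0.95 * e + rng.gauss(0.0, 1.0)  # strongly autocorrelated energies
    sticky.update(e)
iid_bfmi = iid.value()
sticky_bfmi = sticky.value()
```

For independent energies the ratio is close to 2, since E[(E_n - E_{n-1})²] = 2 Var(E). Strong autocorrelation between successive energies drives it toward 0, which is the situation the diagnostic is meant to flag.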

cscherrer commented 4 years ago

Nice! I haven't thought about this much in a few months, but I do think it's important. Currently the best I have uses Transducers: https://github.com/cscherrer/QuasiMonteCarlo.jl

There are really two independent concerns here, QMC and stream combinators, but this made a nice sandbox for trying out some ideas.

I think my mental model of the current Julia approach was a bit off. Haskell has a nice "stream fusion" approach that lets you apply a sequence of transformations to a stream without a performance penalty. Transducers is a bit like this turned on its head: there, the transformations compose nicely, as long as you don't actually apply them at each step.
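Here is a rough Python sketch of the general transducer idea, in the style of Clojure transducers rather than the Transducers.jl API. Each stage transforms a reducing function instead of a stream. Stages compose as ordinary function composition, and the composed reducer is applied in a single fold, so no intermediate collection is built per stage. `mapping`, `filtering`, `compose`, and `transduce` are hypothetical names.

```python
from functools import reduce
from typing import Any, Callable, Iterable

Reducer = Callable[[Any, Any], Any]
Transducer = Callable[[Reducer], Reducer]


def mapping(f: Callable[[Any], Any]) -> Transducer:
    # Wrap a reducing function so each input is mapped before it is folded in.
    def xform(rf: Reducer) -> Reducer:
        return lambda acc, x: rf(acc, f(x))
    return xform


def filtering(pred: Callable[[Any], bool]) -> Transducer:
    # Wrap a reducing function so inputs failing pred are skipped.
    def xform(rf: Reducer) -> Reducer:
        return lambda acc, x: rf(acc, x) if pred(x) else acc
    return xform


def compose(*xforms: Transducer) -> Transducer:
    # compose(a, b)(rf) == a(b(rf)): each input passes through a, then b, then rf.
    def composed(rf: Reducer) -> Reducer:
        for xf in reversed(xforms):
            rf = xf(rf)
        return rf
    return composed


def transduce(xform: Transducer, rf: Reducer, init: Any, xs: Iterable[Any]) -> Any:
    # A single pass over xs with the composed reducing function.
    return reduce(xform(rf), xs, init)


# Sum of squares of the even numbers in 0..9, computed in one fold.
xf = compose(filtering(lambda x: x % 2 == 0), mapping(lambda x: x * x))
total = transduce(xf, lambda acc, x: acc + x, 0, range(10))
```

The final reducing function could just as well be an online statistic's update step, which is what makes this style a natural fit for streaming MCMC output.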

joshday/OnlineStats.jl, issue #158