tweag / monad-bayes

A library for probabilistic programming in Haskell.
MIT License

Incorporating Dunai-Bayes #213

Open reubenharry opened 1 year ago

reubenharry commented 1 year ago

I'm wondering if, since dunai-bayes is so small, it would work to have it as part of monad-bayes. (Certainly rhine-bayes should remain separate though).

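Cons:

new dependency: dunai

not necessary to stick all things together in one library

Pros:

conceptual coherence: all Bayesian inference algorithms in one library, monad-bayes then able to express stochastic processes

documentation for dunai-bayes can then be written together with monad-bayes' docs

makes dunai-bayes more discoverable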

If I did this, I'd make Inference/SMC/Online.hs containing the new SMC algorithm, and add some type like StochasticProcess to Class.hs.

turion commented 1 year ago

I'd argue against this on a package level. The dependency argument is a very heavy one. But also architecturally, I believe it's not ideal to pin down dunai as the reference implementation of monadic stream functions. For example, there are https://github.com/psg-mit/probzelus-haskell, https://github.com/turion/essence-of-live-coding, https://hackage.haskell.org/package/netwire, and pipes, and many inference algorithms will be adaptable to all or some of these libraries. Maybe with some work we will be able to find a type class that captures inference for stochastic processes. Maybe ArrowInfer!

It might make sense to put dunai-bayes in the same repository, though, to speed up concurrent development. Then, a separate package pipes-bayes etc. can live in the same repository. Those can be joined up in a cabal.project, so development does not diverge, but one can choose the dependencies one needs.
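A minimal sketch of what such a cabal.project could look like. The directory names are assumptions for illustration (one subdirectory per package), and pipes-bayes is the hypothetical future package mentioned above:

```cabal
-- cabal.project at the repository root (directory layout is assumed)
packages:
  monad-bayes/
  dunai-bayes/
  pipes-bayes/
```

Each directory would keep its own .cabal file, so each package is still released separately and a user of monad-bayes alone does not pick up dunai as a dependency.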

Breaking down my opinion on the different points you brought up:

Cons:

new dependency: dunai

That's in general bad. If dunai breaks, has outdated dependencies, ..., monad-bayes has a problem. If dunai-bayes is a separate package, only that package has a problem.

not necessary to stick all things together in one library

True.

Pros:

conceptual coherence: all Bayesian inference algorithms in one library, monad-bayes then able to express stochastic processes

True, can't really argue against that. Except maybe that the number of algorithms might be overwhelming at some point.

documentation for dunai-bayes can then be written together with monad-bayes' docs

The reference docs should probably stay with the code on hackage, right? If you're referring to tutorials on e.g. monad-bayes-site.netlify.app/, yes, that's a good point, but can still be achieved by only having the same repo.

makes dunai-bayes more discoverable

I guess the same argument applies as before: if both are in the same repo, it can be discovered via the tutorials. It's true that it cannot be automatically discovered via Hackage, e.g. in the list of all monads that implement MonadInfer (unless we explicitly link to it), but it can still be discovered, for example, via hoogle.

since dunai-bayes is so small

This can well change in the future :)

reubenharry commented 1 year ago

Thanks for this reply! That all seems convincing to me against merging the packages. My main incentive for merging the repos is that I'd then feel slightly happier using the current docs template in monad-bayes to add docs for dunai-bayes (since it would be easier to remember to update the docs when changing dunai-bayes), but then again, there's also rhine-bayes to consider.

Am I right in thinking that all the libraries you mentioned have the same fundamental type of MStream m a b = a -> m (b, MStream m a b), (up to naming conventions)? EDIT: ah, I see netwire is a bit different. It looks interesting...

turion commented 1 year ago

My main incentive for merging the repos is that I'd then feel slightly happier using the current docs template in monad-bayes to add docs for dunai-bayes (since it would be easier to remember to update the docs when changing dunai-bayes), but then again, there's also rhine-bayes to consider.

Let's merge the repos monad-bayes and dunai-bayes then, and think about rhine-bayes later. I think rhine brings too many new concepts to be merged into the same documentation. I think it's a good idea to explain dunai-bayes as "monad-bayes with dunai added", and rhine-bayes as "rhine with dunai-bayes added". But we'll see in the future when we have more material :)

Am I right in thinking that all the libraries you mentioned have the same fundamental type of MStream m a b = a -> m (b, MStream m a b), (up to naming conventions)? EDIT: ah, I see netwire is a bit different. It looks interesting...

This MStream (which is isomorphic to dunai's MSF and also to the thing defined in probzelus-haskell) is the final encoding of "side-effectful Mealy machines", which have had many different names throughout the decades. The initial encoding would be something like:

data Mealy m a b = forall s . Mealy { runMealy :: a -> s -> m (b, s) }
data Mealy' m a b = forall s . Mealy' { runMealy' :: a -> StateT s m b }

(The second variant is isomorphic to the first one.)
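To make the relation between the two encodings concrete, here is a minimal sketch, not taken from any of the libraries mentioned. It assumes the initial encoding also stores an initial state (the definitions above leave that implicit), and the names toMStream, runList and runningSum are hypothetical:

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Final encoding: a step yields an output and the continuation.
newtype MStream m a b = MStream { stepMStream :: a -> m (b, MStream m a b) }

-- Initial encoding with a hidden state type. The stored initial state
-- is an assumption added here so that the machine can be run.
data Mealy m a b = forall s. Mealy s (a -> s -> m (b, s))

-- Embed the initial encoding into the final one by closing over the state.
toMStream :: Functor m => Mealy m a b -> MStream m a b
toMStream (Mealy s0 f) = go s0
  where
    go s = MStream $ \a -> fmap (\(b, s') -> (b, go s')) (f a s)

-- Feed a list of inputs through a stream and collect the outputs.
runList :: Monad m => MStream m a b -> [a] -> m [b]
runList _ [] = pure []
runList (MStream k) (a : as) = do
  (b, next) <- k a
  bs <- runList next as
  pure (b : bs)

-- Hypothetical example machine: outputs the running sum of its inputs.
runningSum :: Monad m => Mealy m Int Int
runningSum = Mealy 0 (\a s -> let s' = s + a in pure (s', s'))
```

In GHCi, `runList (toMStream runningSum) [1, 2, 3]` run in IO should give `[1,3,6]`. The embedding only goes one way without extra work: once the state is captured in a closure it can no longer be inspected.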

The implementations in netwire and essence-of-live-coding follow the initial encoding with some optimizations and restrictions. But they can both be embedded in MSF. Thus, they all have a function like this:

-- msf can be dunai's MSF, eolc's Cell, probzelus-haskell's ZStream
step :: msf m a b -> a -> m (b, msf m a b)

EDIT: netwire's Wire is a bit different because it doesn't hide the state variable. I now believe that this is a bit broken and shouldn't be considered in our discussion.

I'm not sure yet what the complete API should be that they all share that is sufficient for all inference algorithms. But I'm sure we can find that out by reimplementing inference for several libraries and comparing them.
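As a starting point for that comparison, the shared stepping function could be captured in a type class. This is only a sketch under assumptions: Steppable, stepMany and the two local types are hypothetical stand-ins (not dunai's or any other library's API), and a class that is sufficient for inference would likely need more than step:

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Hypothetical class: anything that can be advanced by one input.
class Steppable msf where
  step :: Monad m => msf m a b -> a -> m (b, msf m a b)

-- Local stand-in for a final encoding (same shape as MStream above).
newtype MStream m a b = MStream (a -> m (b, MStream m a b))

instance Steppable MStream where
  step (MStream k) = k

-- Local stand-in for an initial encoding with a stored state (assumed).
data Mealy m a b = forall s. Mealy s (a -> s -> m (b, s))

instance Steppable Mealy where
  step (Mealy s f) a = do
    (b, s') <- f a s
    pure (b, Mealy s' f)

-- Written once against the class, usable with every instance.
stepMany :: (Steppable msf, Monad m) => msf m a b -> [a] -> m [b]
stepMany _ [] = pure []
stepMany sf (a : as) = do
  (b, sf') <- step sf a
  (b :) <$> stepMany sf' as

-- The same counter in both encodings; the input is ignored.
countFrom :: Monad m => Int -> MStream m a Int
countFrom n = MStream (\_ -> pure (n, countFrom (n + 1)))

countMealy :: Monad m => Mealy m a Int
countMealy = Mealy 0 (\_ n -> pure (n, n + 1))
```

Both `stepMany (countFrom 0) "abc"` and `stepMany countMealy "abc"` should produce `[0,1,2]`, which is the kind of check one could use when reimplementing an inference algorithm for several libraries.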