tweag / monad-bayes

A library for probabilistic programming in Haskell.
MIT License
407 stars 62 forks source link

ListT is deprecated #245

Open turion opened 1 year ago

turion commented 1 year ago

Situation

ListT is deprecated since a long time, and has been removed from major packages:

https://github.com/haskell/mtl/issues/88 https://hackage.haskell.org/package/transformers-0.6.0.0/changelog

The reason it is deprecated is because for a noncommutative monad m, ListT m famously is not a monad. m needs to be commutative for it to be a monad, but that restriction is not enforced anywhere, so it was better to drop the whole construction.

Situation for monad-bayes

This also applies to randomness monads like we use here. Some monads like Weighted are commutative, but others really aren't. For example:

mv1 = do
  x <- normal 0 1
  y <- normal 0 1
  return (x, y)
mv2 = do
  y <- normal 0 1
  x <- normal 0 1
  return (x, y)

While mv1 and mv2 define the same probability distribution mathematically, they are not the same value. When we sample from them with the same random seed, they will produce different results. Sampling is not commutative.

Side note: If we are only interested in statistics calculated from the samples in some kind of limit where we sample arbitrarily often, then we might not care about the difference. But if, for example, we want to reproduce some random sample path, the laws are broken.

Problems

Bit rot

The main problem arising from this situation is that monad-bayes slowly bit-rots because we cannot update dependencies. We have to restrict version bounds to mtl < 2.3 && transformers < 0.6, which will eventually keep monad-bayes out of stackage and make it harder for everyone to use it.

Broken laws

The side problem is also that right now we have a few functions that expose the breakage of monad laws: We have functions like runPopulation that expose the list order, so the broken monad laws can be observed.

Performance

I haven't measured this yet, but I'll venture the guess that ListT is not very fast because it uses lists. Iterating over long lists of particles will incur some performance penalty.

What could be done

Bit rot

We could ship our own ListT explicitly and thus remove the restrictive upper bounds on mtl.

Broken laws

The concrete order of the list elements must never be able to be observed. If we can only, e.g. sample a list element, or only retrieve all list elements in a fixed order, then this is fine. For example, these type signatures are ok:

-- Sorts e.g. by probability, then by a
runPopulation :: Ord a => Population m a -> m [(a, Log Double)]
-- The ordering information is lost in this step
fromWeightedList :: Monad m => m [(a, Log Double)] -> Population m a

These expose unlawful behaviour:

-- Does not sort the entries
runPopulation :: Population m a -> m [(a, Log Double)]

As far as I understand, a "set monad transformer" that doesn't expose the order of the elements would already be good. The minimal implementation that would achieve that would be restricting the interface as shown above. A more involved implementation would actually save the population as a Map a (Log Double) internally, which would have the additional advantage that duplicates could be eliminated.

Performance

We might at some point want to investigate whether some kind of Vector can offer better performance. But that is not the main motivation behind this ticket.

reubenharry commented 1 year ago

I recall that (possibly in the paper?), the authors of monad-bayes were aware that ListT is not a valid monad transformer, but not that their semantics does not require it to be. But as you say, there are other effects of it being not a monad transformer.

Is there a reason we can't using one of the existing ListT replacement libraries?

If not, shipping our own seems reasonable to me.

And re. performance, I agree. I briefly tried refactoring with Vector, but the performance benchmarking was too hard for me.

turion commented 1 year ago

Is there a reason we can't using one of the existing ListT replacement libraries?

They generally work differently than what we want here. They allow for an effect after every element of the list, so it's more like a side-effectful stream. In fact, ListT m a = MSF (MaybeT m) () a. And I guess, Sequential is also very similar to these ListTs. So they definitely have their use case, but it's not Population.

If not, shipping our own seems reasonable to me.

I'll also see next week whether I can write a "set monad transformer" that is useful for us.

And re. performance, I agree. I briefly tried refactoring with Vector, but the performance benchmarking was too hard for me.

:+1: I might look into that in the future.

turion commented 1 year ago

With help of the Berlin Haskell Users Group, I figured out that I didn't understand the situation about why ListT isn't lawful completely. The issue is not so much that one mustn't be able to observe the final order of the list elements, but rather that the order of lists on which we later bind matters subtly.

For example, if you have:

newtype ListT m a = ListT (m [a])
-- insert instance implementations here

failure :: Monad m => String -> ListT (Either String) a

failureList :: ListT (Either String) a
failureList = do
  foo <- pure True <|> pure False
  bar <- if foo then pure "late" else failure "early"
  failure bar

This can go along two paths, and on the first path it will fail late, while on the second path it will fail earlier. Now should this fail with "early" or "late"? It depends on whether we first compute bind on foo or on bar.

Bind on foo first

failureList = do
  bar <- pure "late" <|> failure "early"
  failure bar
= do
  bar <- failure "early"
  failure bar
= failure "early"

Bind on bar first

failureList = do
  foo <- pure True <|> pure False
  if foo then failure "late" else failure "early"
= do
  failure "late" <|> failure "early"
= failure "late"

Intuition

One imagines the whole program as a tree, where every node is a list containing the different branches. Also, every node is decorated with an effect. We wish to enumerate all the leaf nodes. Now the question is whether we traverse this tree width first or depth first. Depending on this choice, the order of effects is different.

What that means for monad-bayes

The same works for other noncommutative monads, not just Either. For example, sampling, Density (and thus Traced). These will all break the laws when combined with Population.

How else to fix ListT

One really simple fix that doesn't need a set monad transformer is simply a free list monad transformer.

newtype ListT m a = ListT (FreeT m [] a)

There is a simple interpretation function ListT m a -> m [a], and everything is lawful. The interpreted program will always bind early.

This fixes the bit rot issue and the lawfulness. It doesn't fix the assumed performance, I'd even assume that if anything it worsens them, so we should look at at benchmark before merging such a change.

turion commented 1 year ago

The best free list transformer for this is probably https://hackage.haskell.org/package/free-5.1.3/docs/Control-Monad-Trans-Free-Ap.html#t:FreeT, which makes use of the Applicative instance of lists. Together with ApplicativeDo, this should be more performant in usual use cases because it requires fewer layers of m.

reubenharry commented 1 year ago

There is also the Church transformed FreeT (https://hackage.haskell.org/package/free-5.1.10/docs/Control-Monad-Trans-Free-Church.html) but that is not making use of the Applicative instance.

Is the lawlessness an actual problem here? That is, does the package either claim or require that the distribution representations be lawful monads? (similar question re. the Lazy sampler).

Might be interesting to benchmark performance of FreeT (having globally enabled ApplicativeDo) using the current benchmarks.

What about FreeT over e.g. Vector. Probably not fast...

ohad commented 1 year ago

@reubenharry 's reply is correct: monad-bayes only assumes (and provides) a monad structure, and the monad laws don't hold for the resulting structure.

Shipping our own copy of ListT will avoid the bit-rot, and the issue with laws is a red-herring.

It would be cool to have more performant implementations in the future --- I agree!

turion commented 1 year ago

and the monad laws don't hold for the resulting structure.

Yes, they certainly don't. For reproduceability when debugging a sampler, or also a formal implementation like delayed sampling (https://github.com/tweag/monad-bayes/issues/224), this can matter.

and the issue with laws is a red-herring.

I don't agree. While it's true that the issue isn't as big as with, say, ListT IO, it definitely exists. But I agree that the maintainability aspects are more important.

Is the lawlessness an actual problem here? That is, does the package either claim or require that the distribution representations be lawful monads? (similar question re. the Lazy sampler).

Well, as a Haskell developer, I will always assume that of course anything that has a Monad instance is lawful. A library that sweeps that under the rug without big red warning letters I wouldn't regard as production ready. So I see fixing the laws as a waypoint on the road to industry strength.

Might be interesting to benchmark performance of FreeT (having globally enabled ApplicativeDo) using the current benchmarks.

Yes, I'll work on that next week.

What about FreeT over e.g. Vector. Probably not fast...

It's hard to say yet. Part of a more extensive benchmark would be finding out what the typical usages of Population are. <*> and >>= will probably not play such a great role, instead fmap will be used a lot I guess. In FreeT, fmap is relatively cheap as far as I know. So if fmap is faster for vectors than for lists, we might see a speedup. But maybe I'm misjudging vectors here and they're really only faster for indexed lookup.

If there is much more <*> than >>= then Ap.FreeT is cheap, because it preserves the applicative structure and only adds a free >>=.

ohad commented 1 year ago

Well, as a Haskell developer, I will always assume that of course anything that has a Monad instance is lawful. A library that sweeps that under the rug without big red warning letters I wouldn't regard as production ready. So I see fixing the laws as a waypoint on the road to industry strength.

Maybe the most appropriate thing to do is to put those big red warning letters then. In the paper we were quite explicit about using a monad structure rather than a monad. What concrete thing should we do in the library?

Type-classes don't really allow for re-use, in the sense that if we now need a completely separate type-class hierarchy for monad structure transformers, that will bloat up the code and make it a lot less maintainable and susceptible to bit-rot. And since Haskell doesn't really require, enforce, or use the monad laws for monad interfaces, the issue feels more ideological than pragmatic. In that case, perhaps it's better to hash out this issue on a Haskell mailing list or GitHub repo, rather than here, since it's a much broader issue than monad-bayes.

dataopt commented 1 year ago

Well, as a Haskell developer, I will always assume that of course anything that has a Monad instance is lawful. A library that sweeps that under the rug without big red warning letters I wouldn't regard as production ready. So I see fixing the laws as a waypoint on the road to industry strength.

I also feel that when something is a Monad, it should obey the monad laws. Wouldn't making use of qualified do-notation be a viable alternative than breaking monad laws?

ohad commented 1 year ago

I'm not sure how the qualified do-notation helps here when what we're using is the existing monad-structure transformer ecosystem that postulates a Monad interface.

turion commented 1 year ago

I tried replacing the deprecated ListT by a free monad over the applicative list. In fact it speeds up the benchmarks significantly!

See https://github.com/tweag/monad-bayes/pull/253 for details. So I think it's uncontroversial to replace ListT by the free construction because it's maintained, lawful, and faster.

turion commented 1 year ago

The issue of looking into Vector or other replacements for lists is a separate one.

ohad commented 1 year ago

Is it correct though?

turion commented 1 year ago

Good question, I'm pretty sure that yes.

Old ListT

Let's look at how the deprecated ListT m a worked. It's isomorphic to m [a], and join :: m [(m [a])] -> m [a] is implemented in the straightforward way.

The issue with it is that it's not associative, i.e. if there is a tower m [(m [(m [a])])], it matters if we join the inner layer with the middle layer first (join . fmap join) or the outer layer with the middle layer first (join . join). Why does it matter?

One can think of a "join-tower" m [(m [(.... m [a])])])...] as a computation tree where the leaves are labelled with effectful as and the nodes with effects m. (It's a bit more complicated because the topology of the subtrees can depend on these effects.) Now the issue is that when we join this tree, join . fmap join does depth-first traversal over these effects, and join . join does width-first. If m isn't commutative, this isn't the same. Which choice is actually taken in a program depends on coding style and refactorings. (So to be honest I'm not even sure how one can argue that the old version was correct, because its semantics isn't very well-defined.)

Free monad

PopulationT m a = FreeT m [] a = m (a + [m (a + [m (a + ...)]...])

The free monad is defined (up to an extra m at the leaf nodes) as the sum of all "join-towers", or computation trees of any height. It's associative by definition because it doesn't make a choice of whether to join innermost or outermost. We don't usually observe values from PopulationT m a directly. Instead, one recovers them by applying:

runPopulationT :: PopulationT m a -> m [a]

This joins the tree systematically width-first, like join . join . join . ... . join. So it has the same semantics as the deprecated ListT for one particular, consistent, choice of bracketing. So if the previous choice was correct for every possible bracketing, then it's still correct for this particular bracketing.

Free monad over an applicative

The free monad over an applicative is the same type, the same functor, and the same monad as the ordinary free monad. But it has a different implementation for Applicative (FreeT m []), one that makes use of Applicative []. It seems that it breaks the law <*> = ap though, and it's unclear whether the law is recovered after interpretation. Thanks for pointing me towards looking into correctness more deeply, I wouldn't have noticed this otherwise.

I thought at first that together with ApplicativeDo, this free construction should be faster than the standard one, but the speedup seems to be much smaller than the one we get from switching to a free construction in the first place, so I'm happy to use the standard free construction instead.

ohad commented 1 year ago

Thanks for the detailed explanation. First, can we please correct some imprecise assertions in it?

I'm not even sure how one can argue that the old version was correct, because its semantics isn't very well-defined.

The semantics is well-defined, it is based on this paper that formally defines these data structures, the semantics of the inference algorithm, and proves that the result is an unbiased sampler.

So it has the same semantics as the deprecated ListT for one particular, consistent, choice of bracketing. So if the previous choice was correct for every possible bracketing, then it's still correct for this particular bracketing.

I don't really follow this argument. Can you make it more precise please, for example, along the lines of the paper?

The paper and its follow-up used the monad-structure transformer ListT because the computation it produces makes the correct samples and in the correct order. @adscib (and later the Tweag team) have tested that the resulting inference algorithms produces the expected distributions when run on models, and the paper proves the algorithms, when working on these data structures, are correct.

In general, free monads may do something different, for example, suspend random samples that some of the monad-bayes combinators do in crucial places. This might mean, for example, that using the same combinations of monad-bayes's building blocks to implement algorithms such as smc, smc^2 etc. are not longer correct.

I suggest we first understand better how this proposed change affects the existing inference algorithms, not just in runtime, but in the actual samples they compute, both by testing and using an accompanying formalisation.

turion commented 1 year ago

My point is that runPopulation $ (a >=> b) >=> c = runPopulation $ a >=> (b >=> c) in the new implementation, but not in the old. Furthermore, runPopulation $ a >=> (b >=> c) agrees in the old and the new implementation, so it has correct semantics. I'm not saying that FreeT m [] a represents a population, but I'm saying that if you apply runPopulation, it does, and it has the same semantics as before. I think this is clear to see from how free monads are defined, or am I overlooking something here?

Maybe the confusion is that runPopulation is not a monad morphism, but noone requires it to be.

I suggest we first understand better how this proposed change affects the existing inference algorithms, not just in runtime, but in the actual samples they compute, both by testing and using an accompanying formalisation.

Note that all the existing test suites pass. Do you have suggestions for a more extensive test suite that we should implement first?

ohad commented 1 year ago

Thanks for implementing additional tests! (Something is currently failing in the build, though.)

I'm not saying that FreeT m [] a represents a population, but I'm saying that if you apply runPopulation, it does, and it has the same semantics as before.

When we combine different inference building blocks, we do it before running runPopulation. So if we change the data structure we use for populations, it's not clear that the way we combine these building blocks to implement, say, smc^2, still gives smc^2.

Put differently, you argue that the monad laws used to fail and now they hold. So something has changed. Before the change the algorithm was doing the correct sampling based on quite a subtle semantic argument on the one hand, and careful implementation of various inference algorithms.

We should hesitate to merge this kind of change without a more detailed analysis for how it affects the algorithms.

turion commented 1 year ago

Ah ok, I think I'm getting it now. Your last explanation helped me. Put yet differently, the way the other transformers push around the population can result in all sorts of bracketing of joins and binds, not necessarily coinciding with the canonical one chosen by runPopulation. Even intentionally so. Yes, I was assuming that such a situation would have been a bug in the first place, but in fact this extra expressivity is used in RMSMC (and thus in SMC2) in the intermediate MH steps. This particular place can be fixed easily, but yes, now we need to look whether there are more such places.

reubenharry commented 1 year ago

I was mostly afraid of meddling with the semantics when I made changes to the library, but there are a couple of things that did change, that are perhaps worth noting:

  1. Dominic and I updated the Sampler implementation to allow for other RNGs.
  2. I added Sam Staton's lazy sampler, and also a "measure" monad (i.e. ContT m Double) which can be used for quadrature, to make plots of simple distributions. Neither of these affect existing code, but I can't vouch for the correctness of their interaction with the other inference transformers.
  3. I implemented a simpler alternative to the Church transformed FreeT used for sampling, which just uses StateT. This isn't actually used in the code anywhere, partly because it's slower, and partly because I was scared to make a semantic change.
  4. I wrapped the parameters for SMC and MCMC in SMCConfig and MCMCConfig datatypes. This is a superficial change.
  5. I introduced an implementation of the mcmc chain using the streaming library pipes. Again, this isn't used anywhere by default, but I think it's convenient to think of the chain in this way.

I will say that subtleties like this make me wish the library was in Idris, and that the proofs of correctness were in the types. (I suspect that is something Ohad would like?). Even having worked with monad-bayes for a while, the correctness proofs of the inference algorithms is still beyond me.