tweag / monad-bayes

A library for probabilistic programming in Haskell.
MIT License

Applicative Do for improving efficiency #16

Open adscib opened 8 years ago

adscib commented 8 years ago

GHC 8 introduces the ApplicativeDo extension, which desugars do notation to Applicative operations where possible. The paper introducing it actually mentions specifically that monad-bayes could benefit from this extension. We should therefore modify the existing code to use the Applicative interface instead of the Monad interface wherever possible.
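For context, a minimal sketch (plain GHC, no monad-bayes code) of what the extension does: when ApplicativeDo is enabled, a do block whose statements do not use each other's results needs only an Applicative constraint.

```haskell
{-# LANGUAGE ApplicativeDo #-}

-- Neither statement uses the other's result, so GHC desugars the block
-- to roughly (,) <$> fx <*> fy, and only Applicative f is required.
pairOf :: Applicative f => f a -> f b -> f (a, b)
pairOf fx fy = do
  x <- fx
  y <- fy
  return (x, y)

main :: IO ()
main = print (pairOf (Just 1) (Just 2))  -- Just (1,2)
```

Without the extension, the same block would desugar to >>= and force a Monad constraint, discarding the independence of the two statements.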

reubenharry commented 2 years ago

Interesting to think about where this would be useful, practically speaking. https://jtobin.io/encoding-independence-statically seems relevant. We could have a version of Traced with a free applicative, and use the static knowledge of independence to improve inference. Details as yet unclear to me, but for a concrete example:

ex :: MonadMeasure m => m (Double, Double)
ex = do
  x <- random
  y <- random
  factor (Exp (log (x + y)))  -- factor expects a Log Double likelihood
  return (x, y)
In ex, we want a trace that knows that x and y can be changed independently, i.e. that changing the value of x has no effect on other random choices downstream. I assume this is what the Free applicative encodes statically, although I haven't ever used it.
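For reference, a self-contained sketch of the free applicative idea (the Ap type is inlined here so it runs standalone; the free package's Control.Applicative.Free provides the real thing, and Dist/Prog are illustrative stand-ins, not monad-bayes code). Because <*> cannot thread one result into the other computation, the independence of the two draws is visible in the program's structure and can be analysed without running anything:

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE RankNTypes #-}

import Data.Monoid (Sum (..))

-- A minimal free applicative, inlined so this sketch is self-contained.
data Ap f a where
  Pure :: a -> Ap f a
  Ap   :: f x -> Ap f (x -> a) -> Ap f a

instance Functor (Ap f) where
  fmap g (Pure a)  = Pure (g a)
  fmap g (Ap fx k) = Ap fx (fmap (g .) k)

instance Applicative (Ap f) where
  pure = Pure
  Pure g  <*> y = fmap g y
  Ap fx k <*> y = Ap fx (flip <$> k <*> y)

liftAp :: f a -> Ap f a
liftAp fx = Ap fx (Pure id)

-- Fold the static structure without interpreting any effects.
runAp_ :: Monoid m => (forall x. f x -> m) -> Ap f a -> m
runAp_ _ (Pure _)  = mempty
runAp_ g (Ap fx k) = g fx <> runAp_ g k

-- A tiny probabilistic instruction set (hypothetical, for illustration).
data Dist a where
  Random :: Dist Double

type Prog = Ap Dist

random' :: Prog Double
random' = liftAp Random

-- Two draws whose mutual independence is visible in the structure
-- itself: neither argument of <*> can inspect the other's value.
pair :: Prog (Double, Double)
pair = (,) <$> random' <*> random'

-- Count the independent draws statically, without sampling anything.
draws :: Dist x -> Sum Int
draws Random = Sum 1

countDraws :: Prog a -> Int
countDraws = getSum . runAp_ draws

main :: IO ()
main = print (countDraws pair)  -- 2
```

An inference algorithm could use the same kind of fold to discover which random choices may be resampled independently, before committing to any interpretation of the program.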

turion commented 2 years ago

It seems to me that using ApplicativeDo is mainly something a library user can do. All the newtypes derive Applicative, so their instances will be used. Other than that it would be interesting to see whether the benchmarks get better if ApplicativeDo is applied globally. I can try that if you want.

reubenharry commented 2 years ago

Did they?

turion commented 2 years ago

Inconclusive. Some benchmarks get faster, some slower; there is no big improvement either way. So I guess it's a harmless change to enable by default, but it doesn't magically make everything faster. I wonder what the reasoning was for why it should be faster.

master e491fc49efbbe76e27d971d40907735f966523cb

master.txt

I had to deactivate a module that didn't compile.

Activate ApplicativeDo in library only

diff --git a/monad-bayes.cabal b/monad-bayes.cabal
index 9106bb2..810c3c1 100644
--- a/monad-bayes.cabal
+++ b/monad-bayes.cabal
@@ -94,6 +95,7 @@ library
     LambdaCase
     OverloadedStrings
     TupleSections
+    ApplicativeDo

applicative_do.txt

ApplicativeDo in library only, and replacing all >> by *>

applicative_do*>.txt

ApplicativeDo in library and benchmarks, and replacing all >> by *>

applicativedo*>_bench_apdo.txt

turion commented 2 years ago

> The paper introducing it actually mentions specifically that monad-bayes could benefit from this extension.

@adscib I can't find that claim. Can you point me to it?

Maybe we're forcing some functions to use the monad in some places (e.g. when defining <*> = ap).

reubenharry commented 2 years ago

Yeah, I couldn't find the mention either. Not entirely sure which monads would benefit here. (Maybe Traced, or if we had a parallel Population?)

adscib commented 2 years ago

> > The paper introducing it actually mentions specifically that monad-bayes could benefit from this extension.
>
> @adscib I can't find that claim. Can you point me to it?
>
> Maybe we're forcing some functions to use the monad in some places (e.g. when defining <*> = ap).

I don't see it there either, but based on time stamps it looks like I was referring to a preprint of what is now the published version, so maybe they removed it. In any case, it was an off-hand remark, probably made because a lot of probabilistic programming people at the time were advocating for tracking conditional independencies to improve inference. Since then, I think the community has largely moved towards HMC and variational methods that don't really utilize this information. Still, there are some message-passing-based algorithms that could benefit from knowing this structure, but it's been many years since I touched this topic, so I don't remember too well.

The way I originally thought about this was that with Population you could use N particles independently for each component of the program, obtain N^2 particles in the Cartesian product space, and immediately subsample back to N particles before performing function application. For Traced it's a little less clear: potentially you could avoid recomputing some parts of the program if you know they're independent, but I'm not sure how much of a saving that would be or how to go about achieving it.
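The Population idea above can be sketched with a hypothetical weighted-list type (a stand-in, not the library's actual Population): applicative combination forms the N^2 Cartesian product of two independent particle sets, which is exactly the point where one could subsample back down to N.

```haskell
-- Hypothetical weighted-particle population, for illustration only.
newtype Pop a = Pop {runPop :: [(a, Double)]}

instance Functor Pop where
  fmap f (Pop xs) = Pop [(f a, w) | (a, w) <- xs]

-- Independent components combine by Cartesian product of particles,
-- multiplying weights: n and m particles give n*m. A real scheme would
-- immediately subsample the product back down to n particles here.
instance Applicative Pop where
  pure a = Pop [(a, 1)]
  Pop fs <*> Pop xs = Pop [(f x, wf * wx) | (f, wf) <- fs, (x, wx) <- xs]

main :: IO ()
main = do
  let xs = Pop [(i, 1) | i <- [1 :: Int .. 3]]
      ys = Pop [(j, 1) | j <- [4 :: Int .. 6]]
  -- 3 particles per independent component yield 9 in the product space.
  print (length (runPop ((,) <$> xs <*> ys)))  -- 9
```

A monadic bind cannot do this, since the second population is only available after committing to a particle from the first; the applicative structure is what licenses the product-then-subsample step.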

Anyway, nice to see this getting picked up. Best of luck!