hakaru-dev / hakaru

A probabilistic programming language

Implement splitting of special cases from linear operators #104

Open JacquesCarette opened 7 years ago

JacquesCarette commented 7 years ago

Given a summate, such as

summate kh from 0 to size(t):
    if _a == (if kh == docUpdate: zNewb else: z[kh]): 1 else: 0

we have two cases: 'kh' is generic, and 'kh' is special. We recognize that, for our computation, kh = docUpdate is the special case. So we split the computation into two pieces: the generic piece and the special piece.

Let's generalize (this is also the generalization I was trying to convey yesterday): suppose we have a bunch of expressions that depend on kh (our summation variable). Let's denote these by X (i.e. X is a 'vector' of expressions).

Then what we have is

summate kh from 0 to size(t):
   Phi(X)

where Phi is a Hakaru program-schema with |X| holes, which we fill with the expressions from X. Let's assume that one of these expressions in X is kh == docUpdate. Let cons(kh == docUpdate, Y) == X. So we have

summate kh from 0 to size(t):
   Phi(kh == docUpdate, Y)

Now we can do the transformation we've been talking about:

summate kh from 0 to size(t):
   Phi(false, Y)
+ Phi(true, Y[kh <- docUpdate]) - Phi(false, Y[kh <- docUpdate])

followed by aggressive constant folding. This is a win as soon as size(t) is 4 or so. The loss is tiny otherwise, and if size(t) had been statically known to be small, unrolling would have taken care of that.
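
For concreteness, here is a minimal executable sketch of this rewrite in Python (the names phi, z, z_new, a, and doc_update are all illustrative, not anything in Hakaru; the substitution Y[kh <- docUpdate] becomes evaluation at doc_update, and this inline version assumes docUpdate lies in [0, size(t)) -- the range guard comes up later in the thread):

# phi(special, kh) models the schema Phi, with the 'kh == docUpdate' hole
# abstracted into the boolean 'special'; all concrete values are made up.
z, z_new, a, doc_update = [3, 1, 4, 1, 5], 7, 1, 2

def phi(special, kh):
    return 1 if a == (z_new if special else z[kh]) else 0

n = len(z)
naive = sum(phi(kh == doc_update, kh) for kh in range(n))

# Split form: branch-free generic sum, plus a two-term correction at the
# special index (assumes 0 <= doc_update < n).
split = (sum(phi(False, kh) for kh in range(n))
         + phi(True, doc_update) - phi(False, doc_update))
assert naive == split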

The additional advantages are that:

  1. Phi(false, Y) may now be free of docUpdate, and so histogram may well be able to lift the whole summate higher.
  2. Phi(false, Y) is "more regular" - especially if Phi(X) had only one 'if', which is now gone. So many simplifications are enabled that were not possible before.

This transformation generalizes quite a bit more:

  1. from 'summate' to any linear operator
  2. from (index == point) to any predicate P(index) such that P^{-1}(true) is a finite set [of known size], adjusted for inclusion/exclusion; see the sketch after this list. [Ken's code in NewSLO/Piecewise.mpl does this when size=1, as it does not try to find "index == point" but rather a solvable predicate with solution size = 1.]
  3. from one dimension to many
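
Continuing the Python sketch above, here is a hedged rendering of generalization 2: the single special point becomes a finite set of solutions of P, and each one gets its own correction term (here the inclusion/exclusion adjustment is just membership in the summation range; special_set is assumed to enumerate P^{-1}(true) without duplicates):

# Hypothetical extension from 'kh == point' to a predicate P with a finite,
# known preimage; special_set enumerates { kh | P(kh) }, each index distinct.
def split_sum_finite(phi, special_set, n):
    total = sum(phi(False, kh) for kh in range(n))
    for kh in special_set:
        if 0 <= kh < n:  # only correct for solutions inside the range
            total += phi(True, kh) - phi(False, kh)
    return total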

Note that this transformation, when properly generalized to continuous domains (like R, the reals), tells us that we can ignore the value of an integrand at a single point (since the correction terms are both 0!). BUT it also lets us do the same transformation when the 'background measure' is not Lebesgue but contains Dirac atoms -- the correction terms are then crucial.
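
In symbols, the claim is roughly (a sketch in the notation of this thread, with mu the background measure over the domain):

(integrate x over the domain: Phi(x == p))
  = (integrate x over the domain: Phi(false))
    + mu({p}) * (Phi(true) - Phi(false))[x <- p]

When mu is Lebesgue, mu({p}) = 0 and the correction vanishes; when mu has a Dirac atom at p, mu({p}) > 0 and the correction carries exactly that atom's contribution.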

A different generalization goes from 'linear operator' to 'bigop' (in the sense of Gonthier). But that case has a caveat: one must be able to prove, statically, that the exceptional cases are non-singular (else the correction term will be undefined). This can actually occur in our setting: think of a conditionally expressed density (say Gaussian(0,1) in 25 dimensions and Gaussian(0,2) in 1 dimension, in a 26-dimensional situation). Since such a density is strictly positive, it is safe to perform the above transformation there.
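
To make the caveat concrete, here is a hypothetical 'product' instance of the bigop generalization in the same Python setting: the correction term turns into a ratio, so the generic body must be nonzero at the special index (strict positivity of a density guarantees this).

import math

# Product version of the split: valid only when the generic body is
# non-singular (nonzero) at the special index; assumes 0 <= doc_update < n.
def split_product(phi, doc_update, n):
    generic = math.prod(phi(False, kh) for kh in range(n))
    denom = phi(False, doc_update)
    assert denom != 0, "singular special case: correction undefined"
    return generic * phi(True, doc_update) / denom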

Note that the generalization to "predicate P(index) such that P^{-1}(true) is a finite set [of known size]" is actually at the root of many of the simplifications found in the literature (as far as I can tell).

This does generalize further (to various kinds of notions of partitions) -- see my 'Symbolic Domain Decompositions' paper for that. But I don't think those generalizations are (yet) useful to Hakaru. Just fun ;).

PS: one last note. 'bool2int', 'int' casts of booleans, and counting on (0*x = 0) do not occur anywhere in the above. They are not needed for this simplification, and they act as a strong block to generalization.

JacquesCarette commented 7 years ago

NewSLO/Piecewise.mpl line 67 would be one place to start.

ccshan commented 7 years ago

Let's simplify the description above by letting X contain just one expression (so Phi has a single hole, filled initially with kh == docUpdate) and getting rid of Y by merging it into Phi itself. So we're rewriting

(summate kh from 0 to size(t): Phi(kh == docUpdate))

to

(summate kh from 0 to size(t): Phi(false))
+ if 0 <= docUpdate && docUpdate < size(t):
    (Phi(true) - Phi(false))[kh <- docUpdate]
  else: 0

This change also avoids the problem that, if Phi itself uses kh, then Phi(true, Y[kh <- docUpdate]) and Phi(false, Y[kh <- docUpdate]) would still use kh -- unbound.
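
Rendering this refined rewrite in the Python sketch from the first comment, with the range guard now explicit (names remain illustrative):

def split_sum(phi, doc_update, n):
    total = sum(phi(False, kh) for kh in range(n))
    if 0 <= doc_update < n:  # the special index may fall outside the range
        total += phi(True, doc_update) - phi(False, doc_update)
    return total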

What if Phi(false)[kh <- docUpdate] is undefined (for example, if Phi contains 1/(kh-docUpdate))?

JacquesCarette commented 7 years ago

Very good improvement, thanks.

You are correct that if Phi(false) is undefined at kh = docUpdate, there is a problem with this transformation. I guess a try ... catch would be in order.
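
One hedged way to realize that guard, again in the illustrative Python setting (ZeroDivisionError stands in for whatever undefinedness the evaluator signals): evaluate the split form, and fall back to the unsplit summate if it fails.

def split_sum_guarded(phi, doc_update, n):
    try:
        total = sum(phi(False, kh) for kh in range(n))
        if 0 <= doc_update < n:
            total += phi(True, doc_update) - phi(False, doc_update)
        return total
    except ZeroDivisionError:  # e.g. Phi contains 1/(kh - docUpdate)
        return sum(phi(kh == doc_update, kh) for kh in range(n))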

JacquesCarette commented 7 years ago

@yuriy0 the change you just reverted was in support of this issue, where Ken implemented the Kronecker expansion in a branch. Can you run the tests on that branch (with your changes merged in), to see the status?

I think Ken ran into issues where things were taking too long (some products?). We need to understand the effect of Kronecker expansion on our test suite, to see if it could indeed be made into a 'by default' simplification.

yuriy0 commented 7 years ago

> We need to understand the effect of Kronecker expansion on our test suite, to see if it could indeed be made into a 'by default' simplification.

It seems the next step (towards merging kronecker_in_simplify into master) is to profile e.g. gmm_gibbs with and without the changes on this branch.