stripe / rainier

Bayesian inference in Scala.
https://rainier.fit
Apache License 2.0
432 stars 51 forks source link

Placeholder(values), and fold Encoder into Fn #427

Closed avibryant closed 4 years ago

avibryant commented 4 years ago

The motivating change here was to attach the Array[Double] of values directly to Placeholder, so that it could compute its bounds correctly.

One consequence of this is that now, in the same way that we discover a Parameter's prior through reachability (ie by traversing the graph from the likelihood), we can discover which data we need to include in the model the same way; this is nice when you might be, eg, encoding all of a large case class, but only actually referencing one or two fields in your model. Currently, to minimize the diff, this happens during the creation of a Target (which used to get passed the data explicitly).

There are two other changes rolled into this PR because the Placeholder change required major surgery in their areas and so it seemed more efficient to move things forward while I was at it.

The first and less significant is a couple of optimizations that were dropped from Target and DataFunction. In the short term this makes things significantly slower, but my intent is to bring them back, implemented at a higher layer of the system, which will allow them to be more flexible and have less complexity cost. The two optimizations are the dynamic inlining of likelihood computation, and the batchBits partial loop unrolling of the same.

The second is to merge Encoder into Fn. Using a Fn is now the only way to encode data for batch computation, and the mechanics of doing so can be wholly encapsulated inside Fn instead of being fairly openly accessible in Encoder.

This means that Distribution[T] now just needs a likelihoodFn: Fn[T,Real] instead of a lot of encoder-related stuff. (This then can generally delegate to a logDensity).

For now I've chosen to go with an explicit API for building Fns for complex data types, rather than the implicit API that Encoder had. So for example instead of

Fn.encode[(Int,Int)]

You need to do

Fn.numeric[Int].zip(Fn.numeric[Int])

We can certainly bring back more magical implicits later if we want; however, since they can't do everything we'd want (eg, for a List[Int] we need to specify the size of the list, which you can't do implicitly), I thought we should try going all the way explicit.

avibryant commented 4 years ago

https://github.com/stripe/rainier/pull/427/commits/67b4eeab515e8404fcbfb04ee7893cff4d16b5ed shows how dramatically slower this can be, having removed the optimizations. Future PRs should get things back to the way it was before.

avibryant commented 4 years ago

Finished those Fn implementations as well as adding tests which makes me much more confident in them.

avibryant commented 4 years ago

The tests are timing out, presumably because of the lack of optimizations (it only takes 30s to run on my MBP but the container travis is using must have a lot less CPU). I'll try to bring some of them back to get things green again.