Closed: avibryant closed this pull request 4 years ago.
https://github.com/stripe/rainier/pull/427/commits/67b4eeab515e8404fcbfb04ee7893cff4d16b5ed shows how dramatically slower this can be with the optimizations removed. Future PRs should bring performance back to where it was before.
Finished those `Fn` implementations and added tests, which make me much more confident in them.
The tests are timing out, presumably because of the missing optimizations (the suite only takes 30s on my MBP, but the container Travis uses must have a lot less CPU). I'll try to bring some of them back to get things green again.
The motivating change here was to attach the `Array[Double]` of values directly to `Placeholder`, so that it could compute its `bounds` correctly.

One consequence of this is that now, in the same way that we discover a `Parameter`'s prior through reachability (ie, by traversing the graph from the likelihood), we can discover which data we need to include in the model the same way; this is nice when you might be, eg, encoding all of a large case class but only actually referencing one or two fields in your model. Currently, to minimize the diff, this happens during the creation of a `Target` (which used to get passed the data explicitly).

There are two other changes rolled into this PR, because the `Placeholder` change required major surgery in their areas, and so it seemed more efficient to move things forward while I was at it.
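The reachability idea can be sketched with hypothetical, heavily simplified types (these are not rainier's actual internals): traversing the expression graph from the likelihood collects exactly the `Placeholder`s the model references.

```scala
// Hypothetical, heavily simplified stand-ins for rainier's expression
// graph; the real Real/Placeholder/Parameter types are much richer.
sealed trait Real
case class Parameter(name: String) extends Real
case class Placeholder(values: Array[Double]) extends Real
case class BinOp(left: Real, right: Real) extends Real

object Reachability {
  // Collect every Placeholder reachable from the likelihood; data that
  // was encoded but never referenced simply never shows up.
  def placeholders(real: Real): Set[Placeholder] =
    real match {
      case p: Placeholder => Set(p)
      case BinOp(l, r)    => placeholders(l) ++ placeholders(r)
      case _              => Set.empty
    }
}
```

Under this scheme, a model that encodes a large case class but only references one field would only pull that field's `Placeholder` into the `Target`.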
The first, and less significant, is that a couple of optimizations were dropped from `Target` and `DataFunction`. In the short term this makes things significantly slower, but my intent is to bring them back, implemented at a higher layer of the system, which will allow them to be more flexible and carry less complexity cost. The two optimizations are the dynamic inlining of the likelihood computation, and the `batchBits` partial loop unrolling of the same.

The second is to merge `Encoder` into `Fn`. Using a `Fn` is now the only way to encode data for batch computation, and the mechanics of doing so can be wholly encapsulated inside `Fn` instead of being fairly openly accessible in `Encoder`. This means that `Distribution[T]` now just needs a `likelihoodFn: Fn[T, Real]` instead of a lot of encoder-related stuff. (This can then generally delegate to a `logDensity`.)
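As a sketch of that resulting shape, with hypothetical, simplified signatures (`Real`, `Fn`, and `Distribution` are far richer in rainier), a distribution's `likelihoodFn` can just delegate to a `logDensity` helper:

```scala
// Hypothetical, simplified stand-ins; not rainier's actual signatures.
case class Real(toDouble: Double)

trait Fn[A, B] {
  def apply(a: A): B
}

// A Distribution now just needs a likelihood Fn, rather than a pile
// of encoder-related machinery.
trait Distribution[T] {
  def likelihoodFn: Fn[T, Real]
}

// Example: a toy unit-variance normal whose likelihoodFn delegates to
// a private logDensity (dropping the additive constant term).
case class Normal(mean: Double) extends Distribution[Double] {
  private def logDensity(x: Double): Real =
    Real(-0.5 * (x - mean) * (x - mean))
  def likelihoodFn: Fn[Double, Real] =
    new Fn[Double, Real] { def apply(x: Double): Real = logDensity(x) }
}
```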
For now I've chosen to go with an explicit API for building `Fn`s for complex data types, rather than the implicit API that `Encoder` had. So, for example, instead of relying on an implicit to derive the encoding, you need to build the `Fn` explicitly.
We can certainly bring back more magical implicits later if we want; however, since they can't do everything we'd want (eg, for a `List[Int]` we need to specify the size of the list, which you can't do implicitly), I thought we should try going all the way explicit.
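The explicit approach described above could look something like this sketch; the combinator names here (`IntFn`, `listOf`) are hypothetical illustrations, not rainier's actual API.

```scala
// Hypothetical combinators for explicitly building Fns; the names
// (IntFn, listOf) are illustrative, not the actual API in this PR.
trait Fn[A, B] {
  def apply(a: A): B
}

object IntFn extends Fn[Int, Double] {
  def apply(i: Int): Double = i.toDouble
}

object Fns {
  // A List[Int] needs its size passed explicitly: the batch encoding
  // must allocate a fixed number of slots, which an implicit can't know.
  def listOf[A, B](fn: Fn[A, B], size: Int): Fn[List[A], List[B]] =
    new Fn[List[A], List[B]] {
      def apply(as: List[A]): List[B] = {
        require(as.size == size, s"expected a list of size $size")
        as.map(fn.apply)
      }
    }
}
```

Here the caller writes, eg, `Fns.listOf(IntFn, size = 3)`, making the fixed size visible at the call site instead of hiding it behind implicit resolution.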