stripe / rainier

Bayesian inference in Scala.
https://rainier.fit
Apache License 2.0

Model.sample #409

Closed avibryant closed 4 years ago

avibryant commented 4 years ago

This closes the loop by giving Prediction a toSample method that does the actual sampling (but still allows modifying the generating function after sampling, for quick exploration).

This is the minimal change needed to actually run new-style models. Once this works and has landed, I'll go back and revisit the APIs around Sampler, TargetGroup, etc.
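
Roughly, the shape that describes (a sketch with stand-in types for illustration, not the code in this diff):

```scala
// Sketch only: every body here is a stand-in, not the real implementation.
trait Model
trait Generator[T] { def map[U](fn: T => U): Generator[U] }

// A Sample keeps the posterior draws together with the generator, so the
// generating function can still be remapped after sampling, i.e. quick
// exploration without re-running the sampler.
trait Sample[T] { def map[U](fn: T => U): Sample[U] }

case class Prediction[T](model: Model, generator: Generator[T]) {
  // toSample is what actually runs the sampler and closes the loop.
  def toSample: Sample[T] = ???
}
```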

codecov-io commented 4 years ago

Codecov Report

Merging #409 into nuke-rv will decrease coverage by 0.16%. The diff coverage is 16.66%.

@@             Coverage Diff             @@
##           nuke-rv     #409      +/-   ##
===========================================
- Coverage    33.05%   32.88%   -0.17%     
===========================================
  Files           66       66              
  Lines         2538     2554      +16     
  Branches       144      133      -11     
===========================================
+ Hits           839      840       +1     
- Misses        1699     1714      +15
Impacted Files                                           Coverage Δ
...src/main/scala/com/stripe/rainier/core/Model.scala    0% <0%> (ø)
...main/scala/com/stripe/rainier/compute/Target.scala    0% <0%> (ø)
...rc/main/scala/com/stripe/rainier/core/Sample.scala    0% <0%> (ø)
...ain/scala/com/stripe/rainier/compute/RealOps.scala    77.5% <66.66%> (-0.47%)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 99cb765...04fdf0c.

avibryant commented 4 years ago

Yes, you're asking the right question, and I've gone down the same train of thought myself. I go back and forth on this. My argument in favor of the current approach is:

Another point in the design space would be to get rid of Prediction and just have sample(g: Generator[T]): Sample[T] on Model; if you want to pass around a not-yet-sampled Model + Generator, you just do that as a tuple of the two.
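
For concreteness, that alternative might look something like this sketch; only the sample(g: Generator[T]): Sample[T] signature comes from the sentence above, the rest is stand-in:

```scala
// Sketch only: an alternative shape with no Prediction at all.
trait Generator[T]
trait Sample[T]

trait Model {
  // Sampling takes the generator directly.
  def sample[T](g: Generator[T]): Sample[T] = ???
}

object NotYetSampled {
  // A not-yet-sampled "prediction" is just a pair you pass around yourself:
  def sampleLater[T](pending: (Model, Generator[T])): Sample[T] =
    pending._1.sample(pending._2)
}
```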

Thoughts?

andrew-stripe commented 4 years ago

I think I like the idea of starting with:

get rid of Prediction and just have sample(g: Generator[T]): Sample[T] on Model

At worst, Prediction comes back later and this ends up as sugar for the very common case of model.predict(g).toSample
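
That sugar reading might look roughly like this (hypothetical; predict and the stand-in types below are assumptions, not anything in the diff):

```scala
// Hypothetical sketch with stand-in types; nothing here is in the diff.
trait Generator[T]
trait Sample[T]

case class Prediction[T](model: Model, generator: Generator[T]) {
  def toSample: Sample[T] = ???
}

trait Model {
  def predict[T](g: Generator[T]): Prediction[T] = Prediction(this, g)
  // sample(g) as sugar for the very common case:
  def sample[T](g: Generator[T]): Sample[T] = predict(g).toSample
}
```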

avibryant commented 4 years ago

@andrew-stripe so this does something even more minimal: Model has sample(), which takes only sampler parameters (no generator) and returns a Sample; you can then call predict(g) on the Sample to get a list of values.
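
A rough sketch of that more minimal shape, with placeholder sampler parameters rather than the real signature:

```scala
// Sketch only: sample() takes sampler configuration, no generator...
trait Generator[T]

trait Sample {
  // ...and predict(g) maps the stored posterior draws through a generator
  // to produce concrete values.
  def predict[T](g: Generator[T]): List[T] = ???
}

trait Model {
  // Placeholder parameters; the real sampler configuration will differ.
  def sample(chains: Int = 4, iterations: Int = 1000): Sample = ???
}
```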