stripe / rainier

Bayesian inference in Scala.
https://rainier.fit
Apache License 2.0

Push an 0.2.4 release? #430

Closed: sritchie closed this issue 4 years ago

sritchie commented 4 years ago

Hey @avibryant,

I'm getting my build tooling set up to start pushing snapshot builds of scala-rl, and to do that I'll need an artifact to depend on. Would you mind publishing a new release, or at least a snapshot build?

Thanks!

avibryant commented 4 years ago

@sritchie I didn't realize you depended on anything that landed after the 0.2.3 release - which PRs do you need?

sritchie commented 4 years ago

Whoops, sorry about that - I got my versions scrambled and didn't realize 0.2.3 was what I had been waiting for. Just deleted my libjars and pulled in the dependency and it looks great. Thank you!

I can't remember if I mentioned this, by the way, but I ended up writing a Double-based version of Categorical which is much faster both for calculating expectations and for eventually producing a generator that I call lots of times: https://github.com/sritchie/scala-rl/blob/develop/scala-rl-core/src/main/scala/com/scalarl/rainier/Categorical.scala#L19

It's not so obvious how to integrate this into Rainier. If I get some time over the holidays I'll stare at the code and see if there's some common interface that pops out. It would be great to be able to depend on a Rainier-based class, though, instead of hosting this specialized version.
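
For readers who want a feel for what a Double-based categorical might look like, here is a minimal sketch. It is not the linked scala-rl class and not Rainier's API; the name `DoubleCategorical` and its methods are hypothetical, and it assumes non-negative weights stored as plain `Double`s.

```scala
import scala.util.Random

// Hypothetical illustration only: neither Rainier's Categorical nor the
// scala-rl implementation. Weights are plain Doubles and need not sum to 1.
final case class DoubleCategorical[A](pmf: Map[A, Double]) {
  require(pmf.nonEmpty, "pmf must contain at least one outcome")
  require(pmf.values.forall(_ >= 0.0), "weights must be non-negative")

  // Outcomes and running totals are computed once so repeated sampling is cheap.
  private val outcomes: Vector[A] = pmf.keys.toVector
  private val cumulative: Vector[Double] =
    outcomes.map(pmf).scanLeft(0.0)(_ + _).tail
  private val total: Double = cumulative.last
  require(total > 0.0, "at least one weight must be positive")

  // Expected value of f, using only Double arithmetic.
  def expectation(f: A => Double): Double =
    pmf.iterator.map { case (a, p) => f(a) * p }.sum / total

  // Draw one outcome by inverting the cumulative weights.
  def sample(rng: Random): A = {
    val u = rng.nextDouble() * total
    val idx = cumulative.indexWhere(u < _)
    // Fall back to the last outcome if rounding pushes u past every bucket.
    outcomes(if (idx >= 0) idx else outcomes.length - 1)
  }
}
```

With this sketch, `DoubleCategorical(Map("heads" -> 0.25, "tails" -> 0.75)).expectation(s => if (s == "heads") 1.0 else 0.0)` evaluates to `0.25`, and `sample` can be called repeatedly with a shared `Random` without rebuilding the cumulative table.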

avibryant commented 4 years ago

@sritchie I don't know if you've done any profiling, but I wonder how much of the performance difference is due to Real using BigDecimal under the hood? I've been wondering if I could replace it with some kind of hybrid Double and Int representation that avoided the most common floating point issues while still preserving most of the performance of Double.
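
The trade-off in question can be seen with the standard library alone. The sketch below does not touch Rainier's `Real`; `PrecisionDemo` and its methods are hypothetical names used only to contrast `Double` rounding with exact `BigDecimal` arithmetic.

```scala
// Hypothetical illustration: contrasts Double and BigDecimal accumulation.
// It says nothing about how Rainier represents values internally.
object PrecisionDemo {
  // Add 0.1 to itself n times using binary floating point.
  def sumDouble(n: Int): Double =
    (1 to n).foldLeft(0.0)((acc, _) => acc + 0.1)

  // The same sum with decimal arithmetic, which represents 0.1 exactly.
  def sumDecimal(n: Int): BigDecimal = {
    val step = BigDecimal("0.1")
    (1 to n).foldLeft(BigDecimal(0))((acc, _) => acc + step)
  }
}
```

Here `PrecisionDemo.sumDouble(10)` is not equal to `1.0` because 0.1 has no exact binary representation, while `PrecisionDemo.sumDecimal(10)` equals `BigDecimal(1)`. The exactness is paid for with object allocation and arbitrary-precision arithmetic on every operation, which is the cost a hybrid representation would try to avoid.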

sritchie commented 4 years ago

I have done some profiling, and the BigDecimal operations were definitely the glaring problem.

Now that I'm beyond basic applications that can be converted into vectorized Python operations, my Scala is so much faster than the Python that I'd gladly take the tiny (or zero) performance hit from a hybrid representation.

I haven't thought it through, but I suspect that having full Real available might open up interesting applications in my project that folks haven't explored yet.