ropensci / unconf18

http://unconf18.ropensci.org/

Tensorflow Probability in R #21

Open michaelquinn32 opened 6 years ago

michaelquinn32 commented 6 years ago

Earlier today, Google announced TensorFlow Probability, a probabilistic programming toolbox for machine learning. The full text of the announcement is available on Medium.

See the article for full details, but at a high level.

This notebook provides an end-to-end walkthrough on fitting a linear mixed effects model using the InstEval data from lme4.

For this project, unconf participants should come up with a design for how TF Probability will work in R, referring to RStudio's work on keras and tfestimators. Participants will be able to write some of these wrappers, and should hope to complete some example notebooks before the end of the event. It would be great if we could do an R version of the notebook linked above, and maybe others too.

R already has access to other probabilistic programming languages, such as Stan, and there are other R projects that try to build a probabilistic programming language on top of TensorFlow (greta). But this will be the primary Google-supported project in this area, with a lot of new features coming soon.

goldingn commented 6 years ago

Yeah, it's awesome that the folks at Google are pulling all of these bits together!

By the way, greta already uses parts of probability (the distributions), and will be providing greta-like interfaces to the inference methods going forward (and we'll be trying to keep up with the new developments). These and some other variational inference and stochastic gradient MCMC methods should start appearing in the development version of greta over the next couple of months.

greta provides a higher-level interface than the modules in probability though; so it's necessarily more limited and I think making it easy for people to use probability directly in R is a great idea! We could even make the resulting package a dependency of greta's.

Though I can't quite envisage how a more R-like probability interface would fit in between greta and the interface to probability provided by the R tensorflow API, since the interface to probability is lower-level than keras or estimators. What sort of functionality do you have in mind?

michaelquinn32 commented 6 years ago

First of all, hi Nick!

Sorry, I didn't realize you were attending the unconf, but I'm really excited to meet and work with you. TBH, I got permission to come to the unconf because I was planning to attend the event to do something "Tensorflow related," but working on Greta definitely falls under that umbrella! If that's the big TFP project, I'd be really happy.

Full disclosure, I'm not a TFP developer, my work at Google is on something else entirely, but I do contribute to the R infrastructure used by everyone. With that caveat out of the way, I see greta fitting into the TFP stack the same way that edward2 does, as a language for building models. I had assumed that we'd be wrapping edward2 when working in TFP during the unconf, but there is no reason to stick to that plan.

Anyway, it's really great to meet you! And I'm really looking forward to collaborating.

goldingn commented 6 years ago

Hi Michael :)

Oh great. Well I'd be very happy to collaborate on anything tensorflowy!

There are tons of things to be done on greta, so happy to go down that route. But also very happy to work on another TFP project.

A couple of things that spring to mind: easily dispatching parallel MCMC chains to separate Cloud ML jobs (I'd love this for greta, hence my interest in parallel progress bars); using greta's R -> TF syntax mapping to enable speedups/parallelisation/GPU support when evaluating arbitrary R functions; a sort of TF estimators-like containerisation of greta/TFP models, so that they can be easily wrapped up to have train() and predict() methods, and possibly support online learning with an update_training() method or something.
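To make the last of those ideas a bit more concrete, here is a rough sketch of what an estimator-style container with train(), predict() and update_training() methods might look like. It is plain Python with no greta or TFP dependency, and it uses a toy conjugate model (a normal mean with known noise) so that the online update is exact. The class name, argument names and the model itself are all assumptions made for illustration; none of this is an existing greta or TFP API.

```python
import math


class NormalMeanModel:
    """Toy container: y ~ Normal(mu, noise_sd), mu ~ Normal(prior_mean, prior_sd)."""

    def __init__(self, prior_mean=0.0, prior_sd=10.0, noise_sd=1.0):
        self.prior_mean = prior_mean
        self.prior_precision = 1.0 / prior_sd ** 2
        self.noise_var = noise_sd ** 2
        # Posterior over mu, stored as mean and precision; starts at the prior.
        self.post_mean = self.prior_mean
        self.post_precision = self.prior_precision

    def _absorb(self, y):
        # Conjugate normal-normal update, treating the current posterior as the prior.
        new_precision = self.post_precision + len(y) / self.noise_var
        weighted_sum = self.post_precision * self.post_mean + sum(y) / self.noise_var
        self.post_mean = weighted_sum / new_precision
        self.post_precision = new_precision

    def train(self, y):
        """Fit from scratch: reset to the prior, then absorb the batch."""
        self.post_mean = self.prior_mean
        self.post_precision = self.prior_precision
        self._absorb(y)
        return self

    def update_training(self, y):
        """Online learning: keep what was learned so far and absorb new data."""
        self._absorb(y)
        return self

    def predict(self):
        """Posterior predictive mean and standard deviation for one new observation."""
        predictive_var = 1.0 / self.post_precision + self.noise_var
        return self.post_mean, math.sqrt(predictive_var)


batch = [1.0, 2.0, 3.0, 4.0]
full = NormalMeanModel().train(batch)
online = NormalMeanModel().train(batch[:2]).update_training(batch[2:])
```

Training on the whole batch and training on half followed by update_training() on the rest should give the same predictions here, which is the property an online-learning method would want. For a model fitted by MCMC the update step would be much less simple, so this only illustrates the shape of the interface.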

mmulvahill commented 6 years ago

Hi @michaelquinn32 & @goldingn!

I'm new to Greta & TensorFlow, but they're both on my list of tools to learn. I'd be interested in working on a TFP/greta project if I could be helpful.

On a related note, where do TFP and greta stand on varying-dimension methods like reversible jump and birth-death MCMC? My impression is that most probabilistic languages and tools don't offer them, since they add significant complexity to MCMC. I've been working on a collection of birth-death MCMC models for reproductive endocrinology research for a while -- recently have been refactoring them from C to C++/R -- so I have a keen interest in whether TFP supports birth-death ;)

goldingn commented 6 years ago

@mmulvahill that's great! Sounds like a team is coming together :)

We should try to distill this down into a tangible project (or a few, in separate issues) so folks can decide what to do next week.

I suspect the short answer to your Q is that they can't do that because the tensorflow graph is static, not dynamic. That said there's almost always a way of hacking in that behaviour, and there are extensions like fold, which would work. greta's current API is even more static than Tensorflow (since all array dimensions are fixed), but that could potentially be generalised in the future. This would be a great thing to discuss at the unconf!

goldingn commented 6 years ago

Any particular angles on tfp/greta that look like they would be interesting and feasible to make progress on in a couple of days?

goldingn commented 6 years ago

P.S. the greta dev branch (which will become greta 0.3) now depends on TFP, and directly uses TFP's samplers

michaelquinn32 commented 6 years ago

I like the progress on the dev branch Nick!

Thus far, my work with TFP has been trying to implement (derivatives of) provided materials. I really enjoyed playing with this: https://github.com/tensorflow/probability/blob/master/tensorflow_probability/examples/jupyter_notebooks/Linear_Mixed_Effects_Models.ipynb

It's a relatively simple model, but has a really interesting approach to inference: using an EM algorithm that applies both a sampler and an optimizer.

Implementing that would be a big job for those couple of days and a lot of fun to work on. Hacking on the sampler would (hopefully) add some nice abstractions for customizing inference. I saw hmc was in the dev branch already (AWESOME!!!), but there are like a million samplers popping up in TFP these days. https://github.com/tensorflow/probability/tree/master/tensorflow_probability/python/mcmc

Stepping further back, having some sort of abstraction for how inference happens could eventually lead greta to supporting VI too. I don't think we should implement it yet (considering this is way out of my depth and VI looks a bit like black magic to me*), but it would be nice to be able to make that step once there are a few more implementations out there in the wild.

goldingn commented 6 years ago

Yeah, that's a great demo of what TFP can do!

We actually have some prototype VI code (for Stein VGD) that'll be added to greta soon; though probably after next month's release of 0.3. We'll also be adding minibatching and other nice things that @martiningram has been working on.

The dev branch changes have rejigged greta to have an (internal) inference class from which samplers and optimisers inherit, so the various VI approaches will be added as a third inference option. That generalisation of the inference structure also means it's easy to add the other TFP samplers (and TF optimisers). I'm planning to hook up the MH and MALA samplers in the next release. I might even do that on the plane over to Seattle this weekend :)

So developing and adding new inference methods would definitely be feasible.
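For anyone trying to picture that structure, here is a minimal sketch of a shared inference base class with a sampler and an optimiser inheriting from it. It is plain Python and is not greta's internal code: the class and method names are hypothetical, a random-walk Metropolis sampler stands in for the TFP samplers, and gradient ascent with a finite-difference gradient stands in for the TF optimisers.

```python
import random


class Inference:
    """Shared base: holds the target log density and the current state."""

    def __init__(self, log_prob, initial):
        self.log_prob = log_prob
        self.state = initial

    def run(self, n_steps):
        raise NotImplementedError


class RandomWalkMetropolis(Inference):
    """Sampler: random-walk Metropolis on a scalar parameter."""

    def __init__(self, log_prob, initial, step_size=1.0, seed=0):
        super().__init__(log_prob, initial)
        self.step_size = step_size
        self.rng = random.Random(seed)

    def run(self, n_steps):
        import math
        draws = []
        current_lp = self.log_prob(self.state)
        for _ in range(n_steps):
            proposal = self.state + self.rng.gauss(0.0, self.step_size)
            proposal_lp = self.log_prob(proposal)
            # Accept with probability min(1, p(proposal) / p(current)).
            # 1 - random() lies in (0, 1], so the log is always defined.
            if math.log(1.0 - self.rng.random()) < proposal_lp - current_lp:
                self.state, current_lp = proposal, proposal_lp
            draws.append(self.state)
        return draws


class GradientAscent(Inference):
    """Optimiser: gradient ascent using a central finite-difference gradient."""

    def __init__(self, log_prob, initial, learning_rate=0.1, eps=1e-5):
        super().__init__(log_prob, initial)
        self.learning_rate = learning_rate
        self.eps = eps

    def run(self, n_steps):
        for _ in range(n_steps):
            up = self.log_prob(self.state + self.eps)
            down = self.log_prob(self.state - self.eps)
            self.state += self.learning_rate * (up - down) / (2.0 * self.eps)
        return self.state


def target(x):
    """Unnormalised log density of Normal(3, 1)."""
    return -0.5 * (x - 3.0) ** 2


draws = RandomWalkMetropolis(target, initial=0.0).run(20000)
mode = GradientAscent(target, initial=0.0).run(200)
```

Both subclasses take the same log density and expose the same run() method, so a new method (a different sampler, or a VI routine as a third kind of inference) only has to implement run(). The real design presumably differs in its details.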

michaelquinn32 commented 6 years ago

Separately, I noticed that Greta doesn't yet support tfp transformed distributions and bijectors:

Bijectors are an abstraction for transforming distributions. For example, in tfp you create the log normal distribution by transforming the normal distribution.

import tensorflow_probability as tfp
tfd = tfp.distributions
tfb = tfp.bijectors

log_normal = tfd.TransformedDistribution(
  distribution=tfd.Normal(loc=0., scale=1.),

  # The bijector encapsulates several different transformations together
  bijector=tfb.Exp(),
  name="LogNormalTransformedDistribution")
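For readers who haven't met bijectors before, the following is a minimal sketch of the change-of-variables calculation that a transformed distribution has to carry out for the Exp case. It is plain Python with no TFP dependency, and the function names are made up for illustration; they are not TFP's API.

```python
import math


def normal_log_prob(x, loc=0.0, scale=1.0):
    """Log density of Normal(loc, scale) at x."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


def exp_inverse(y):
    """Inverse of the transformation y = exp(x)."""
    return math.log(y)


def exp_inverse_log_det_jacobian(y):
    """log |dx/dy| for x = log(y), which is log(1 / y)."""
    return -math.log(y)


def transformed_log_prob(y, loc=0.0, scale=1.0):
    """Log density of Y = exp(X) with X ~ Normal(loc, scale).

    Change of variables: log p_Y(y) = log p_X(log y) + log |dx/dy|.
    """
    x = exp_inverse(y)
    return normal_log_prob(x, loc, scale) + exp_inverse_log_det_jacobian(y)


def log_normal_log_prob(y, loc=0.0, scale=1.0):
    """Closed-form log-normal log density, used only as a cross-check."""
    z = (math.log(y) - loc) / scale
    return -math.log(y * scale * math.sqrt(2.0 * math.pi)) - 0.5 * z * z
```

The two log densities should agree up to floating-point error for any y > 0. Dropping the Jacobian term would silently give the wrong density, which is the kind of bookkeeping a bijector abstraction is meant to take off the user's hands.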

You can do some crazy stuff with bijectors. See this notebook, for example: https://github.com/tensorflow/probability/blob/master/tensorflow_probability/examples/jupyter_notebooks/Gaussian_Copula.ipynb

A full implementation might take time, but it wouldn't hurt to get started on this during the unconf.

goldingn commented 6 years ago

I think bijectors are at a lower level than the greta user API. greta deliberately doesn't let users mess around with distributions and densities directly, since (even if bijection was handled) that relies on users understanding things like log Jacobian transformations (which most users won't) and can easily lead to incorrectly specified models.

Of course having lower-level APIs is useful (and you can always do the low-level TFP stuff in R by importing TFP with reticulate), it's just out of scope for greta.

michaelquinn32 commented 6 years ago

Sounds good. There will be plenty to do at the unconf anyway.

Separately, though, it would be nice to think about how you want to handle transformed distributions in greta. The current scope of distributions supported is quite nice (great job on that!), but what about:

There's probably more there than that, but it's a good place to start thinking about how you'll want to solve these problems. FWIW, these are still open issues with TFP, but IIRC, Stan currently handles some issues like this.

goldingn commented 6 years ago

Yeah, we should have a chat about this tomorrow! I'd be really interested to hear your take on what functionality would be useful and what would be a nice syntax to use.

Actually, both of those things are implemented in greta already. Truncated distributions have been in there for a while, though mixtures are new, so they're only on dev at the moment.