pymc-devs / pymc4

Experimental PyMC interface for TensorFlow Probability. Official work on this project has been discontinued.
Apache License 2.0

JointDistributionCoroutine vs Pymc4 style models #207

Closed ksachdeva closed 4 years ago

ksachdeva commented 4 years ago

Hi,

First of all, thank you for this project. I am invested in the TensorFlow stack, but I have found the tensorflow_probability APIs extremely difficult to use. I am new to the whole probability and Bayesian inference world, so maybe my struggle with tfp can be attributed to that. pymc4 is a ray of hope for me: it lets me stick with the TensorFlow stack while using a library and APIs that are easier to approach.

I have read the design guide and understand how you are leveraging Python generators to specify the model function cleanly. I have also stepped through it in a debugger, and I quite like it.

That said, when I look at JointDistributionCoroutine in tfp, the approach looks similar: like pymc4, it makes use of yield/generators to define the model function.

# tfp model definition
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions
Root = tfd.JointDistributionCoroutine.Root

def dist():
  a = yield Root(tfd.Bernoulli(probs=0.5, dtype=tf.float32))
  b = yield tfd.Bernoulli(probs=0.25 + 0.5*a, dtype=tf.float32)
  yield tfd.Normal(loc=a, scale=1. + b)

joint = tfd.JointDistributionCoroutine(dist, validate_args=True)
z = joint.sample(4)
prob = joint.prob(z)

I have no doubt that the pymc4 APIs are going to be much easier to use (especially for beginners) and better documented. However, I started to wonder whether there is a deeper technical difference between the two libraries (tfp vs pymc4) in how they use generators. I ask because I can easily follow the pymc4 code that executes the generator function, whereas the implementation of JointDistributionCoroutine here - https://github.com/tensorflow/probability/blob/r0.8/tensorflow_probability/python/distributions/joint_distribution_coroutine.py - left me puzzled. It is quite terse and small, but I do not understand how it manages to do what it does.

Any guidance and insight would be very helpful.

Regards & thanks Kapil

ericmjl commented 4 years ago

Hi @ksachdeva, thanks for pinging in!

I was present when the design decisions were being made, and if you were there, you would have been 100% entertained by the flash of genius that @ferrine had when he came up with this idea. I witnessed genius in action :smile:. If my recollection is correct, it was 100% inspired by the JointDistributionCoroutine by the TFP folks, but I think our current design is more user-friendly. That’s about as much as I can recall, but maybe @twiecki and @ferrine can chip in more.
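
To give a flavour of what I mean, here is roughly how a small model reads in pymc4's generator syntax. This is a sketch based on the design guide and README; argument names and details may have drifted, so don't take it as the definitive API.

import pymc4 as pm

@pm.model
def model():
    # each distribution gets a name; no Root wrapper or explicit joint object is needed
    a = yield pm.Normal("a", loc=0.0, scale=1.0)
    b = yield pm.Normal("b", loc=a, scale=1.0)

# pm.sample drives the generator and runs TFP's samplers under the hood
trace = pm.sample(model(), num_samples=500, num_chains=4)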

I’m going to stop here, as most of my contributions have been docs and infrastructure-ish things to the project.

Cheers, Eric

twiecki commented 4 years ago

Everything said here so far is accurate. @ksachdeva Let me know if you have more questions.

ksachdeva commented 4 years ago

Thanks @ericmjl @twiecki for the answers and background.

Let me try to ask more specific questions, as the previous one could be considered a bit open-ended:

Question 1:

@ericmjl said that "... I think our current design is more user friendly". Would it be possible to illustrate this with some code snippets? That is really the answer I am seeking.

a) Is it that pymc4 will not require the user to worry about the "shape" of distributions, which seems to be a nuisance (even though a necessary one)?

b) I have yet to fully grasp the role of "Independent" and "Sample", which are interleaved in some of their model specifications (see the snippet below for the kind of pattern I mean). Would using pymc4 free me from worrying about these, and possibly offer a better API where such constructs are needed (something like Deterministic?)?
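
To make (b) concrete, this is the kind of raw TFP pattern I am referring to (my own minimal example, nothing to do with pymc4 itself):

import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# three iid Normal(0, 1) draws folded into a single event via Sample
x = tfd.Sample(tfd.Normal(loc=0., scale=1.), sample_shape=3)

# a batch of three Normals reinterpreted as one 3-dimensional event via Independent
y = tfd.Independent(tfd.Normal(loc=tf.zeros(3), scale=tf.ones(3)),
                    reinterpreted_batch_ndims=1)

x.log_prob(tf.zeros(3))  # scalar, because the three draws form one event
y.log_prob(tf.zeros(3))  # scalar as well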

Question 2:

Which implementation of the coroutine-style model function is (or would be) faster - pymc4 or JointDistributionCoroutine - if one ignores the user-friendliness aspect?

Regards Kapil

ericmjl commented 4 years ago

@ksachdeva I think I can answer question 2: There should be no difference in speed, as speed is primarily determined by the samplers provided in TFP.

w.r.t. question 1 (a): shape handling is in development: there’s the “plate” argument (which I haven’t used yet), which matches statistical modelling convention (plate notation).
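
Since I haven't used it myself, treat the following as a hypothetical sketch of how I understand plate would be written, not as the actual signature:

import pymc4 as pm

@pm.model
def model():
    # hypothetical: ten iid draws declared via a plate-style argument
    x = yield pm.Normal("x", loc=0.0, scale=1.0, plate=10)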

as for question 1 (b): I will have to defer to other developers on this.

lucianopaz commented 4 years ago

@ksachdeva, I just wanted to add some things to what's been said already regarding pymc4's features

Autobatching

This applies to your questions 1.a and 1.b. By default, pymc4 uses tf.vectorized_map to automatically call the model's log_prob (or the analogue of JointDistributionCoroutine's sample) in parallel across independent draws or chains. This means that users only have to worry about making their model work across the distribution's core dimensions (loosely speaking, everything gets treated as part of an event dimension), without thinking about how Root will affect the batch_shape down the line, whether the model returns a log_prob for each batch axis, or whether an Independent or Sample distribution should be used in the middle of the flow. However, this ease of use will likely come at the expense of sampling speed. #193 will allow users to opt out of auto-batching and give them control over all the shape subtleties. We don't yet know how much overhead the auto-batching adds, but once #193 is finished we'll be able to measure it; I imagine it should be negligible for small models.
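
As a minimal sketch of the mechanism (mine, not pymc4's actual internals): you write the log-density for a single draw and tf.vectorized_map evaluates it across a batch of draws for you.

import tensorflow as tf

# log-density of a standard normal for one scalar draw
def single_log_prob(z):
    return -0.5 * z ** 2 - 0.5 * tf.math.log(2.0 * 3.14159265358979)

draws = tf.random.normal([8])                          # e.g. 8 chains / independent draws
log_probs = tf.vectorized_map(single_log_prob, draws)  # evaluated in parallel, shape [8]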

Auto transformed distributions

Just like in pymc3, pymc4 will automatically transform bounded distributions (like Beta or Gamma) to be unbounded on the entire real line (using a logit or log transform, respectively). This is important for many variational inference algorithms, like ADVI, that need all variables to be unbounded.

This feature is currently working, and thanks to the different executors implemented in the flow module, the distributions are only transformed when the user wants to perform inference on the enclosing model (pm.sample).
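
The underlying trick, spelled out with plain TFP primitives (just an illustration of the idea, not pymc4's implementation):

import tensorflow_probability as tfp

tfd = tfp.distributions
tfb = tfp.bijectors

# a Gamma variable lives on (0, inf); working on y = log(x) instead makes it
# unbounded on the whole real line, which is what HMC and ADVI prefer
gamma = tfd.Gamma(concentration=2.0, rate=1.0)
unconstrained = tfd.TransformedDistribution(distribution=gamma,
                                            bijector=tfb.Invert(tfb.Exp()))

unconstrained.log_prob(-0.5)  # evaluated on the unconstrained (log) scale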

Auto tuning of hmc

Just like in pymc3, pymc4 will be very opinionated on how to initialize the MCMC metaparameters, like the step size and mass matrix, and also the initial sampling state. At the moment, these still haven't been written or ported from pymc3.
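
For reference, this is the kind of setup pymc4 is expected to handle for the user, written by hand with TFP primitives (the numbers and the toy target are illustrative, not pymc4 code):

import tensorflow_probability as tfp

tfd = tfp.distributions

# a toy target; in pymc4 this log_prob would come from the user's model
target_log_prob_fn = tfd.Normal(loc=0., scale=1.).log_prob

nuts = tfp.mcmc.NoUTurnSampler(target_log_prob_fn=target_log_prob_fn, step_size=0.1)
adaptive = tfp.mcmc.DualAveragingStepSizeAdaptation(inner_kernel=nuts,
                                                    num_adaptation_steps=400)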

Automatically setting an appropriate sampling transition kernel

Again, like in pymc3, we will automatically set the sampler's transition kernel according to the distributions defined in the model (NUTS for continuous distributions and some Metropolis variant for discrete ones). This hasn't been implemented yet, though.
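
In spirit, the dispatch is something like the following (my own simplification, not the planned pymc4 code):

import tensorflow_probability as tfp

def choose_kernel(target_log_prob_fn, model_has_discrete_variables):
    # discrete free variables rule out gradient-based NUTS
    if model_has_discrete_variables:
        return tfp.mcmc.RandomWalkMetropolis(target_log_prob_fn=target_log_prob_fn)
    return tfp.mcmc.NoUTurnSampler(target_log_prob_fn=target_log_prob_fn, step_size=0.1)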

Others

There are many very nice ideas for features to add. For example, all the symbolic-pymc work that @brandonwillard is spearheading would be really nice to include somehow in pymc4. But I don't know enough about these ideas to comment on them here in detail.

Comment on speed

Once #193 is finished, we'll be able to measure the speed differences between pymc4 and JointDistributionCoroutine, with and without auto-batching. The speed difference at this level will mostly affect forward sampling (prior and posterior predictive sampling). Sampling speed and efficiency at inference time is mostly determined by the model's characteristics (whether the log_prob function is multimodal, rugged, or ill-conditioned) and also by the sampler's metaparameters and tuning. The latter should be handled reasonably well by default in pymc4, just like in pymc3.

ksachdeva commented 4 years ago

@ericmjl @lucianopaz Many, many thanks for these insights. They answer my questions and are helpful in choosing a direction vis-à-vis framework selection for Bayesian statistics/learning/inference.

Regards Kapil