Closed ksachdeva closed 4 years ago
Hi @ksachdeva, thanks for pinging in!
I was present when the design decisions were being made, and if you were there, you would have been 100% entertained by the flash of genius that @ferrine had when he came up with this idea. I witnessed genius in action :smile:. If my recollection is correct, it was 100% inspired by the JointDistributionCoroutine by the TFP folks, but I think our current design is more user-friendly. That’s about as much as I can recall, but maybe @twiecki and @ferrine can chip in more.
I’m going to stop here, as most of my contributions have been docs and infrastructure-ish things to the project.
Cheers, Eric
Everything said here so far is accurate. @ksachdeva Let me know if you have more questions.
Thanks @ericmjl @twiecki for the answers and background.
Let me try to ask specific questions, as the previous one could be considered a bit open-ended -
@ericmjl said that "... I think our current design is more user friendly". Would it be possible to express this with some code snippets? That is what I am seeking the answer to.
a) Is it that pymc4 will not require the user to worry about the "shape" of distributions, which seems to be a nuisance (even though necessary)?
b) I have yet to fully grasp the role of "Independent" and "Sample" that appear (interleaved) in some of their model specifications; would using pymc4 help me not worry about these, and possibly offer a better API where one is required (like deterministic)?
Which implementation of the coroutine-style model function is (or would be) faster - pymc4 or JointDistributionCoroutine - if one ignores the user-friendliness aspect?
Regards Kapil
@ksachdeva I think I can answer question 2: There should be no difference in speed, as speed is primarily determined by the samplers provided in TFP.
w.r.t. question 1 (a): Shape is in development: there’s the “plate” argument (which I haven’t used yet), which matches statistical modelling convention (plate notation).
as for question 1 (b): I will have to defer to other developers on this.
@ksachdeva, I just wanted to add some things to what's been said already regarding pymc4's features.
This applies to your questions 1.a and 1.b. By default, pymc4 uses `tf.vectorized_map` to automatically call the model's `log_prob` (or the analogue of `JointDistributionCoroutine`'s `sample`) in parallel across independent draws or chains. This means that users will only have to worry about making their model work across the distribution's core dimensions (in lax terms, everything gets treated as part of an event dimension), without thinking about how `Root` will affect the `batch_shape` down the line, whether the model returns a `log_prob` for each batch axis, or whether an `Independent` or `Sample` distribution should be used in the middle of the flow.
However, this ease of use will likely come at the expense of sampling speed. #193 will allow users to opt out of auto batching and give them control to deal with all the shape subtleties. Once again, we don't know how much overhead the auto batching adds, but once #193 is finished we'll be able to measure it. I imagine it should be negligible for small models.
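The auto-batching idea described above can be illustrated with a small NumPy sketch (this is a conceptual analogy, not pymc4 or TFP code): the model author writes a scalar, "core-dimension-only" log density, and the framework maps it across the chain axis, while a shape-aware user could instead rely on broadcasting.

```python
import numpy as np

def normal_logp(x, mu=0.0, sigma=1.0):
    """Log-density of a scalar Normal; the model's "core" computation,
    written without any batch dimensions in mind."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((x - mu) / sigma) ** 2

# 4 chains, each with its own current position.
positions = np.array([-1.0, 0.0, 0.5, 2.0])

# "Auto-batching": map the scalar log_prob across the chain axis, so the
# model author never writes batch-aware code (tf.vectorized_map plays this
# role in pymc4, but fused and on the TF graph rather than a Python loop).
batched = np.array([normal_logp(x) for x in positions])

# Manual, shape-aware alternative: the same result via broadcasting, which
# is roughly the kind of control that opting out of auto batching gives.
vectorized = normal_logp(positions)

assert np.allclose(batched, vectorized)
```

The trade-off mentioned above is between these two styles: the mapped version is easier to write correctly, the broadcast version gives the user full control over shapes.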
Just like in pymc3, pymc4 will automatically transform bounded distributions (like `Beta` or `Gamma`) to be unbounded over the entire real line (using a `logit` or `log` transform, respectively). This is important for many variational inference algorithms, like ADVI, that need all variables to be unbounded.
This feature is currently working, and thanks to the different executors implemented in the flow module, we only transform the distributions when the user wants to perform inference on the enclosing model (`pm.sample`).
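The change of variables behind that `log` transform can be sketched in plain NumPy (a minimal illustration of the general technique, not pymc4's actual transform machinery): a positive variable `x` is reparametrized as `y = log(x)`, and the log-density picks up a log-Jacobian term so the distribution stays consistent.

```python
import numpy as np

def gamma_logp(x, alpha=2.0, beta=1.0):
    # Unnormalized Gamma(alpha, beta) log-density; only defined for x > 0.
    return (alpha - 1) * np.log(x) - beta * x

def transformed_logp(y, alpha=2.0, beta=1.0):
    # The same density expressed on the unconstrained variable y = log(x).
    # For x = exp(y), log|dx/dy| = y, which is the Jacobian correction term.
    x = np.exp(y)
    return gamma_logp(x, alpha, beta) + y

# A sampler can now propose any real-valued y without ever hitting the
# x > 0 boundary; exp(y) maps the proposal back to a valid Gamma draw.
y = -0.3
print(transformed_logp(y))
```

Transforms like this are why samplers and ADVI can treat every free variable as living on the whole real line.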
Just like in pymc3, pymc4 will be very opinionated on how to initialize the MCMC metaparameters, like the step size and mass matrix, and also the initial sampling state. At the moment, these still haven't been written or ported from pymc3.
Again, like in pymc3, we will automatically set the sampler's transition kernel according to the distributions defined in the model (NUTS for all continuous distributions and some Metropolis variant for discrete ones). This still hasn't been implemented, though.
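The dispatch rule just described can be sketched as a toy function (all names here are made up for illustration; this is not the real pymc4 API, which had not been written at the time of this thread):

```python
# Toy sketch: assign a transition kernel per free variable based on its
# support, mirroring the "NUTS for continuous, Metropolis for discrete"
# rule described above. Purely hypothetical names and structures.

def assign_kernels(free_variables):
    """Map each model variable to a sampler name by its support."""
    kernels = {}
    for name, var in free_variables.items():
        if var["discrete"]:
            kernels[name] = "metropolis"  # discrete support: Metropolis variant
        else:
            kernels[name] = "nuts"        # continuous support: NUTS
    return kernels

model_vars = {
    "slope": {"discrete": False},
    "cluster_id": {"discrete": True},
}
print(assign_kernels(model_vars))
# {'slope': 'nuts', 'cluster_id': 'metropolis'}
```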
There are many very nice ideas for features to add. For example, all the symbolic-pymc work that @brandonwillard is commandeering would be really nice to include somehow into pymc4. But I don't know enough of them to be able to comment on them here in detail.
Once #193 is finished, we'll be able to measure speed differences between pymc4 and `JointDistributionCoroutine`, with and without auto batching. The speed difference at this level will mostly affect forward sampling (prior and posterior predictive sampling). Sampling speed and efficiency at inference time is mostly determined by the model's characteristics (whether the `log_prob` function is multimodal, rugged, or ill-conditioned) and by the sampler's metaparameters and tuning. The latter aspects should be handled reasonably well by default in pymc4, just like in pymc3.
@ericmjl @lucianopaz Many, many thanks for these insights. They do answer my questions and are helpful in choosing a direction vis-à-vis framework selection for bayesian statistics/learning/inference.
Regards Kapil
Hi,
First of all, thank you for this project. I am invested in the tensorflow stack; however, it has been extremely difficult to use the tensorflow_probability APIs. I am new to the whole probability and bayesian inference area, so maybe my struggle with `tfp` can be attributed to that. Pymc4 is a ray of hope for me to stick with the tensorflow stack while using an easier-to-approach library and APIs. I have read the design guide and understand how you are leveraging python generators to cleanly specify the model function. I have debugged it as well. I quite like it.
That said, when I look at `JointDistributionCoroutine` in `tfp`, the approach also looks similar. At the very least, similar to pymc4, they make use of yield/generators to define the model function. I have no doubt that the pymc4 APIs are going to be much easier to use (especially for beginners) and better documented. However, I started to wonder if there is more here in terms of a difference in the technical approach to using generators between the two libraries (tfp vs pymc4). I ask this because I can easily understand the pymc4 code regarding generator-function execution, whereas the implementation of `JointDistributionCoroutine` here - https://github.com/tensorflow/probability/blob/r0.8/tensorflow_probability/python/distributions/joint_distribution_coroutine.py - left me puzzled. It is quite terse and small, but I do not understand how it manages to do what it does. Any guidance and insight would be very helpful.
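For what it's worth, the generator mechanism that both libraries rely on can be boiled down to a few lines of plain Python (this is a pedagogical sketch, not the actual tfp or pymc4 implementation): the model yields distribution "requests", and a driver uses `gen.send(...)` to decide what each `yield` evaluates to - a fresh draw here, but it could equally be a stored value when computing `log_prob`.

```python
import random

# Minimal sketch of the coroutine trick: the model is a generator that
# yields distribution requests; the driver answers each request by
# sending a value back in, which becomes the result of the yield.

def model():
    x = yield ("normal", 0.0, 1.0)   # prior
    y = yield ("normal", x, 0.5)     # next distribution depends on x
    return y

def sample_model(model_fn, seed=0):
    rng = random.Random(seed)
    gen = model_fn()
    draws = []
    try:
        request = gen.send(None)          # prime the generator
        while True:
            _, mu, sigma = request
            value = rng.gauss(mu, sigma)  # fulfil the request with a draw
            draws.append(value)
            request = gen.send(value)     # resume the model with the draw
    except StopIteration as stop:
        return draws, stop.value          # the model's return value

draws, ret = sample_model(model)
print(draws, ret)
```

Swapping the driver (sample vs. evaluate vs. transform) while reusing the same model generator is essentially what the different executors do; the terseness of the tfp implementation comes from packing this send/StopIteration protocol into very few lines.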
Regards & thanks Kapil