pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/

Distance Metrics for Probability Distributions? #2325

Closed bhargavvader closed 4 years ago

bhargavvader commented 7 years ago

Would there be any interest in having a class of distance metrics for probability distributions? An example could be something like the very rough pseudocode in this Jupyter notebook.

I had some questions in case we do want this:

  1. What should the parameters be? A pymc3 or numpy/scipy distribution object, and in that case, what should be returned?

  2. Or numpy arrays representing the distributions, with a single numerical value returned? This notebook does that; a rough sketch of this option follows below.
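
For concreteness, here is a minimal sketch of option 2, assuming we take two 1-D numpy arrays of samples, bin them on a shared grid, and return a single number. The function name, the binning, and the choice of Hellinger distance are placeholders for discussion, not a proposed API:

  import numpy as np

  def hellinger_distance(samples_p, samples_q, n_bins=50):
      """Histogram-based Hellinger distance between two 1-D sample arrays."""
      lo = min(samples_p.min(), samples_q.min())
      hi = max(samples_p.max(), samples_q.max())
      bins = np.linspace(lo, hi, n_bins + 1)
      p, _ = np.histogram(samples_p, bins=bins)
      q, _ = np.histogram(samples_q, bins=bins)
      p = p / p.sum()                                  # empirical probability masses
      q = q / q.sum()
      return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

  x = np.random.normal(0.0, 1.0, 5000)                 # e.g. draws from one posterior
  y = np.random.normal(0.5, 1.2, 5000)                 # e.g. draws from another
  print(hellinger_distance(x, y))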

Ping @ColCarroll, @twiecki

junpenglao commented 7 years ago

I think this is a great idea; it would also be very useful for VI. @ferrine

We already have KL divergence in https://github.com/pymc-devs/pymc3/blob/master/pymc3/variational/operators.py
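
For reference, between two fixed univariate Gaussians the KL divergence also has a simple closed form; the snippet below is just that textbook formula as a standalone illustration, not PyMC3 API:

  import numpy as np

  def kl_normal_normal(mu1, sigma1, mu2, sigma2):
      """Closed-form KL( N(mu1, sigma1^2) || N(mu2, sigma2^2) )."""
      return (np.log(sigma2 / sigma1)
              + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * sigma2 ** 2)
              - 0.5)

  print(kl_normal_normal(0.0, 1.0, 0.5, 1.2))          # -> small positive value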

ferrine commented 7 years ago

My implementation is mostly for inference purposes, but wrapping it up shouldn't be that difficult. I don't see a way to implement most distances within the VI module (thanks to the curse of dimensionality), but for univariate distributions it makes sense. BTW, this might be interesting: http://jmlr.csail.mit.edu/papers/v13/gretton12a.html
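
For reference, that paper is the kernel two-sample test, i.e. the maximum mean discrepancy (MMD). A rough numpy sketch of the biased MMD^2 estimate with an RBF kernel, with the bandwidth and names as placeholders rather than a proposed API, could look like this:

  import numpy as np

  def rbf_kernel(a, b, bandwidth):
      # pairwise RBF kernel values between two 1-D sample arrays
      d2 = (a[:, None] - b[None, :]) ** 2
      return np.exp(-d2 / (2.0 * bandwidth ** 2))

  def mmd2(x, y, bandwidth=1.0):
      """Biased estimate of the squared maximum mean discrepancy."""
      kxx = rbf_kernel(x, x, bandwidth).mean()
      kyy = rbf_kernel(y, y, bandwidth).mean()
      kxy = rbf_kernel(x, y, bandwidth).mean()
      return kxx + kyy - 2.0 * kxy

  x = np.random.normal(0.0, 1.0, 500)
  y = np.random.normal(1.0, 1.0, 500)
  print(mmd2(x, y))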

ferrine commented 7 years ago

Can SoftAbs be used for the kernel Stein discrepancy?

junpenglao commented 7 years ago

I don't see why not; at least under the framework of the VI module it should be quite straightforward.
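
For concreteness, here is a plain numpy sketch of a kernel Stein discrepancy estimate with an RBF kernel and a user-supplied score function (the V-statistic form from Liu, Lee & Jordan). It sidesteps the SoftAbs question entirely and is only meant to show the quantity being discussed:

  import numpy as np

  def ksd(samples, score_fn, bandwidth=1.0):
      """V-statistic estimate of the squared kernel Stein discrepancy (RBF kernel).

      samples  : (n, d) array of draws from q
      score_fn : callable returning grad log p(x) for an (n, d) array
      """
      x = np.atleast_2d(samples)
      n, d = x.shape
      s = score_fn(x)                                       # (n, d) score values
      diff = x[:, None, :] - x[None, :, :]                  # (n, n, d) pairwise x_i - x_j
      sqd = (diff ** 2).sum(-1)                             # (n, n) squared distances
      h2 = bandwidth ** 2
      k = np.exp(-sqd / (2.0 * h2))                         # RBF kernel matrix
      term1 = k * (s @ s.T)                                 # s(x)^T s(y) k(x, y)
      term2 = k * np.einsum('id,ijd->ij', s, diff) / h2     # s(x)^T grad_y k(x, y)
      term3 = k * np.einsum('jd,ijd->ij', s, -diff) / h2    # s(y)^T grad_x k(x, y)
      term4 = k * (d / h2 - sqd / h2 ** 2)                  # trace of grad_x grad_y k
      return (term1 + term2 + term3 + term4).mean()

  # Draws from N(0, 1) scored against a standard normal target (score = -x)
  draws = np.random.normal(size=(500, 1))
  print(ksd(draws, lambda x: -x))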

bhargavvader commented 7 years ago

@ferrine's implementation won't directly fit the case of comparing two arbitrary distributions, but I'll see what can be done. Any other thoughts on what kind of API would be useful?

junpenglao commented 7 years ago

@bhargavvader are you thinking of an API more similar to the implementation in TensorFlow?

For example, here is how Edward calls kl_divergence from TensorFlow: https://github.com/blei-lab/edward/blob/master/edward/inferences/klqp.py#L451

  kl_penalty = tf.reduce_sum([
      inference.kl_scaling.get(z, 1.0) * tf.reduce_sum(kl_divergence(qz, z))
      for z, qz in six.iteritems(inference.latent_vars)])

twiecki commented 4 years ago

Closing due to inactivity, feel free to reopen.