Open alembcke opened 3 years ago
Computing exact mvn cdf in general is a surprisingly hard problem! I think there’s been discussion previously (may have been on Google internal boards I can’t recall). I can’t quite tell from reading the source but I think scipy is using a Monte Carlo estimate. In low dimensions this can work. In TFP we have a contract that precludes using MC approximations for core api functions. If we can’t compute analytically we raise not implemented. We could consider adding a separate monte_carlo_cdf or something. It should be pretty easy to hand write as well (sample a bunch of points, count freq of points outside the integration range). Hope this provides some clarity at least!
On Sun, Jun 20, 2021 at 11:39 Alex Lembcke @.***> wrote:
System information
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 21.04 TensorFlow installed from (source or binary): binary TensorFlow versions:
$ python -c "import tensorflow as tf; import tensorflow_probability as tfp; print(tf.version.GIT_VERSION, tf.version.VERSION, tfp.version)" v2.5.0-rc3-213-ga4dfb8d1a71 2.5.0 0.13.0
Issue
I am trying to get the cumulative distribution function for a bivariate normal distribution, here is the code:
import tensorflow as tfimport tensorflow_probability as tfptfd = tfp.distributionstf.compat.v1.enable_eager_execution() mvn = tfd.MultivariateNormalTriL(loc=[0,0], scale_tril=tf.linalg.cholesky([[1,0.5],[0.5,1]]))mvn.cdf([0,0])
And here is the error that I get:
Traceback (most recent call last): File "
", line 1, in File "/home/alexlembcke/.local/lib/python3.9/site-packages/tensorflow_probability/python/distributions/distribution.py", line 1438, in cdf return self._call_cdf(value, name, kwargs) File "/home/alexlembcke/.local/lib/python3.9/site-packages/tensorflow_probability/python/distributions/distribution.py", line 1414, in _call_cdf return self._cdf(value, kwargs) File "/home/alexlembcke/.local/lib/python3.9/site-packages/tensorflow_probability/python/distributions/transformed_distribution.py", line 407, in _cdf self.bijector._internal_is_increasing(bijector_kwargs), # pylint: disable=protected-access File "/home/alexlembcke/.local/lib/python3.9/site-packages/tensorflow_probability/python/bijectors/bijector.py", line 1171, in _internal_is_increasing return self._call_is_increasing(name, kwargs) File "/home/alexlembcke/.local/lib/python3.9/site-packages/tensorflow_probability/python/bijectors/bijector.py", line 1155, in _call_is_increasing return tf.identity(self._is_increasing(kwargs)) File "/home/alexlembcke/.local/lib/python3.9/site-packages/tensorflow_probability/python/bijectors/chain.py", line 147, in _is_increasing is_increasing, b._internal_is_increasing(kwargs.get(b.name, {}))) # pylint: disable=protected-access File "/home/alexlembcke/.local/lib/python3.9/site-packages/tensorflow_probability/python/bijectors/bijector.py", line 1171, in _internal_is_increasing return self._call_is_increasing(name, kwargs) File "/home/alexlembcke/.local/lib/python3.9/site-packages/tensorflow_probability/python/bijectors/bijector.py", line 1155, in _call_is_increasing return tf.identity(self._is_increasing(kwargs)) File "/home/alexlembcke/.local/lib/python3.9/site-packages/tensorflow_probability/python/bijectors/bijector.py", line 1150, in _is_increasing raise NotImplementedError(' _is_increasing
not implemented.') NotImplementedError:_is_increasing
not implemented.If I adapt the code to use scipy, like so:
from scipy.stats import multivariate_normal mvn = multivariate_normal(mean=[0,0], cov=[[1,0.5],[0.5,1]])mvn.cdf([0, 0])
Then everything works fine. Is this a bug or am I doing something wrong in my tensorflow code?
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/tensorflow/probability/issues/1361, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABG2GJJGJ6GTE5ZLQ7RIKDTTYDTNANCNFSM47AFZM2A .
Thank you, it does provide the clarity that I wasn't making a mistake in my code and that this isn't a simple bug that will be fixed tomorrow.
Should I keep this issue open to track when the mvn cfd is implemented, whether via Monte Carlo or some other method?
Yes, I think it's fine to keep this open as a feature request (maybe update
the title?). I don't think anyone will have a chance to work on it for a
while, though. IIRC there's a pretty clean one-to-two-liner to get the CDF
via MC. Something like tfp.monte_carlo.expectation(lambda x: x < lim, mvn.sample(num_mc_samples))
, where lim is the limit point of the
integration region and num_mc_samples is the number of samples you want.
Note that scipy uses 1e6 times the dimensionality, but in reality you
probably want 1e6 ** dimensionality (which is why MC isn't a great solution
for dim > ~2).
On Sun, Jun 20, 2021 at 1:52 PM Alex Lembcke @.***> wrote:
Thank you, it does provide the clarity that I wasn't making a mistake in my code and that this isn't a simple bug that will be fixed tomorrow.
Should I keep this issue open to track when the mvn cfd is implemented, whether via Monte Carlo or some other method?
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/tensorflow/probability/issues/1361#issuecomment-864589456, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABG2GKBRNHRZVKGH3PS4GDTTYTEBANCNFSM47AFZM2A .
There's a lot of work on computing MVN integrals more-or-less exactly. This looks like a good reference: https://www.springer.com/gp/book/9783642016882
My very-uninformed impression is that in low dimensions (d <= N
where N
is somewhere between 2 and 8) there's probably some method or set of methods that would meet the TFP bar of 'runs in polynomial time and returns an ~exact result'. As Chris said, I don't know of anyone planning to work on this, but it could be a fun contribution.
Hi Christopher, I updated the title to make clear that it is a feature request.
Hi Dave, it seems like Genz, one of the authors on the book you link to, is the primary source when it comes to MVN integrals. Until this feature is implemented I will be using his 2004 approximation algorithm (accurate to 16 decimal places) for the standard bivariate MVN.
System information
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 21.04 TensorFlow installed from (source or binary): binary TensorFlow versions:
Issue
I am trying to get the cumulative distribution function for a bivariate normal distribution, here is the code:
And here is the error that I get:
If I adapt the code to use scipy, like so:
Then everything works fine. Is this a bug or am I doing something wrong in my tensorflow code?