tensorflow / probability

Probabilistic reasoning and statistical analysis in TensorFlow
https://www.tensorflow.org/probability/
Apache License 2.0

TensorFlow v2.14.0 breaks TensorFlow Probability at import #1752

Open matthewfeickert opened 1 year ago

matthewfeickert commented 1 year ago

Summary of problem

Today's release of TensorFlow v2.14.0 breaks TensorFlow Probability at import.

In a fresh Python 3.11 virtual environment, installation of tensorflow v2.14.0 and tensorflow-probability v0.21.0 causes a

ValueError: Arg specs do not match: original=FullArgSpec(args=['input', 'dtype', 'name', 'layout'], varargs=None, varkw=None, defaults=(None, None, None), kwonlyargs=[], kwonlydefaults=None, annotations={}), new=FullArgSpec(args=['input', 'dtype', 'name'], varargs=None, varkw=None, defaults=(None, None), kwonlyargs=[], kwonlydefaults=None, annotations={}), fn=<function ones_like_v2 at 0x7fce4da0e480>

at import of both.

Reproducible example

$ docker run --rm -ti python:3.11 /bin/bash 
root@c4673c623033:/# python -m venv venv && . venv/bin/activate
(venv) root@c4673c623033:/# python -m pip --quiet install --upgrade pip setuptools wheel
(venv) root@c4673c623033:/# python -m pip --quiet install --upgrade 'tensorflow==2.14.0' 'tensorflow-probability==0.21.0'
(venv) root@c4673c623033:/# python -m pip list | grep tensorflow
tensorflow                   2.14.0
tensorflow-estimator         2.14.0
tensorflow-io-gcs-filesystem 0.34.0
tensorflow-probability       0.21.0
(venv) root@c4673c623033:/# python -c 'import tensorflow; import tensorflow_probability'
2023-09-26 23:27:07.453221: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-09-26 23:27:07.454646: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-09-26 23:27:07.478170: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-09-26 23:27:07.478209: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-09-26 23:27:07.478229: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-09-26 23:27:07.483495: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-09-26 23:27:07.483705: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-26 23:27:08.160315: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/venv/lib/python3.11/site-packages/tensorflow_probability/__init__.py", line 20, in <module>
    from tensorflow_probability import substrates
  File "/venv/lib/python3.11/site-packages/tensorflow_probability/substrates/__init__.py", line 17, in <module>
    from tensorflow_probability.python.internal import all_util
  File "/venv/lib/python3.11/site-packages/tensorflow_probability/python/__init__.py", line 138, in <module>
    dir(globals()[pkg_name])  # Forces loading the package from its lazy loader.
    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/tensorflow_probability/python/internal/lazy_loader.py", line 57, in __dir__
    module = self._load()
             ^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/tensorflow_probability/python/internal/lazy_loader.py", line 40, in _load
    module = importlib.import_module(self.__name__)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/tensorflow_probability/python/experimental/__init__.py", line 31, in <module>
    from tensorflow_probability.python.experimental import bayesopt
  File "/venv/lib/python3.11/site-packages/tensorflow_probability/python/experimental/bayesopt/__init__.py", line 17, in <module>
    from tensorflow_probability.python.experimental.bayesopt import acquisition
  File "/venv/lib/python3.11/site-packages/tensorflow_probability/python/experimental/bayesopt/acquisition/__init__.py", line 17, in <module>
    from tensorflow_probability.python.experimental.bayesopt.acquisition.acquisition_function import AcquisitionFunction
  File "/venv/lib/python3.11/site-packages/tensorflow_probability/python/experimental/bayesopt/acquisition/acquisition_function.py", line 22, in <module>
    from tensorflow_probability.python.internal import prefer_static as ps
  File "/venv/lib/python3.11/site-packages/tensorflow_probability/python/internal/prefer_static.py", line 361, in <module>
    ones_like = _copy_docstring(tf.ones_like, _ones_like)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/tensorflow_probability/python/internal/prefer_static.py", line 84, in _copy_docstring
    raise ValueError(
ValueError: Arg specs do not match: original=FullArgSpec(args=['input', 'dtype', 'name', 'layout'], varargs=None, varkw=None, defaults=(None, None, None), kwonlyargs=[], kwonlydefaults=None, annotations={}), new=FullArgSpec(args=['input', 'dtype', 'name'], varargs=None, varkw=None, defaults=(None, None), kwonlyargs=[], kwonlydefaults=None, annotations={}), fn=<function ones_like_v2 at 0x7f3c25d2dda0>
(venv) root@c4673c623033:/#

matthewfeickert commented 1 year ago

cc @jburnim, as it seems from outside the Google world that you're managing TFP releases(?). Apologies in advance for the noise if not, and thanks for the work that you've done regardless.

matthewfeickert commented 1 year ago

So

https://github.com/tensorflow/probability/blob/6cc612f05992f079259f71195d34c22d4fc85262/tensorflow_probability/python/internal/prefer_static.py#L79-L86

with

https://github.com/tensorflow/probability/blob/6cc612f05992f079259f71195d34c22d4fc85262/tensorflow_probability/python/internal/prefer_static.py#L30

is breaking at

https://github.com/tensorflow/probability/blob/6cc612f05992f079259f71195d34c22d4fc85262/tensorflow_probability/python/internal/prefer_static.py#L369

where

https://github.com/tensorflow/probability/blob/6cc612f05992f079259f71195d34c22d4fc85262/tensorflow_probability/python/internal/prefer_static.py#L360-L366

as in tensorflow v2.14.0

tf.ones, tf.zeros, tf.fill, tf.ones_like, tf.zeros_like now take an additional Layout argument that controls the output layout of their results.

cf. https://github.com/tensorflow/tensorflow/blob/v2.14.0/tensorflow/python/ops/array_ops.py#L3107-L3155

To make this more explicit, if you put a breakpoint() immediately after

https://github.com/tensorflow/probability/blob/6cc612f05992f079259f71195d34c22d4fc85262/tensorflow_probability/python/internal/prefer_static.py#L79-L82

you get

(venv) root@db7aec05ef72:/# python -c 'import tensorflow; import tensorflow_probability'
2023-09-27 01:44:50.273853: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-09-27 01:44:50.275376: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-09-27 01:44:50.298863: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-09-27 01:44:50.298894: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-09-27 01:44:50.298916: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-09-27 01:44:50.305111: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-09-27 01:44:50.305323: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-27 01:44:50.996801: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
> /venv/lib/python3.11/site-packages/tensorflow_probability/python/internal/prefer_static.py(84)_copy_docstring()
-> if original_spec != new_spec:
(Pdb) original_spec
FullArgSpec(args=['input', 'dtype', 'name', 'layout'], varargs=None, varkw=None, defaults=(None, None, None), kwonlyargs=[], kwonlydefaults=None, annotations={})
(Pdb) new_spec
FullArgSpec(args=['input', 'dtype', 'name'], varargs=None, varkw=None, defaults=(None, None), kwonlyargs=[], kwonlydefaults=None, annotations={})
(Pdb)

matthewfeickert commented 1 year ago

Oh, comparing TensorFlow Probability v0.21.0 _ones_like

https://github.com/tensorflow/probability/blob/6d5fb1672a1186b47c4ec2c73476ac80c50457b6/tensorflow_probability/python/internal/prefer_static.py#L355-L361

to now

https://github.com/tensorflow/probability/blob/6cc612f05992f079259f71195d34c22d4fc85262/tensorflow_probability/python/internal/prefer_static.py#L360-L366

this was already taken care of by @rainwoodman in 2cbb82d0ed83078e6232020242799cde5cc41ce9. So this is already fixed, but a release is needed with the fix. :+1:

@jburnim, can you comment on when a release might be possible?

ColCarroll commented 1 year ago

Thanks for pinging on this, @matthewfeickert ! Note that using tfp-nightly in the meantime should work.

csuter commented 1 year ago

Right -- stable (i.e., non-nightly) TFP releases are generally tied to a particular stable TF release and won't generally work with a subsequent TF release. TFP nightlies are tested against tf-nightly and are more likely to work with a recent TF stable release. We usually get a TFP release out within a week or two of a new TF version.

matthewfeickert commented 1 year ago

> Note that using tfp-nightly in the meantime should work.

Indeed, I checked this after I noticed that the problem was already fixed given 2cbb82d0ed83078e6232020242799cde5cc41ce9.

> TFP nightlies are tested against tf-nightly and more likely to work with a recent TF stable release. We usually get a TFP release out within a week or two of a new TF version.

Yes, I'm aware. I had assumed that the TensorFlow team would coordinate releases with the TFP team, as having a high probability of breaking all users is both generally bad and seems to further invalidate the view of the TensorFlow ecosystem being library-like.

I assume given the team's current schedule and other responsibilities O(weeks) can't be improved upon to O(days)?

For future reference, I've been aware since tensorflow v2.14.0rc0 that this release of TF would break TFP, given nightly testing against rcs (here the *-nightly wheels are obviously not useful). In the future, should I open issues asking for a release to be prepared as soon as I see this and verify that a fix has already been committed? Or is the TFP team aware of this already internally through tests, and releases aren't intended to be coordinated, and this issue would just be noise?

csuter commented 1 year ago

Thanks, Matthew. I recognize and appreciate that you're a long-time user and contributor.

TF and TFP are maintained by quite separate groups; there is not very much explicit coordination, although we are proactively notified of upcoming releases as they happen.

TFP stable versions are tested and supported for the TF release that is current when they are built; strictly speaking there was no breakage here, because TFP 0.21 was never explicitly intended to work with TF 2.14. If a TFP version survives a TF release (rare) then that's a happy accident! Otherwise the fact that the nightlies are developed and tested more-or-less in lockstep ensures that there will be some releasable state of TFP near HEAD when a new TF drops. Our release process is essentially about finding that commit, branching, patching up any minor issues, generating release notes and pushing a pypi package. This usually takes a week or two for someone on the team to find time to do.

So from our perspective the current setup is "working as intended" (which is not to say it's the best it could be or that we could imagine -- but dev hours aren't free). I don't think filing bugs like this sooner would expedite that process.

Hope this extra detail provides some useful (if somewhat unsatisfying) context.

rivershah commented 1 year ago

@csuter Could tfp specify compatible tensorflow version ranges to prevent such issues? Unit tests inside my container caught this issue upfront; however, it would be good if pip install would just raise an error. Thank you.
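Until something like that exists in the package metadata, a downstream project could approximate it with a runtime guard. The sketch below is one possible shape, using only the standard library. The pairing table is an assumption for illustration (check it against the TFP release notes), and the helper names `series`, `check_pairing`, and `check_installed` are hypothetical.

```python
from importlib import metadata

# Illustrative pairing of TFP release series to the TF series each was
# built against. This table is an assumption; verify it against the
# TFP release notes before relying on it.
SUPPORTED_TF_SERIES = {
    "0.21": "2.13",
    "0.22": "2.14",
}


def series(version):
    """Return the 'major.minor' prefix of a version string such as '2.14.0'."""
    return ".".join(version.split(".")[:2])


def check_pairing(tfp_version, tf_version, table=SUPPORTED_TF_SERIES):
    """Raise RuntimeError if the TF series is not the one listed for TFP."""
    expected = table.get(series(tfp_version))
    if expected is None:
        return  # Unknown TFP series: nothing to compare against.
    if series(tf_version) != expected:
        raise RuntimeError(
            f"tensorflow-probability {tfp_version} is listed against "
            f"tensorflow {expected}.x, but tensorflow {tf_version} is installed"
        )


def check_installed():
    """Run check_pairing against whatever is installed, if both are present."""
    try:
        tfp_version = metadata.version("tensorflow-probability")
        tf_version = metadata.version("tensorflow")
    except metadata.PackageNotFoundError:
        return  # One of the packages is missing; nothing to check.
    check_pairing(tfp_version, tf_version)


if __name__ == "__main__":
    try:
        check_pairing("0.21.0", "2.14.0")
    except RuntimeError as err:
        print(err)
```

Calling `check_installed()` before importing either library turns the opaque import-time `ValueError` into a message that names the mismatched versions. It does not replace a declared dependency range, which is what would let pip refuse the install.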

jburnim commented 1 year ago

NOTE: TFP 0.22.0 has been released.