tensorflow / probability

Probabilistic reasoning and statistical analysis in TensorFlow
https://www.tensorflow.org/probability/
Apache License 2.0

Incompatibility with cloudpickle==1.5.0 #991

Closed ltetrel closed 4 years ago

ltetrel commented 4 years ago

Hi all,

Since the cloudpickle 1.5.0 release, it is no longer possible to import tensorflow_probability. Pinning cloudpickle <= 1.4.1 fixes the issue:

>>> import tensorflow_probability as tfp
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_probability/__init__.py", line 76, in <module>
    from tensorflow_probability.python import *  # pylint: disable=wildcard-import
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_probability/python/__init__.py", line 23, in <module>
    from tensorflow_probability.python import distributions
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_probability/python/distributions/__init__.py", line 88, in <module>
    from tensorflow_probability.python.distributions.pixel_cnn import PixelCNN
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_probability/python/distributions/pixel_cnn.py", line 37, in <module>
    from tensorflow_probability.python.layers import weight_norm
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_probability/python/layers/__init__.py", line 31, in <module>
    from tensorflow_probability.python.layers.distribution_layer import CategoricalMixtureOfOneHotCategorical
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_probability/python/layers/distribution_layer.py", line 28, in <module>
    from cloudpickle.cloudpickle import CloudPickler
ImportError: cannot import name 'CloudPickler'

mli commented 4 years ago

+1 got the same issue

terrytangyuan commented 4 years ago

@jburnim Is https://github.com/tensorflow/probability/commit/5cc832b9d28ed5562961385a3b30ad242d76aac1 a temporary workaround for this or some other issues? Is there any particular reason to pin this specific version of cloudpickle?

matthewfeickert commented 4 years ago

Hi. I opened up an Issue on cloudpickle for this, but we're also observing this in pyhf for the same reasons.

jburnim commented 4 years ago

Thank you for the report, @ltetrel !

It looks like we pinned the CloudPickle dependency to 1.3 in 5cc832b9d28ed5562961385a3b30ad242d76aac1 because of compatibility issues between CloudPickle and some versions of Python 3.5. (It also looks like CloudPickle has since fixed these issues in https://github.com/cloudpipe/cloudpickle/pull/359 and https://github.com/cloudpipe/cloudpickle/pull/361 .)

As a quick fix, we are considering a TFP 0.10.1 release that is just TFP 0.10.0 but requiring CloudPickle 1.3. (If this would cause problems for you -- e.g., you're using TFP 0.10 and a higher version of CloudPickle -- please comment on this issue to let us know.)

We are also investigating further and taking a look at the fix https://github.com/tensorflow/probability/pull/993 from @matthewfeickert .
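For readers wondering what a version-tolerant import could look like: the following is only a sketch, not necessarily what #993 does. The helper name `import_first` is made up for illustration, and the assumption that newer cloudpickle releases expose `CloudPickler` at the package top level should be checked against the cloudpickle version you actually have installed.

```python
import importlib


def import_first(candidates):
    """Return the first attribute importable from a list of (module, attribute) pairs."""
    errors = []
    for module_name, attribute in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attribute)
        except (ImportError, AttributeError) as exc:
            # Remember why this location failed, then try the next one.
            errors.append(f"{module_name}.{attribute}: {exc}")
    raise ImportError("none of the candidates could be imported: " + "; ".join(errors))


# Locations to try, in order. The second entry is the path from the traceback
# in this issue; the first is an assumption about newer cloudpickle layouts.
CLOUDPICKLER_CANDIDATES = [
    ("cloudpickle", "CloudPickler"),
    ("cloudpickle.cloudpickle", "CloudPickler"),
]
```

With cloudpickle installed, `CloudPickler = import_first(CLOUDPICKLER_CANDIDATES)` would pick whichever location exists, and the combined error message lists every location that was tried if none does.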

terrytangyuan commented 4 years ago

@jburnim Sounds great. We have no issue requiring that specific version. Thanks for the prompt response!

matthewfeickert commented 4 years ago

> As a quick fix, we are considering a TFP 0.10.1 release that is just TFP 0.10.0 but requiring CloudPickle 1.3

For the library I work on (pyhf) this would work for the near term. cc @lukasheinrich @kratsg

Though if it is possible to have releases that don't explicitly pin dependencies to a single version number, I think that's nicer.

> We are also investigating further and taking a look at the fix #993

Cool. Let me know if there is anything you need me to iterate on. I haven't taken the time to debug the cause of the one test that is failing in CI (given that all of CI fails at the moment).

emilyfertig commented 4 years ago

Update: We've now released TFP 0.10.1, which pins the CloudPickle version to 1.3, and are still looking into #993 .

matthewfeickert commented 4 years ago

Thank you for fixing this @jburnim! :bow:

StevenSong commented 3 years ago

this is an issue again with the latest version of tensorflow 2.3.1 (security patch from 4 days ago) which has cloudpickle dependency >= 1.5.0 - meanwhile tensorflow-probability 0.11.0 still has cloudpickle == 1.3.0

matthewfeickert commented 3 years ago

> this is an issue again with the latest version of tensorflow 2.3.1 (security patch from 4 days ago) which has cloudpickle dependency >= 1.5.0 - meanwhile tensorflow-probability 0.11.0 still has cloudpickle == 1.3.0

The good news is that this was already resolved in TFP https://github.com/tensorflow/probability/commit/7601ef6d6aee014a5566937858974177e4f53122. So the next TFP release should have this all taken care of. I'm not sure what the release schedule for TFP is though.

csuter commented 3 years ago

@matthewfeickert we generally build a new stable release whenever TF does, since in general we end up depending on new TF features in between their (and hence our) stable releases. We could increase our (TFP's) release cadence, so long as we either a) don't have such deps on not-yet-in-stable-TF features, or b) can easily hack around such issues on the release branch.

matthewfeickert commented 3 years ago

Thanks for that info @csuter. :+1: I wasn't meaning to complain about not knowing (it isn't super important to me and I know full well that having people ask about release schedules can be a tiresome discussion point), but I do appreciate you offering up this information here as that probably helps people (though I'm sure had I searched harder I would have already found this information in another Issue).

csuter commented 3 years ago

Definitely didn't detect any complaint! And I could talk release schedules all day! 😁

I think we (TFP) could probably do a bit better at communicating these processes, so folks don't have to go digging in Issues to find the info. Then again Google is a really good search engine, so maybe it's fine to have these bits buried in here 😅

csuter commented 3 years ago

Quick update (h/t to @brianwa84 for pointing out to me the actual context here, which I overlooked) -- TFP should actually release a patch to go with the TF 2.3.1 patch here. We'll look into it ASAP.

csuter commented 3 years ago

TFP 0.11.1 is up on pypi now, and should work fine with TF 2.3.1 and newer cloudpickles. Please let us know if you run into further issues!

Edvard-D commented 3 years ago

Sorry, not sure if this is the right place to post this, but I'm trying to run a training job on Google Cloud AI Platform and this error is being thrown when specifying Python v3.7 and TensorFlow v2.3.1 when setting up the training job.

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/.local/lib/python3.7/site-packages/trader_trainer/training/trader.py", line 2, in <module>
    from trader_trainer.training import trainers
  File "/root/.local/lib/python3.7/site-packages/trader_trainer/training/trainers.py", line 7, in <module>
    from trader_trainer.shared.predictors import ActorCriticTimeSeriesPredictor
  File "/root/.local/lib/python3.7/site-packages/trader_trainer/shared/predictors.py", line 3, in <module>
    import tensorflow_probability as tfp
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_probability/__init__.py", line 77, in <module>
    from tensorflow_probability.python import *  # pylint: disable=wildcard-import
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_probability/python/__init__.py", line 23, in <module>
    from tensorflow_probability.python import distributions
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_probability/python/distributions/__init__.py", line 94, in <module>
    from tensorflow_probability.python.distributions.pixel_cnn import PixelCNN
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_probability/python/distributions/pixel_cnn.py", line 37, in <module>
    from tensorflow_probability.python.layers import weight_norm
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_probability/python/layers/__init__.py", line 31, in <module>
    from tensorflow_probability.python.layers.distribution_layer import CategoricalMixtureOfOneHotCategorical
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_probability/python/layers/distribution_layer.py", line 28, in <module>
    from cloudpickle.cloudpickle import CloudPickler
ImportError: cannot import name 'CloudPickler' from 'cloudpickle.cloudpickle' (/opt/conda/lib/python3.7/site-packages/cloudpickle/cloudpickle.py)

jimzer commented 3 years ago

We encounter the same problem on the Google Cloud AI Platform, deploying a TFX pipeline with a model using TensorFlow probability

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 364, in <module>
    main()
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 357, in main
    execution_info = launcher.launch()
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/launcher/base_component_launcher.py", line 209, in launch
    copy.deepcopy(execution_decision.exec_properties))
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/launcher/in_process_component_launcher.py", line 72, in _run_executor
    copy.deepcopy(input_dict), output_dict, copy.deepcopy(exec_properties))
  File "/opt/conda/lib/python3.7/site-packages/tfx/components/trainer/executor.py", line 182, in Do
    run_fn = udf_utils.get_fn(exec_properties, 'run_fn')
  File "/opt/conda/lib/python3.7/site-packages/tfx/components/util/udf_utils.py", line 49, in get_fn
    exec_properties[_MODULE_FILE_KEY], fn_name)
  File "/opt/conda/lib/python3.7/site-packages/tfx/utils/import_utils.py", line 127, in import_func_from_source
    loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "./censeo/models/vae_trainer.py", line 10, in <module>
    import tensorflow_probability as tfp
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_probability/__init__.py", line 77, in <module>
    from tensorflow_probability.python import *  # pylint: disable=wildcard-import
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_probability/python/__init__.py", line 23, in <module>
    from tensorflow_probability.python import distributions
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_probability/python/distributions/__init__.py", line 94, in <module>
    from tensorflow_probability.python.distributions.pixel_cnn import PixelCNN
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_probability/python/distributions/pixel_cnn.py", line 37, in <module>
    from tensorflow_probability.python.layers import weight_norm
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_probability/python/layers/__init__.py", line 31, in <module>
    from tensorflow_probability.python.layers.distribution_layer import CategoricalMixtureOfOneHotCategorical
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_probability/python/layers/distribution_layer.py", line 28, in <module>
    from cloudpickle.cloudpickle import CloudPickler
ImportError: cannot import name 'CloudPickler' from 'cloudpickle.cloudpickle' (/opt/conda/lib/python3.7/site-packages/cloudpickle/cloudpickle.py)

Edvard-D commented 3 years ago

> We encounter the same problem on the Google Cloud AI Platform, deploying a TFX pipeline with a model using TensorFlow probability

I resolved this by forcing AI Platform to update tensorflow_probability: I added it as one of the required packages in the setup file using 'tensorflow_probability>=0.11.1'. It seems that AI Platform is using an out-of-date version of tensorflow_probability. This definitely needs to be fixed on Google's end, but at least there's a workaround.
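To confirm which versions a job actually ends up with, a small check like the one below could be run at the top of the trainer module. This is a rough sketch: the helper names are made up for illustration, it assumes Python 3.8+ for `importlib.metadata`, and the version parsing is deliberately simplistic (it ignores suffixes such as `rc1`). The 0.11.1 threshold is the one mentioned in this thread.

```python
import re
from importlib import metadata  # assumption: Python 3.8 or newer


def version_tuple(version):
    """Turn '0.11.1' into (0, 11, 1); suffixes such as 'rc1' are ignored."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def meets_minimum(found, minimum):
    """True if version string `found` is at least `minimum`."""
    return version_tuple(found) >= version_tuple(minimum)


if __name__ == "__main__":
    # Print what is installed so the job log shows the versions in use.
    for name in ("tensorflow-probability", "cloudpickle"):
        print(name, installed_version(name))
```

For example, `meets_minimum(installed_version("tensorflow-probability"), "0.11.1")` would tell you whether the environment already has the release that works with newer cloudpickle versions (guard against `None` first if the package might be missing).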

entrpn commented 3 years ago

@jimzer @Edvard-D could you please share your configuration?

I am having the same issue using a TFX pipeline with a model using TensorFlow Probability on GCP AI Platform, but I can't figure out how to set the runtime version correctly. If I just add the runtime version, I get:

'description': "The specified runtime version '2.4' with the Python version '' is not supported or is deprecated. Please specify a different runtime version.

args:

_ai_platform_training_args = {
    'project': PROJECT_ID,
    'region': GCP_REGION,
    'runtime-version': 2.4
}

Then I try adding python and I get:

'description': 'Only one of runtime version or the master Docker image URI should be provided.'}]}]">

args:

_ai_platform_training_args = {
    'project': PROJECT_ID,
    'region': GCP_REGION,
    'runtimeVersion': '2.4',
    'pythonVersion': '3.7'
}

Thank you.

Edvard-D commented 3 years ago

@entrpn runtimeVersion refers to the TensorFlow version. I'm not sure about the error you're getting, but I submit the job with the following code, which essentially runs it via the command line:

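import subprocess

# TRAINING_NAME, CPU_TYPE, JOB_DIRECTORY, PACKAGE_PATH and MODULE_NAME are
# placeholders that must be defined for your own job.
command_arguments = [
    'gcloud', 'ai-platform', 'jobs', 'submit', 'training', TRAINING_NAME,
    '--scale-tier', 'custom',
    '--master-machine-type', CPU_TYPE,
    '--job-dir', JOB_DIRECTORY,
    '--package-path', PACKAGE_PATH,
    '--module-name', MODULE_NAME,
    '--region', 'us-central1',
    '--runtime-version', '2.4',
    '--python-version', '3.7'
]
# On Linux/macOS, drop shell=True: with a list argument the shell would run
# only 'gcloud' and ignore the remaining items.
subprocess.Popen(command_arguments, shell=True)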

(if you're using Linux I'm pretty sure you should remove "shell=True")

entrpn commented 3 years ago

@Edvard-D thank you for the quick reply. I'm using tfx as the framework to launch the training job and unfortunately I can't pass parameters like you do. Hopefully @jimzer has a working example with tfx.

entrpn commented 3 years ago

I finally solved my issue. Had to dig through the tfx code to figure out how the Trainer component works. If anyone comes across this and is struggling, hopefully it will help.

The issue with TFX and AI Platform is that you can't specify a runtime version or Python version, because TFX uses containers. So the way to go about this is to first create a container that uses a TFX image as its base. In my case, I needed TensorFlow Probability, so:

FROM gcr.io/tfx-oss-public/tfx:0.30.0
RUN pip install tensorflow-probability==0.12.2

Build it and push it to your project's container registry. Then add it to the trainer args like:

_ai_platform_training_args = {
    'project': PROJECT_ID,
    'region': GCP_REGION,
    'masterConfig' : {'imageUri': 'gcr.io/my_project/tfp_trainer:latest'}
}

Also copy the trainer Python file that is used by the Trainer component into a GCS bucket so that the image can access it. Then run the job.
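The two error messages earlier in this thread suggest that a runtime version needs a Python version alongside it, and that neither can be combined with a custom image URI. A small helper could catch that before the job is submitted. This is only a sketch based on those messages: `build_training_args` is a hypothetical name, not part of TFX or the AI Platform API, and the dictionary keys are the ones used in the comments above.

```python
def build_training_args(project, region, image_uri=None,
                        runtime_version=None, python_version=None):
    """Build an AI Platform training-args dict, rejecting conflicting options."""
    if image_uri and (runtime_version or python_version):
        # Mirrors the "Only one of runtime version or the master Docker
        # image URI should be provided" error reported in this thread.
        raise ValueError("use either image_uri or runtime/python versions, not both")

    args = {"project": project, "region": region}
    if image_uri:
        # Custom container route, as in the masterConfig example above.
        args["masterConfig"] = {"imageUri": image_uri}
    elif runtime_version or python_version:
        # Design choice of this sketch: require both values together.
        if not (runtime_version and python_version):
            raise ValueError("runtime_version and python_version must be given together")
        args["runtimeVersion"] = str(runtime_version)
        args["pythonVersion"] = str(python_version)
    return args
```

Calling `build_training_args(PROJECT_ID, GCP_REGION, image_uri='gcr.io/my_project/tfp_trainer:latest')` yields the same dictionary as the one shown above, while mixing `image_uri` with `runtime_version` fails locally instead of at submission time.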