[rllib] Unable to restore multiagent PPO policy models with tensorflow

ivallesp commented 3 years ago

What is the problem?

Ray version: 1.0.1
TensorFlow version: 2.3.1
Operating systems tested: Ubuntu 18.04 and macOS Mojave

Hi, I am trying to export a trained policy from a multiagent environment as a TensorFlow model, but restoring the exported model raises an UnliftableError. I have simplified the reproduction script as much as possible.

Reproduction (REQUIRED)

from gym.spaces import Discrete

import ray
from ray.rllib.examples.env.rock_paper_scissors import RockPaperScissors
from ray.rllib.agents import ppo

select_policy = lambda agent_id: "policy_01" if agent_id == "player1" else "policy_02"

config = {
    "multiagent": {
        "policies": {
            "policy_01": (None, Discrete(3), Discrete(3), {}),
            "policy_02": (None, Discrete(3), Discrete(3), {}),
        },
        "policy_mapping_fn": select_policy,
    },
}

ray.init()
trainer = ppo.PPOTrainer(env=RockPaperScissors, config=config)
trainer.train()  # Train one step
trainer.export_policy_model("exported_model", "policy_01")

Once the model is saved, try to restore it in TensorFlow with the following two lines:

import tensorflow as tf
tf.saved_model.load("exported_model")

This produces the following error:

WARNING:tensorflow:From /Users/ivallesp/projects/rockpaperscisors/.venv/lib/python3.7/site-packages/ray/rllib/policy/tf_policy.py:653: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
WARNING:tensorflow:Unable to create a python object for variable <tf.Variable 'policy_01/timestep_1:0' shape=() dtype=int64_ref> because it is a reference variable. It may not be visible to training APIs. If this is a problem, consider rebuilding the SavedModel after running tf.compat.v1.enable_resource_variables().
WARNING:tensorflow:Unable to create a python object for variable <tf.Variable 'policy_01/kl_coeff:0' shape=() dtype=float32_ref> because it is a reference variable. It may not be visible to training APIs. If this is a problem, consider rebuilding the SavedModel after running tf.compat.v1.enable_resource_variables().
WARNING:tensorflow:Unable to create a python object for variable <tf.Variable 'policy_01/entropy_coeff:0' shape=() dtype=float32_ref> because it is a reference variable. It may not be visible to training APIs. If this is a problem, consider rebuilding the SavedModel after running tf.compat.v1.enable_resource_variables().
WARNING:tensorflow:Unable to create a python object for variable <tf.Variable 'policy_01/lr:0' shape=() dtype=float32_ref> because it is a reference variable. It may not be visible to training APIs. If this is a problem, consider rebuilding the SavedModel after running tf.compat.v1.enable_resource_variables().
WARNING:tensorflow:Unable to create a python object for variable <tf.Variable 'policy_01/global_step:0' shape=() dtype=int64_ref> because it is a reference variable. It may not be visible to training APIs. If this is a problem, consider rebuilding the SavedModel after running tf.compat.v1.enable_resource_variables().
WARNING:tensorflow:Unable to create a python object for variable <tf.Variable 'policy_01/timestep_1:0' shape=() dtype=int64_ref> because it is a reference variable. It may not be visible to training APIs. If this is a problem, consider rebuilding the SavedModel after running tf.compat.v1.enable_resource_variables().
WARNING:tensorflow:Unable to create a python object for variable <tf.Variable 'policy_01/kl_coeff:0' shape=() dtype=float32_ref> because it is a reference variable. It may not be visible to training APIs. If this is a problem, consider rebuilding the SavedModel after running tf.compat.v1.enable_resource_variables().
WARNING:tensorflow:Unable to create a python object for variable <tf.Variable 'policy_01/entropy_coeff:0' shape=() dtype=float32_ref> because it is a reference variable. It may not be visible to training APIs. If this is a problem, consider rebuilding the SavedModel after running tf.compat.v1.enable_resource_variables().
WARNING:tensorflow:Unable to create a python object for variable <tf.Variable 'policy_01/lr:0' shape=() dtype=float32_ref> because it is a reference variable. It may not be visible to training APIs. If this is a problem, consider rebuilding the SavedModel after running tf.compat.v1.enable_resource_variables().
WARNING:tensorflow:Unable to create a python object for variable <tf.Variable 'policy_01/global_step:0' shape=() dtype=int64_ref> because it is a reference variable. It may not be visible to training APIs. If this is a problem, consider rebuilding the SavedModel after running tf.compat.v1.enable_resource_variables().
WARNING:tensorflow:Some variables could not be lifted out of a loaded function. Run the tf.initializers.tables_initializer() operation to restore these variables.
WARNING:tensorflow:Unable to create a python object for variable <tf.Variable 'policy_01/timestep_1:0' shape=() dtype=int64_ref> because it is a reference variable. It may not be visible to training APIs. If this is a problem, consider rebuilding the SavedModel after running tf.compat.v1.enable_resource_variables().
WARNING:tensorflow:Unable to create a python object for variable <tf.Variable 'policy_01/kl_coeff:0' shape=() dtype=float32_ref> because it is a reference variable. It may not be visible to training APIs. If this is a problem, consider rebuilding the SavedModel after running tf.compat.v1.enable_resource_variables().
WARNING:tensorflow:Unable to create a python object for variable <tf.Variable 'policy_01/entropy_coeff:0' shape=() dtype=float32_ref> because it is a reference variable. It may not be visible to training APIs. If this is a problem, consider rebuilding the SavedModel after running tf.compat.v1.enable_resource_variables().
WARNING:tensorflow:Unable to create a python object for variable <tf.Variable 'policy_01/lr:0' shape=() dtype=float32_ref> because it is a reference variable. It may not be visible to training APIs. If this is a problem, consider rebuilding the SavedModel after running tf.compat.v1.enable_resource_variables().
WARNING:tensorflow:Unable to create a python object for variable <tf.Variable 'policy_01/global_step:0' shape=() dtype=int64_ref> because it is a reference variable. It may not be visible to training APIs. If this is a problem, consider rebuilding the SavedModel after running tf.compat.v1.enable_resource_variables().
Traceback (most recent call last):
  File "minimal.py", line 27, in <module>
    tf.saved_model.load("exported_model")
  File "/Users/ivallesp/projects/rockpaperscisors/.venv/lib/python3.7/site-packages/tensorflow/python/saved_model/load.py", line 603, in load
    return load_internal(export_dir, tags, options)
  File "/Users/ivallesp/projects/rockpaperscisors/.venv/lib/python3.7/site-packages/tensorflow/python/saved_model/load.py", line 649, in load_internal
    root = load_v1_in_v2.load(export_dir, tags)
  File "/Users/ivallesp/projects/rockpaperscisors/.venv/lib/python3.7/site-packages/tensorflow/python/saved_model/load_v1_in_v2.py", line 263, in load
    return loader.load(tags=tags)
  File "/Users/ivallesp/projects/rockpaperscisors/.venv/lib/python3.7/site-packages/tensorflow/python/saved_model/load_v1_in_v2.py", line 246, in load
    signature_functions = self._extract_signatures(wrapped, meta_graph_def)
  File "/Users/ivallesp/projects/rockpaperscisors/.venv/lib/python3.7/site-packages/tensorflow/python/saved_model/load_v1_in_v2.py", line 158, in _extract_signatures
    signature_fn = wrapped.prune(feeds=feeds, fetches=fetches)
  File "/Users/ivallesp/projects/rockpaperscisors/.venv/lib/python3.7/site-packages/tensorflow/python/eager/wrap_function.py", line 338, in prune
    base_graph=self._func_graph)
  File "/Users/ivallesp/projects/rockpaperscisors/.venv/lib/python3.7/site-packages/tensorflow/python/eager/lift_to_graph.py", line 260, in lift_to_graph
    add_sources=add_sources))
  File "/Users/ivallesp/projects/rockpaperscisors/.venv/lib/python3.7/site-packages/tensorflow/python/ops/op_selector.py", line 413, in map_subgraph
    % (repr(init_tensor), repr(op), _path_from(op, init_tensor, sources)))
tensorflow.python.ops.op_selector.UnliftableError: A SavedModel signature needs an input for each placeholder the signature's outputs use. An output for signature 'serving_default' depends on a placeholder which is not an input (i.e. the placeholder is not fed a value).

Unable to lift tensor <tf.Tensor 'policy_01/cond_2/Merge:0' shape=(?,) dtype=float32> because it depends transitively on placeholder <tf.Operation 'policy_01/timestep' type=Placeholder> via at least one path, e.g.: policy_01/cond_2/Merge (Merge) <- policy_01/cond_2/Switch_1 (Switch) <- policy_01/cond_2/pred_id (Identity) <- policy_01/LogicalAnd (LogicalAnd) <- policy_01/GreaterEqual (GreaterEqual) <- policy_01/timestep (Placeholder)
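
The check described in the error message can be illustrated with a toy dependency walk. This is only a sketch in plain Python, not TensorFlow's implementation: the op chain is copied from the message above, while `policy_01/observation` and its edge into `Switch_1` are hypothetical stand-ins for whatever inputs the exported signature actually declares.

```python
# Toy model of the check behind the UnliftableError; this is not TensorFlow code.
# Each op maps to the ops it reads from. The timestep chain is copied from the
# error message; "policy_01/observation" is a hypothetical signature input.
GRAPH = {
    "policy_01/cond_2/Merge": ["policy_01/cond_2/Switch_1"],
    "policy_01/cond_2/Switch_1": [
        "policy_01/cond_2/pred_id",
        "policy_01/observation",  # hypothetical edge, for illustration only
    ],
    "policy_01/cond_2/pred_id": ["policy_01/LogicalAnd"],
    "policy_01/LogicalAnd": ["policy_01/GreaterEqual"],
    "policy_01/GreaterEqual": ["policy_01/timestep"],
    "policy_01/timestep": [],
    "policy_01/observation": [],
}
PLACEHOLDERS = {"policy_01/timestep", "policy_01/observation"}


def unfed_placeholders(output, feeds):
    """Return the placeholders `output` depends on that are missing from `feeds`."""
    missing, seen, stack = set(), set(), [output]
    while stack:
        op = stack.pop()
        if op in seen:
            continue
        seen.add(op)
        if op in PLACEHOLDERS and op not in feeds:
            missing.add(op)
        stack.extend(GRAPH[op])
    return missing


# Signature that declares only the observation: the timestep is left unfed.
print(unfed_placeholders("policy_01/cond_2/Merge", {"policy_01/observation"}))
# -> {'policy_01/timestep'}

# Signature that also declares the timestep: nothing is missing.
print(unfed_placeholders(
    "policy_01/cond_2/Merge",
    {"policy_01/observation", "policy_01/timestep"},
))
# -> set()
```

In this model, the first call corresponds to the situation the error reports: an output of `serving_default` reaches `policy_01/timestep`, which is not among the signature inputs.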

[x] I have verified my script runs in a clean environment and reproduces the issue.
[x] I have verified the issue also occurs with the latest wheels.

ivallesp commented 3 years ago

In case somebody else faces this issue, the workaround is to downgrade to ray==1.0.0. In that version the same warnings are emitted, but the UnliftableError disappears.

krfricke commented 3 years ago

Thanks for the issue and the workaround, @ivallesp. cc @sven1977, have you seen this one?

ivallesp commented 3 years ago

There is an extra problem here. When you call trainer.export_policy_model("exported_model", "policy_01") in the example above, the weights of all the defined policies are stored in the exported_model folder, not only those of policy_01 as specified.

Reviewing the code, I found that the add_meta_graph_and_variables function here adds the variables of every policy defined in the tf session, although the signature_def_map defined two lines above is built correctly. I have not found a workaround so far.
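
To make the expected behaviour concrete, here is a minimal sketch of selecting variables by policy scope. It works on plain name strings and does not touch RLlib or the exported files. The `policy_01` names are taken from the warnings earlier in the thread; the `policy_02` names are an assumption that they mirror them, and `variables_in_scope` is a hypothetical helper.

```python
# Variable names as they might appear in a session holding both policies.
# policy_01 names come from the warnings above; policy_02 names are assumed.
ALL_VARIABLES = [
    "policy_01/timestep_1:0",
    "policy_01/kl_coeff:0",
    "policy_01/entropy_coeff:0",
    "policy_01/lr:0",
    "policy_01/global_step:0",
    "policy_02/timestep_1:0",
    "policy_02/kl_coeff:0",
    "policy_02/entropy_coeff:0",
    "policy_02/lr:0",
    "policy_02/global_step:0",
]


def variables_in_scope(names, scope):
    """Keep only the names that live directly under the given name scope."""
    # The trailing slash stops "policy_01" from also matching "policy_010/...".
    prefix = scope.rstrip("/") + "/"
    return [name for name in names if name.startswith(prefix)]


for name in variables_in_scope(ALL_VARIABLES, "policy_01"):
    print(name)
```

An export restricted to policy_01 would be expected to contain only the names this filter keeps, rather than all ten.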

sven1977 commented 3 years ago

Thanks for filing this @ivallesp! Taking a look rn. I can reproduce the above error.

sven1977 commented 3 years ago

Could you try this fix here? It's working for me. The problem is apparently that, for some reason, the timestep placeholder must be one with a default value. I don't fully understand why that's the case. The example is further reducible to a non-multi-agent setup (e.g. CartPole and no multiagent config) with num_workers=0 and simple_optimizer=True (I thought it might have had something to do with the multi-GPU optimizer copying the policies, but it doesn't).

https://github.com/ray-project/ray/pull/12786

@ivallesp
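
For anyone who wants to try the reduced setup described above, a rough sketch follows. The two config keys are the ones named in the comment. The environment id "CartPole-v0", the helper name `export_single_agent_model`, and the export directory are assumptions, and the script is modelled on the original reproduction rather than taken from the linked PR.

```python
# Reduced single-agent settings named in the comment above.
config = {
    "num_workers": 0,          # no remote rollout workers
    "simple_optimizer": True,  # avoid the multi-GPU optimizer
}


def export_single_agent_model(export_dir="exported_model"):
    """Train PPO for one iteration on CartPole and export the default policy."""
    # Imported here so the config above can be inspected without Ray installed.
    import ray
    from ray.rllib.agents import ppo

    ray.init()
    # "CartPole-v0" is an assumed env id; the comment only says "CartPole".
    trainer = ppo.PPOTrainer(env="CartPole-v0", config=config)
    trainer.train()  # one training iteration
    trainer.export_policy_model(export_dir)  # no policy id: default policy
    ray.shutdown()
```

Calling `export_single_agent_model()` and then `tf.saved_model.load("exported_model")` should exercise the same export and load path as the multiagent script, without the multiagent config.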

sven1977 commented 3 years ago

Could simply be a tf1.x quirk.

sven1977 commented 3 years ago

Closing this issue. Feel free to re-open if the above solution does not fix the problem on your end. I was also able to make the "Unable to create a python object for variable <tf.Variabl..." warnings go away, but these were unrelated to the actual error/crash.

ray-project / ray

[rllib] Unable to restore multiagent PPO policy models with tensorflow #12244
