ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

ValueError: tf.enable_eager_execution must be called at program startup #14533

Closed kk-55 closed 3 years ago

kk-55 commented 3 years ago

What is the problem?

Ray version and other system information (Python version, TensorFlow version, OS):

Running the multi agent cartpole example, but w/o using tune, I get

ValueError: tf.enable_eager_execution must be called at program startup

This occurs when I manually set up a PPO trainer and choose framework='tf2', i.e. TF2SharedWeightsModel is used for "variable sharing" between models/policies. Running the example with tune (the default) works. What is causing this ValueError?

Reproduction (REQUIRED)

Multi-agent cartpole example (w/o tune, but w/ framework='tf2', trainer = PPOTrainer(config=config) and results = trainer.train()):

"""Simple example of setting up a multi-agent policy mapping.

Control the number of agents and policies via --num-agents and --num-policies.

This works with hundreds of agents and policies, but note that initializing
many TF policies will take some time.

Also, TF evals might slow down with large numbers of policies. To debug TF
execution, set the TF_TIMELINE_DIR environment variable.
"""

import argparse
import gym
import os
import random
import ray
from ray import tune
from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.examples.env.multi_agent import MultiAgentCartPole
from ray.rllib.examples.models.shared_weights_model import \
    SharedWeightsModel1, SharedWeightsModel2, TF2SharedWeightsModel, \
    TorchSharedWeightsModel
from ray.rllib.models import ModelCatalog
from ray.rllib.utils.framework import try_import_tf
from ray.rllib.utils.test_utils import check_learning_achieved

tf1, tf, tfv = try_import_tf()

parser = argparse.ArgumentParser()

parser.add_argument("--num-agents", type=int, default=4)
parser.add_argument("--num-policies", type=int, default=2)
parser.add_argument("--stop-iters", type=int, default=200)
parser.add_argument("--stop-reward", type=float, default=150)
parser.add_argument("--stop-timesteps", type=int, default=100000)
parser.add_argument("--simple", action="store_true")
parser.add_argument("--num-cpus", type=int, default=0)
parser.add_argument("--as-test", action="store_true")
parser.add_argument(
    "--framework", choices=["tf2", "tf", "tfe", "torch"], default="tf")

if __name__ == "__main__":
    args = parser.parse_args()

    ray.init(num_cpus=args.num_cpus or None)
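    # Force the tf2 framework to reproduce the error (overrides the --framework CLI arg).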
    args.framework = "tf2"
    # Register the models to use.
    if args.framework == "torch":
        mod1 = mod2 = TorchSharedWeightsModel
    elif args.framework in ["tfe", "tf2"]:
        mod1 = mod2 = TF2SharedWeightsModel
    else:
        mod1 = SharedWeightsModel1
        mod2 = SharedWeightsModel2
    ModelCatalog.register_custom_model("model1", mod1)
    ModelCatalog.register_custom_model("model2", mod2)

    # Get obs- and action Spaces.
    single_env = gym.make("CartPole-v0")
    obs_space = single_env.observation_space
    act_space = single_env.action_space

    # Each policy can have a different configuration (including custom model).
    def gen_policy(i):
        config = {
            "model": {
                "custom_model": ["model1", "model2"][i % 2],
            },
            "gamma": random.choice([0.95, 0.99]),
        }
        return (None, obs_space, act_space, config)

    # Setup PPO with an ensemble of `num_policies` different policies.
    policies = {
        "policy_{}".format(i): gen_policy(i)
        for i in range(args.num_policies)
    }
    policy_ids = list(policies.keys())

    config = {
        "env": MultiAgentCartPole,
        "env_config": {
            "num_agents": args.num_agents,
        },
        "simple_optimizer": args.simple,
        # Use GPUs iff `RLLIB_NUM_GPUS` env var set to > 0.
        "num_gpus": int(os.environ.get("RLLIB_NUM_GPUS", "0")),
        "num_sgd_iter": 10,
        "multiagent": {
            "policies": policies,
            "policy_mapping_fn": (lambda agent_id: random.choice(policy_ids)),
        },
        "framework": args.framework,
    }
    stop = {
        "episode_reward_mean": args.stop_reward,
        "timesteps_total": args.stop_timesteps,
        "training_iteration": args.stop_iters,
    }
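    # Manual setup instead of tune.run("PPO", ...): build the trainer directly and call
    # train() once -- this is the path that raises the ValueError.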
    trainer = PPOTrainer(config=config)
    results = trainer.train()
    # results = tune.run("PPO", stop=stop, config=config, verbose=1)
    print("End")
    if args.as_test:
        check_learning_achieved(results, args.stop_reward)
    ray.shutdown()
sven1977 commented 3 years ago

This seems to be TF-version related: I can confirm the above for tf==2.4.1, but not for tf==2.0.x.

sven1977 commented 3 years ago

Hmm, I'm seeing a couple of bugs on our end. One also has to do with the "simple_optimizer" setting and is unrelated to this issue. Either way, the main problem is the TF version, which no longer seems to allow calling tf.enable_eager_execution() in the "middle" of a program.

Quick workaround for now: Could you add this to the very top of your script?

from ray.rllib.utils.framework import try_import_tf
tf1, tf, tfv = try_import_tf()
tf1.enable_eager_execution()
sven1977 commented 3 years ago

Also, could you set the simple arg ("simple_optimizer" in the RLlib config) to True when using tf2? There is a validation bug that lets this slip through the cracks; tf-eager should always use the "simple_optimizer" option automatically.
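
For example, with the config dict from the reproduction script above, the change is just (a minimal sketch):

config["simple_optimizer"] = True  # tf2/eager should always run with the simple optimizer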

sven1977 commented 3 years ago

PR with a fix for the above issue:

https://github.com/ray-project/ray/pull/14737

kk-55 commented 3 years ago

Next Monday I will be back at work and will check it out.

kk-55 commented 3 years ago

@sven1977 I can confirm that adding tf1.enable_eager_execution() to the very top of shared_weights_model.py (and also to the top of the custom script I'm actually working on) fixes the error. As far as I can see, the simple_optimizer arg has no impact anyway when running under my Ray/RLlib release (version 2.0.0.dev0).

Btw: fixing the above bug led to another bug (ValueError: Attempt to convert a value (RepeatedValues(...)) with an unsupported type (<class 'ray.rllib.models.repeated_values.RepeatedValues'>) to a Tensor.). In my use case, input['obs'] is a dict that also contains ray.rllib.models.repeated_values.RepeatedValues, and the function _convert_to_tf in ray.rllib.policy.eager_tf_policy.py can only handle RepeatedValues in an "outer structure", not in an "inner structure" such as a dict. As a first workaround, I fixed it this way:

x = tf.nest.map_structure(
    lambda f: _convert_to_tf(f, d) if isinstance(f, RepeatedValues) 
        else tf.convert_to_tensor(f, d) if f is not None else None, x)

instead of https://github.com/ray-project/ray/blob/9ccf291f4d1173e86becf48c80cc15b02386dcc8/rllib/policy/eager_tf_policy.py#L41-L42
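
For context, the replaced lines are, roughly paraphrased rather than copied verbatim, just the plain tensor conversion without a RepeatedValues branch:

x = tf.nest.map_structure(
    lambda f: tf.convert_to_tensor(f, d) if f is not None else None, x)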

sven1977 commented 3 years ago

Awesome, @kk-55! Thanks for the suggested fix for the map_structure problem in tf-eager. Will PR this now.

sven1977 commented 3 years ago

PR: https://github.com/ray-project/ray/pull/15015

HJasperson commented 2 years ago

Popping in to say that this issue still persists. In my case, tf1.enable_eager_execution was being called in evaluation/rollout_worker.py despite using "tf2" as the framework. The workaround (putting tf1.enable_eager_execution() at the top of every file) fixed the issue, but it took a hot minute to find this thread.

Ray version: 1.11.0, Python version: 3.9.12, TF version: 2.7.0
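
For later versions, a guarded variant of the workaround above (a sketch, assuming the same try_import_tf pattern used earlier in this thread) avoids re-enabling eager execution when it is already active:

from ray.rllib.utils.framework import try_import_tf

tf1, tf, tfv = try_import_tf()
# Must run before any other TF usage; skip if eager execution is already enabled.
if not tf1.executing_eagerly():
    tf1.enable_eager_execution()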