
[RLlib] make policy evaluation support Attention nets #27909

Open · antoine-galataud opened 2 years ago

antoine-galataud commented 2 years ago

What happened + What you expected to happen

The script rllib/evaluate.py fails when running the evaluation loop for an agent trained with the provided attention nets. The problem is that the policy's initial state is an empty array.
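(For context: with attention nets, the recurrent state is exposed via the model's view requirements rather than Policy.get_initial_state(), so the evaluation loop has no state to feed. A minimal illustration, assuming a restored trainer `agent` on Ray 1.13:)

# Minimal illustration (assumes a restored trainer `agent`, Ray 1.13):
policy = agent.get_policy()
print(policy.get_initial_state())  # -> [] for attention nets

# The TF graph still defines placeholders such as
# 'default_policy/state_in_0' with shape [?, ?, 32]
# ([batch, memory, attention_dim]), so running the graph without
# feeding them raises the InvalidArgumentError shown below.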

The following exception occurs:

2022-08-16 17:31:22,008 ERROR tf_run_builder.py:50 -- Error fetching: [<tf.Tensor 'default_policy/cond_1/Merge:0' shape=(?,) dtype=int64>, <tf.Tensor 'default_policy/default_model/model_2/gtrxl/Reshape_2:0' shape=(?, 32) dtype=float32>, {'action_prob': <tf.Tensor 'default_policy/Exp:0' shape=(?,) dtype=float32>, 'action_logp': <tf.Tensor 'default_policy/cond_2/Merge:0' shape=(?,) dtype=float32>, 'action_dist_inputs': <tf.Tensor 'default_policy/default_model/model_2/dense_6/BiasAdd:0' shape=(?, 2) dtype=float32>, 'vf_preds': <tf.Tensor 'default_policy/default_model/Reshape:0' shape=(?,) dtype=float32>}], feed_dict={<tf.Tensor 'default_policy/obs:0' shape=(?, 2) dtype=float32>: array([[0., 1.]], dtype=float32), <tf.Tensor 'default_policy/prev_actions:0' shape=(?,) dtype=int64>: array([1]), <tf.Tensor 'default_policy/prev_rewards:0' shape=(?,) dtype=float32>: array([0.]), <tf.Tensor 'default_policy/is_training:0' shape=() dtype=bool>: False, <tf.Tensor 'default_policy/is_exploring:0' shape=() dtype=bool>: True, <tf.Tensor 'default_policy/timestep:0' shape=() dtype=int64>: 108000}
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1375, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1359, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1451, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'default_policy/state_in_0' with dtype float and shape [?,?,32]
     [[{{node default_policy/state_in_0}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/antoine/.local/lib/python3.8/site-packages/ray/rllib/utils/tf_run_builder.py", line 42, in get
    self._executed = run_timeline(
  File "/home/antoine/.local/lib/python3.8/site-packages/ray/rllib/utils/tf_run_builder.py", line 102, in run_timeline
    fetches = sess.run(ops, feed_dict=feed_dict)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 967, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1190, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1368, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1394, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'default_policy/state_in_0' with dtype float and shape [?,?,32]
     [[node default_policy/state_in_0 (defined at home/antoine/.local/lib/python3.8/site-packages/ray/rllib/utils/tf_utils.py:204) ]]

Original stack trace for 'default_policy/state_in_0':
  File "snap/pycharm-community/293/plugins/python-ce/helpers/pydev/pydevd.py", line 2195, in <module>
    main()
  File "snap/pycharm-community/293/plugins/python-ce/helpers/pydev/pydevd.py", line 2177, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "snap/pycharm-community/293/plugins/python-ce/helpers/pydev/pydevd.py", line 1489, in run
    return self._exec(is_module, entry_point_fn, module_name, file, globals, locals)
  File "snap/pycharm-community/293/plugins/python-ce/helpers/pydev/pydevd.py", line 1496, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "snap/pycharm-community/293/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "home/antoine/.local/lib/python3.8/site-packages/ray/rllib/examples/attention_net.py", line 228, in <module>
    cls = get_trainer_class(args.run)
  File "home/antoine/.local/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 870, in __init__
    super().__init__(
  File "home/antoine/.local/lib/python3.8/site-packages/ray/tune/trainable.py", line 156, in __init__
    self.setup(copy.deepcopy(self.config))
  File "home/antoine/.local/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 950, in setup
    self.workers = WorkerSet(
  File "home/antoine/.local/lib/python3.8/site-packages/ray/rllib/evaluation/worker_set.py", line 170, in __init__
    self._local_worker = self._make_worker(
  File "home/antoine/.local/lib/python3.8/site-packages/ray/rllib/evaluation/worker_set.py", line 630, in _make_worker
    worker = cls(
  File "home/antoine/.local/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 630, in __init__
    self._build_policy_map(
  File "home/antoine/.local/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1788, in _build_policy_map
    self.policy_map.create_policy(
  File "home/antoine/.local/lib/python3.8/site-packages/ray/rllib/policy/policy_map.py", line 140, in create_policy
    self[policy_id] = class_(
  File "home/antoine/.local/lib/python3.8/site-packages/ray/rllib/policy/tf_policy_template.py", line 256, in __init__
    DynamicTFPolicy.__init__(
  File "home/antoine/.local/lib/python3.8/site-packages/ray/rllib/policy/dynamic_tf_policy.py", line 226, in __init__
    self._state_inputs = [
  File "home/antoine/.local/lib/python3.8/site-packages/ray/rllib/policy/dynamic_tf_policy.py", line 227, in <listcomp>
    get_placeholder(
  File "home/antoine/.local/lib/python3.8/site-packages/ray/rllib/utils/tf_utils.py", line 204, in get_placeholder
    return tf1.placeholder(
  File "usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/array_ops.py", line 3179, in placeholder
    return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
  File "usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 6725, in placeholder
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
  File "usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/op_def_library.py", line 748, in _apply_op_helper
    op = g._create_op_internal(op_type_name, inputs, dtypes=None,
  File "usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3528, in _create_op_internal
    ret = Operation(
  File "usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 1990, in __init__
    self._traceback = tf_stack.extract_stack()


Versions / Dependencies

Ray 1.13.0

Reproduction script

Take rllib/examples/attention_net.py, then change the Tune run to:

results = tune.run(args.run, config=config, stop=stop, verbose=2,
                   checkpoint_at_end=True, metric="episode_reward_mean", mode="max")

Then, after training, restore the best checkpoint and run the evaluation loop:

from ray.rllib.agents.registry import get_trainer_class
from ray.rllib.evaluate import rollout

# The local worker needs its own env instance to roll out in.
config["create_env_on_driver"] = True

cls = get_trainer_class(args.run)
agent = cls(env=args.env, config=config)
agent.restore(results.best_checkpoint)
rollout(agent=agent, env_name=None, num_steps=100, no_render=True)

Issue Severity

Medium: It is a significant difficulty but I can work around it.

RocketRider commented 1 year ago

I am having a similar issue. Is there any progress on this? What is your workaround?

antoine-galataud commented 1 year ago

@RocketRider you can find an example here: https://github.com/ray-project/ray/blob/master/rllib/examples/inference_and_serving/policy_inference_after_training_with_attention.py
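
For reference, the core of that script — building the initial attention state and threading it through compute_single_action — looks roughly like this (a condensed sketch of the linked example, not a verbatim copy; the attention_* keys are RLlib's attention model settings):

import numpy as np

model_cfg = config["model"]
num_transformers = model_cfg["attention_num_transformer_units"]

# One zero "memory" matrix per transformer unit, shape
# [attention_memory_inference, attention_dim].
state = [
    np.zeros(
        [model_cfg["attention_memory_inference"], model_cfg["attention_dim"]],
        np.float32,
    )
    for _ in range(num_transformers)
]
prev_a = prev_r = None  # or zero-filled windows if attention_use_n_prev_* > 0

obs = env.reset()
done = False
while not done:
    # Passing `state` makes compute_single_action return the state-outs, too.
    action, state_out, _ = agent.compute_single_action(
        observation=obs, state=state, prev_action=prev_a, prev_reward=prev_r
    )
    obs, reward, done, _ = env.step(action)
    # Slide the attention memory: drop the oldest row, append the newest.
    state = [
        np.concatenate([state[i], [state_out[i]]], axis=0)[1:]
        for i in range(num_transformers)
    ]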

There's a small update to make if you're using attention_use_n_prev_actions or attention_use_n_prev_rewards > 0, at https://github.com/ray-project/ray/blob/98a446bb97575e0960186c2035e555b7d4a5823d/rllib/examples/inference_and_serving/policy_inference_after_training_with_attention.py#L187-L190

Instead, there should be something like:

if init_prev_a is not None:
    prev_a = np.concatenate([prev_a, [action]], axis=0)[1:]
if init_prev_r is not None:
    prev_r = np.concatenate([prev_r, [reward]], axis=0)[1:]
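
That is, the previous-action/reward buffers keep a fixed window length (attention_use_n_prev_actions / attention_use_n_prev_rewards entries), dropping the oldest value and appending the newest one at each step, rather than being overwritten with a scalar.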

Also, this example works for a discrete action space; if you have a MultiDiscrete one, you'll have to initialize this way:

init_prev_a = prev_a = np.array(
    [[0] * env.action_space.nvec.shape[0]] * prev_n_actions,
    dtype=np.int32
)
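
For instance (a hypothetical illustration, not from the example script): with MultiDiscrete([3, 5]) and prev_n_actions = 5, each previous action is a length-2 vector, so the buffer has shape (5, 2) instead of (5,):

import gym
import numpy as np

# Hypothetical MultiDiscrete space with two sub-actions per step.
action_space = gym.spaces.MultiDiscrete([3, 5])
prev_n_actions = 5

init_prev_a = prev_a = np.array(
    [[0] * action_space.nvec.shape[0]] * prev_n_actions, dtype=np.int32
)
print(prev_a.shape)  # -> (5, 2)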