antoine-galataud opened 2 years ago
I am having a similar issue. Is there any progress on this? What is your workaround?
@RocketRider you can find an example here: https://github.com/ray-project/ray/blob/master/rllib/examples/inference_and_serving/policy_inference_after_training_with_attention.py
There's a small update to make if you're using `attention_use_n_prev_actions` or `attention_use_n_prev_rewards` > 0. At
https://github.com/ray-project/ray/blob/98a446bb97575e0960186c2035e555b7d4a5823d/rllib/examples/inference_and_serving/policy_inference_after_training_with_attention.py#L187-L190
there should instead be something like:
```python
if init_prev_a is not None:
    prev_a = np.concatenate([prev_a, [action]], axis=0)[1:]
if init_prev_r is not None:
    prev_r = np.concatenate([prev_r, [reward]], axis=0)[1:]
```
Also, this example works for a discrete action space; if yours is MultiDiscrete, you'll have to initialize this way:
```python
init_prev_a = prev_a = np.array(
    [[0] * env.action_space.nvec.shape[0]] * prev_n_actions,
    dtype=np.int32,
)
```
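To check the resulting shape without an actual environment, here is a self-contained sketch where `nvec` is a stand-in for `env.action_space.nvec` (an assumed `MultiDiscrete([4, 2, 5])`): each history slot holds one all-zero action with one component per sub-action.

```python
import numpy as np

# Stand-in for env.action_space.nvec of a hypothetical MultiDiscrete([4, 2, 5]).
nvec = np.array([4, 2, 5])
prev_n_actions = 2  # assumed attention_use_n_prev_actions value

# One zero-filled action per history slot, each with nvec.shape[0] components.
init_prev_a = prev_a = np.array(
    [[0] * nvec.shape[0]] * prev_n_actions,
    dtype=np.int32,
)

print(prev_a.shape)  # (prev_n_actions, number of sub-actions) -> (2, 3)
```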
What happened + What you expected to happen
Script rllib/evaluate.py fails when running the evaluation loop for an agent trained with the provided attention nets. The problem is that the policy's initial state is an empty array. The following exception occurs:
Versions / Dependencies
Ray 1.13.0
Reproduction script
Take rllib/examples/attention_net.py, then change running with Tune by:

Issue Severity
Medium: It is a significant difficulty but I can work around it.