tensorflow / agents

TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.
Apache License 2.0

actor_network output spec does not match action spec: error when using multi-discrete action space with PPO agent #720


sibyjackgrove commented 2 years ago

This is similar to #656, but I am opening a separate issue since that one remains unresolved. Also, @sguada mentioned in #702 that the PPO agent can take 1-D action specs.

I have the following action spec: BoundedArraySpec(shape=(5,), dtype=dtype('int32'), name='action', minimum=0, maximum=1)

I am trying to use it with a PPO agent as shown below.

import tensorflow as tf
from tf_agents.agents.ppo import ppo_clip_agent
from tf_agents.networks import actor_distribution_rnn_network
from tf_agents.networks import value_rnn_network

actor_net = actor_distribution_rnn_network.ActorDistributionRnnNetwork(
        eval_env.observation_spec(),
        eval_env.action_spec(),
        lstm_size=(80,))
value_net = value_rnn_network.ValueRnnNetwork(eval_env.observation_spec())

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
tf_agent = ppo_clip_agent.PPOClipAgent(
        eval_env.time_step_spec(),
        eval_env.action_spec(),
        optimizer,
        actor_net=actor_net,
        value_net=value_net)
tf_agent.initialize()

However, I keep getting the following error:

ValueError: actor_network output spec does not match action spec:
TensorSpec(shape=(), dtype=tf.int32, name=None)
vs.
BoundedTensorSpec(shape=(5,), dtype=tf.int32, name='action', minimum=array(0, dtype=int32), maximum=array(1, dtype=int32))

Note that actor_distribution_rnn_network.ActorDistributionRnnNetwork, when given this action_spec, is able to create an output of shape (5, 2).
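The shape arithmetic itself works out: logits of shape (5, 2), sampled one category per row, yield an action vector of shape (5,), which is exactly what the spec asks for. A numpy-only sketch of this (illustrative code, not TF-Agents internals; the function name sample_multi_discrete is made up for this example):

```python
import numpy as np

def sample_multi_discrete(logits: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Sample one category per action dimension from (dims, categories) logits."""
    # Row-wise softmax to get per-dimension categorical probabilities.
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = exp / exp.sum(axis=-1, keepdims=True)
    # One draw per row: result has shape (dims,).
    return np.array([rng.choice(probs.shape[-1], p=p) for p in probs])

rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 2))  # what the actor network reportedly emits
action = sample_multi_discrete(logits, rng)
print(action.shape)  # (5,) -- matches BoundedArraySpec(shape=(5,), minimum=0, maximum=1)
```

So the mismatch appears to come from the agent's spec-matching check rather than from the distribution's sample shape.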

Any suggestion to resolve this would be highly appreciated.

sibyjackgrove commented 2 years ago

@sguada Could you please suggest a solution?