tensorflow / agents

TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.

Q-Network wrong output spec #896

Open rissois opened 12 months ago

rissois commented 12 months ago

I am receiving the following error: Expected q_network to emit a floating point tensor with inner dims (464,); but saw network output spec: TensorSpec(shape=(6, 4, 464), dtype=tf.float32, name=None)

I am building a custom environment for DqnAgent with an observation shape of (6, 4, 4). The action is scalar (I would have preferred a (2,) action, but apparently that is not possible at the moment). I am following this tutorial as closely as I can for my use case.

The environment class is initialized with:

self._action_spec = array_spec.BoundedArraySpec(
    shape=(), dtype=np.int32, minimum=0, maximum=463, name='action'
)

# Six 4x4 boards
self._observation_spec = array_spec.BoundedArraySpec(
    (6, 4, 4), np.int32,
    minimum=self.createMinMaxBoards([0, 0, 0, 0, 0, -1]),
    maximum=self.createMinMaxBoards([1, 1, 1, 1, 3, 2]),
)
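For reference, a sketch of what createMinMaxBoards presumably does, since its definition is not shown above (this is an assumption): it tiles each per-board scalar bound across the corresponding 4x4 board.

# Assumed implementation of the helper above (not shown in the issue):
# broadcast one scalar bound per board across its 4x4 grid, yielding
# a (6, 4, 4) array that matches the observation spec's shape.
def createMinMaxBoards(self, values):
    return np.array([np.full((4, 4), v) for v in values], dtype=np.int32)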

I was able to successfully validate the environment and run it with a fixed policy, as per the tutorial, so the environment itself appears to be in good shape. I then jumped over to this tutorial to add the agent and copied and pasted these two blocks of code directly:

import tensorflow as tf

from tf_agents.agents.dqn import dqn_agent
from tf_agents.networks import sequential
from tf_agents.specs import tensor_spec
from tf_agents.utils import common

learning_rate = 1e-3  # defined earlier in the tutorial

fc_layer_params = (100, 50)
action_tensor_spec = tensor_spec.from_spec(env.action_spec())
num_actions = action_tensor_spec.maximum - action_tensor_spec.minimum + 1

# Define a helper function to create Dense layers configured with the right
# activation and kernel initializer.
def dense_layer(num_units):
  return tf.keras.layers.Dense(
      num_units,
      activation=tf.keras.activations.relu,
      kernel_initializer=tf.keras.initializers.VarianceScaling(
          scale=2.0, mode='fan_in', distribution='truncated_normal'))

# QNetwork consists of a sequence of Dense layers followed by a dense layer
# with `num_actions` units to generate one q_value per available action as
# its output.
dense_layers = [dense_layer(num_units) for num_units in fc_layer_params]
q_values_layer = tf.keras.layers.Dense(
    num_actions,
    activation=None,
    kernel_initializer=tf.keras.initializers.RandomUniform(
        minval=-0.03, maxval=0.03),
    bias_initializer=tf.keras.initializers.Constant(-0.2))
q_net = sequential.Sequential(dense_layers + [q_values_layer])
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)

train_step_counter = tf.Variable(0)

agent = dqn_agent.DqnAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    q_network=q_net,
    optimizer=optimizer,
    td_errors_loss_fn=common.element_wise_squared_loss,
    train_step_counter=train_step_counter)

agent.initialize()

The error is thrown at agent = dqn_agent.DqnAgent(...). There is a line in dqn_agent.py, q_network.create_variables(net_observation_spec), which seems to create the (6, 4, 464) shape. I would have imagined the network's output shape would automatically be inferred from the q_values_layer's num_actions. More than likely this is a failure on my end, but I have seen similar unresolved posts on StackOverflow. Can anyone please help correct my understanding / code here?
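For what it's worth, here is a minimal repro of the shape behavior I think is at play (my assumption, not a confirmed diagnosis): Keras Dense layers only act on the last axis, so the (6, 4) leading dims of the observation pass straight through the network, and flattening the observation first would give the (464,) inner shape the agent expects.

import tensorflow as tf

x = tf.zeros([1, 6, 4, 4])            # a batch of one (6, 4, 4) observation
y = tf.keras.layers.Dense(464)(x)
print(y.shape)                        # (1, 6, 4, 464) -- matches the error's spec

flat = tf.keras.layers.Flatten()(x)   # (1, 96): collapse the six boards to a vector
q = tf.keras.layers.Dense(464)(flat)
print(q.shape)                        # (1, 464) -- the inner dims DqnAgent expects

If that is right, prepending tf.keras.layers.Flatten() to the layer list, e.g. sequential.Sequential([tf.keras.layers.Flatten()] + dense_layers + [q_values_layer]), might be a workaround, though I have not verified it against this environment.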

LokeshNEU747 commented 11 months ago

I'm facing the same issue as well. Have you resolved it?