tensorflow / agents

TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.
Apache License 2.0

Getting extra loss data when training a REINFORCE agent doesn't work for me #536

Closed Eisfoehniks closed 3 years ago

Eisfoehniks commented 3 years ago

Hi,

I'm currently training a REINFORCE agent that uses a value network as a baseline. I'm trying to get separate loss values for the actor network and the value network so I can evaluate their stability separately. However, the structure returned by tf_agent.train(experience).extra seems to contain only the loss names as strings rather than actual values. I attached an example based on the REINFORCE tutorial below.

The example has the following output for me: ReinforceAgentLossInfo(policy_gradient_loss=<tf.Tensor: shape=(), dtype=string, numpy=b'policy_gradient_loss'>, policy_network_regularization_loss=<tf.Tensor: shape=(), dtype=string, numpy=b'policy_network_regularization_loss'>, entropy_regularization_loss=<tf.Tensor: shape=(), dtype=string, numpy=b'entropy_regularization_loss'>, value_estimation_loss=<tf.Tensor: shape=(), dtype=string, numpy=b'value_estimation_loss'>, value_network_regularization_loss=<tf.Tensor: shape=(), dtype=string, numpy=b'value_network_regularization_loss'>)

How do I get actual loss values?
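
To be concrete, this is the kind of access I was expecting to work (just a sketch; the field names are taken from the ReinforceAgentLossInfo printed above, and tf_agent / experience are set up as in the full example below):

# Hypothetical access pattern, assuming `extra` holds scalar loss tensors:
loss_info = tf_agent.train(experience)
actor_loss = float(loss_info.extra.policy_gradient_loss)
value_loss = float(loss_info.extra.value_estimation_loss)
print(actor_loss, value_loss)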

# Example REINFORCE
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np

import tensorflow as tf

from tf_agents.agents.reinforce import reinforce_agent
from tf_agents.environments import suite_gym
from tf_agents.environments import tf_py_environment
from tf_agents.metrics import tf_metrics
from tf_agents.networks import actor_distribution_network
from tf_agents.networks import value_network
from tf_agents.replay_buffers import tf_uniform_replay_buffer
from tf_agents.trajectories import trajectory
from tf_agents.utils import common

tf.compat.v1.enable_v2_behavior()

env_name = "CartPole-v0" # @param {type:"string"}
num_iterations = 250 # @param {type:"integer"}
collect_episodes_per_iteration = 2 # @param {type:"integer"}
replay_buffer_capacity = 2000 # @param {type:"integer"}

fc_layer_params = (100,)

learning_rate = 1e-3 # @param {type:"number"}
log_interval = 25 # @param {type:"integer"}
num_eval_episodes = 10 # @param {type:"integer"}
eval_interval = 50 # @param {type:"integer"}

train_py_env = suite_gym.load(env_name)
train_env = tf_py_environment.TFPyEnvironment(train_py_env)

actor_net = actor_distribution_network.ActorDistributionNetwork(
    train_env.observation_spec(),
    train_env.action_spec(),
    fc_layer_params=fc_layer_params)

# Value network used as a baseline for the REINFORCE advantage.
value_net = value_network.ValueNetwork(
    train_env.observation_spec(),
    fc_layer_params=(75, 75))

optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=learning_rate)

train_step_counter = tf.compat.v2.Variable(0)

tf_agent = reinforce_agent.ReinforceAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    actor_network=actor_net,
    value_network=value_net,
    optimizer=optimizer,
    normalize_returns=True,
    train_step_counter=train_step_counter)
tf_agent.initialize()

replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=tf_agent.collect_data_spec,
    batch_size=train_env.batch_size,
    max_length=replay_buffer_capacity)

# Collect a few full episodes with the collect policy.
episode_counter = 0
train_env.reset()
while episode_counter < collect_episodes_per_iteration:
    time_step = train_env.current_time_step()
    action_step = tf_agent.collect_policy.action(time_step)
    next_time_step = train_env.step(action_step.action)
    traj = trajectory.from_transition(time_step, action_step, next_time_step)

    # Add trajectory to the replay buffer
    replay_buffer.add_batch(traj)

    if traj.is_boundary():
        episode_counter += 1

experience = replay_buffer.gather_all()
loss = tf_agent.train(experience)

print(loss.extra)
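
For logging I would ultimately like to turn the whole extra structure into plain numbers. Something along these lines is what I have in mind (a sketch; it assumes extra is the ReinforceAgentLossInfo namedtuple, so _asdict() is available, and that the fields hold scalar tensors):

# Hypothetical logging step: every field of `extra` as a Python float.
extra_losses = {name: float(value) for name, value in loss.extra._asdict().items()}
print(extra_losses["policy_gradient_loss"], extra_losses["value_estimation_loss"])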
summer-yue commented 3 years ago

Thanks for reporting! This looks like a bug. Let me try rolling out a fix and update here.

summer-yue commented 3 years ago

Please let me know if the issue is fixed for you. Thanks!

Eisfoehniks commented 3 years ago

I just tested the fix and it works now. Thank you for your effort!
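
In case it helps anyone else, this is roughly how I monitor the two losses separately now, continuing the script from my first comment (a sketch rather than a polished training loop; the variables are the ones defined there):

# Track actor and value losses over training to compare their stability.
actor_losses = []
value_losses = []
for _ in range(num_iterations):
    # Collect a fixed number of full episodes with the collect policy.
    episode_counter = 0
    train_env.reset()
    while episode_counter < collect_episodes_per_iteration:
        time_step = train_env.current_time_step()
        action_step = tf_agent.collect_policy.action(time_step)
        next_time_step = train_env.step(action_step.action)
        traj = trajectory.from_transition(time_step, action_step, next_time_step)
        replay_buffer.add_batch(traj)
        if traj.is_boundary():
            episode_counter += 1

    # Train on everything collected this iteration, then empty the buffer.
    loss_info = tf_agent.train(replay_buffer.gather_all())
    replay_buffer.clear()

    # With the fix, these are scalar tensors instead of name strings.
    actor_losses.append(float(loss_info.extra.policy_gradient_loss))
    value_losses.append(float(loss_info.extra.value_estimation_loss))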