tensorflow / agents

TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.

`PPOAgent` ignores network.losses #422

Open · zroug opened this issue 4 years ago

zroug commented 4 years ago

`PPOAgent` does not use regularization losses that are defined on the `value_net` and `actor_net` (i.e. the losses exposed via `network.losses`). It should add them to the other losses.
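
For illustration, here is a minimal sketch of what "regularization defined on the network" means (the `MyValueNet` class and the coefficient are hypothetical, not from this thread): a penalty registered with Keras `add_loss` shows up in `network.losses`, but the PPO loss currently never reads that collection.

```python
import tensorflow as tf
from tf_agents.networks import network


class MyValueNet(network.Network):
  """Hypothetical value network with activity regularization via add_loss."""

  def __init__(self, observation_spec, name='MyValueNet'):
    super(MyValueNet, self).__init__(
        input_tensor_spec=observation_spec, state_spec=(), name=name)
    self._dense = tf.keras.layers.Dense(64, activation='relu')
    self._value = tf.keras.layers.Dense(1)

  def call(self, observations, step_type=None, network_state=(), training=False):
    x = self._dense(observations)
    # Registered through Keras add_loss: it appears in self.losses,
    # but PPOAgent's total loss does not include it.
    self.add_loss(1e-4 * tf.reduce_sum(tf.square(x)))
    value = tf.squeeze(self._value(x), axis=-1)
    return value, network_state
```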

kbanoop commented 4 years ago

@summer-yue Can you PTAL at this?

summer-yue commented 4 years ago

Could you please elaborate? I see that the `l2_regularization_loss` includes the regularization losses and is added to `total_loss`. The coefficients default to 0, but if you wish to use regularization losses, you should be able to change `value_function_l2_reg` and `policy_l2_reg`.
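
For reference, a hedged sketch of the built-in path described above (the spec, optimizer, and network arguments are placeholders for whatever the caller already has):

```python
from tf_agents.agents.ppo import ppo_agent

agent = ppo_agent.PPOAgent(
    time_step_spec,   # placeholder: environment time step spec
    action_spec,      # placeholder: environment action spec
    optimizer=optimizer,
    actor_net=actor_net,
    value_net=value_net,
    # Non-zero coefficients enable the agent-computed l2 penalty,
    # which is added to total_loss as l2_regularization_loss.
    policy_l2_reg=1e-4,
    value_function_l2_reg=1e-4)
```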

zroug commented 4 years ago

I mean regularization that was added to the network with the `add_loss` function. For example, activity regularization and other, more advanced forms of regularization can be done with this function.

The losses can be obtained from the `losses` member of the `Network` class. `DqnAgent`, for example, uses them instead of implementing its own regularization:

https://github.com/tensorflow/agents/blob/dc7417b577c72af3026a761e01363e1587b256a1/tf_agents/agents/dqn/dqn_agent.py#L507
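
Paraphrased sketch of the pattern at the linked `DqnAgent` line (not verbatim; `td_loss`, `weights`, and `self._q_network` stand for the agent's internals): the agent reads the Keras-tracked losses off its network and folds them into the training loss, so `add_loss`-style regularization is honored automatically.

```python
from tf_agents.utils import common

# Tensors registered via add_loss / layer regularizers on the Q-network.
reg_loss = self._q_network.losses
agg_loss = common.aggregate_losses(
    per_example_loss=td_loss,
    sample_weight=weights,
    regularization_loss=reg_loss)
total_loss = agg_loss.total_loss
```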

summer-yue commented 4 years ago

Oh I see. Thanks for explaining. So you are pointing out the inconsistency between PPO and other agents in where regularization losses are collected, not that PPO doesn't do regularization at all.

We are checking with the team. There might be some historical reasons for this.

zroug commented 4 years ago

Yes, exactly.

I phrased it that way because I had a network whose regularization was defined at network creation via said function, and those losses weren't applied, which is a bit surprising. Regularization that is defined in the `PPOAgent` constructor is applied just fine.

A drawback of the current implementation is that it can only do l1 and l2 regularization and, for example, can't be restricted to a specific set of weights.
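
As an illustrative sketch (the layer and coefficient are hypothetical), this is the kind of restriction the constructor coefficients can't express: a penalty on a single layer's kernel, which Keras tracks through `network.losses` and PPO currently drops.

```python
import tensorflow as tf

# Regularize only this layer's kernel; the penalty is collected via add_loss
# under the hood and surfaces in network.losses, nowhere else.
dense = tf.keras.layers.Dense(
    64,
    activation='relu',
    kernel_regularizer=tf.keras.regularizers.l2(1e-5))
```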

Thanks for checking with the team!

summer-yue commented 4 years ago

Sorry for the delayed response. We are not sure why this inconsistency exists, probably some historical reasons. We could look into this once we get some bandwidth. Also feel free to submit a PR and contribute yourself :)

ebrevdo commented 3 years ago

This is simply an oversight. I think we need to add an agent validator and ensure agents properly use layer losses, or raise an exception if they don't support separate losses.

ebrevdo commented 3 years ago

@oars fyi, can you include this in the validator bug?