tensorflow / agents

TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.

Feature Request: Support Dueling DQN #544

Open king821221 opened 3 years ago

king821221 commented 3 years ago

A feature request to add support for Dueling DQN, as proposed in the paper *Dueling Network Architectures for Deep Reinforcement Learning*, which describes the benefit as: "The main benefit of this factoring is to generalize learning across actions without imposing any change to the underlying reinforcement learning algorithm. Our results show that this architecture leads to better policy evaluation in the presence of many similar-valued actions."

I have created a network class to support the Dueling DQN architecture, which can be plugged into the existing `DqnAgent`; see the linked pull request.

A brief summary of the proposed change:

- Add a new class `DuelQNetwork` in a new file: `tf_agents/networks/duel_q_network.py`.

- `self.encoder` is the shared trunk that converts the observation into a common state tensor.

- `self.a_encode_layer` and `self.a_value_layer` project the common state tensor to the action space in the advantage branch; `self.v_encode_layer` and `self.v_value_layer` project the common state tensor to a single scalar state value in the value branch.

- The Q-value is the sum of the state value and the mean-centered advantage, `Q(s, a) = V(s) + A(s, a) - mean_a'(A(s, a'))` (see the sketch after this list).

- Update `train_eval.py` to support training and evaluation via `QNetwork` and `DuelQNetwork` respectively.
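
To make the factoring concrete, here is a minimal, self-contained sketch of what such a dueling network could look like, written as a plain Keras model. The layer names mirror the summary above, but this is an illustration rather than the PR code, which presumably subclasses the TF-Agents `Network` base class so it can be plugged into `DqnAgent`:

```python
import tensorflow as tf


class DuelQNetwork(tf.keras.Model):
  """Sketch of a dueling head: Q(s, a) = V(s) + A(s, a) - mean(A(s, .))."""

  def __init__(self, num_actions, encoder_units=(100,), branch_units=(50,)):
    super().__init__()
    # Shared trunk that converts the observation into a common state tensor.
    self.encoder = tf.keras.Sequential(
        [tf.keras.layers.Dense(u, activation='relu') for u in encoder_units])
    # Advantage branch: projects the state tensor to one value per action.
    self.a_encode_layer = tf.keras.Sequential(
        [tf.keras.layers.Dense(u, activation='relu') for u in branch_units])
    self.a_value_layer = tf.keras.layers.Dense(num_actions)
    # State-value branch: projects the state tensor to a single scalar V(s).
    self.v_encode_layer = tf.keras.Sequential(
        [tf.keras.layers.Dense(u, activation='relu') for u in branch_units])
    self.v_value_layer = tf.keras.layers.Dense(1)

  def call(self, observation):
    state = self.encoder(observation)
    advantage = self.a_value_layer(self.a_encode_layer(state))
    value = self.v_value_layer(self.v_encode_layer(state))
    # Subtract the mean advantage so V and A are identifiable
    # (the aggregation proposed in the dueling paper).
    return value + advantage - tf.reduce_mean(
        advantage, axis=-1, keepdims=True)
```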

Following the experimental setup in the original Dueling DQN paper, both networks use two fully connected layers: the `QNetwork` uses layers of (100, 100), while the `DuelQNetwork` uses a (100) layer for the shared encoder and a (50) layer for each of the advantage and value branches.
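
For context, this is a minimal wiring sketch, following the standard TF-Agents DQN tutorial setup rather than the PR's actual `train_eval.py` changes, showing where the proposed `DuelQNetwork` would replace the baseline `QNetwork`:

```python
import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.environments import suite_gym
from tf_agents.environments import tf_py_environment
from tf_agents.networks import q_network

env = tf_py_environment.TFPyEnvironment(suite_gym.load('CartPole-v0'))

# Baseline: plain QNetwork with (100, 100) fully connected layers.
q_net = q_network.QNetwork(
    env.observation_spec(),
    env.action_spec(),
    fc_layer_params=(100, 100))

agent = dqn_agent.DqnAgent(
    env.time_step_spec(),
    env.action_spec(),
    q_network=q_net,  # the proposed DuelQNetwork would be swapped in here
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3))
agent.initialize()
```

Because the dueling factoring changes only the network head, no change to `DqnAgent` itself should be needed, which matches the paper's claim quoted above.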

Their training summaries can be seen in the attached images: Q-network (dqn_cart_pole_v0) and Dueling Q-network (duel_dqn_cart_pole_v0).

A corresponding PyTorch implementation is linked for reference.

Looking forward to your feedback and suggestions on the code and the experiment results. Thanks!

ebrevdo commented 3 years ago

Between this and other DQN advances, I think it's better to focus time on implementing and evaluating IQN and Distributional variants of CQL-DQN.