A feature request to add support for Dueling DQN, as proposed in the paper [Dueling Network Architectures for Deep Reinforcement Learning], which describes the approach as:
"
The main benefit of this factoring is to generalize learning across actions without imposing any
change to the underlying reinforcement learning
algorithm. Our results show that this architecture leads to better policy evaluation in the presence of many similar-valued actions.
"
I have created a network class that supports the Dueling DQN architecture and can be plugged into the existing DqnAgent.
The pull request can be found here.
A brief summary of the proposed change:
Add a new class DuelQNetwork in a new file: tf_agents/networks/duel_q_network.py
self.encoder is the shared encoder that converts the observation into a common state tensor.
self.a_encode_layer and self.a_value_layer project the common state tensor onto the action space in the advantage branch.
self.v_encode_layer and self.v_value_layer project the common state tensor to a single scalar state value in the state-value branch.
The Q-value is computed as the sum of the state value and the mean-centered advantage (advantage - MEAN(advantage)), as illustrated in the sketch after this summary.
Update train_eval.py to support training and evaluation via QNetwork and DuelQNetwork respectively.
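For reference, here is a minimal Keras sketch of the architecture described above (not the actual PR code; the DuelQModel name, layer sizes, and activations are placeholders chosen only for illustration):

```python
# Minimal sketch of the dueling architecture described above; the DuelQModel
# name, layer sizes, and activations are placeholders, not the PR code.
import tensorflow as tf


class DuelQModel(tf.keras.Model):
    """Shared encoder plus advantage/value branches combined into Q-values."""

    def __init__(self, num_actions):
        super().__init__()
        # Shared encoder: observation -> common state tensor (cf. self.encoder).
        self.encoder = tf.keras.layers.Dense(100, activation='relu')
        # Advantage branch (cf. self.a_encode_layer / self.a_value_layer).
        self.a_encode_layer = tf.keras.layers.Dense(50, activation='relu')
        self.a_value_layer = tf.keras.layers.Dense(num_actions)
        # State-value branch (cf. self.v_encode_layer / self.v_value_layer).
        self.v_encode_layer = tf.keras.layers.Dense(50, activation='relu')
        self.v_value_layer = tf.keras.layers.Dense(1)

    def call(self, observation):
        state = self.encoder(observation)
        advantage = self.a_value_layer(self.a_encode_layer(state))  # [B, num_actions]
        value = self.v_value_layer(self.v_encode_layer(state))      # [B, 1]
        # Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a)), broadcast over actions.
        return value + (advantage - tf.reduce_mean(advantage, axis=-1, keepdims=True))
```

For example, `DuelQModel(num_actions=2)(tf.random.normal([4, 8]))` returns a `[4, 2]` tensor of Q-values; subtracting the mean advantage keeps the V/A decomposition identifiable, as in the original paper.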
Following the experiments in the original Dueling DQN paper, both networks use two fully connected layers:
The QNetwork uses fully connected layers (100, 100).
The DuelQNetwork uses a (100,) fully connected layer for the shared encoder, and a (50,) layer for each of the advantage and state-value branches.
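As a usage sketch, the baseline QNetwork with these sizes can be built with the existing TF-Agents API (QNetwork and fc_layer_params already exist), using CartPole-v0 only as a placeholder environment; the DuelQNetwork call is left commented out because its constructor is defined in the PR and the keyword names below are only assumptions:

```python
# Sketch of how train_eval.py might construct the two networks. QNetwork and
# fc_layer_params are the existing TF-Agents API; the DuelQNetwork keyword
# names are hypothetical and depend on the PR's actual signature.
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.networks import q_network

env = tf_py_environment.TFPyEnvironment(suite_gym.load('CartPole-v0'))

# Baseline: plain QNetwork with two (100, 100) fully connected layers.
q_net = q_network.QNetwork(
    env.observation_spec(),
    env.action_spec(),
    fc_layer_params=(100, 100))

# Proposed: DuelQNetwork with a (100,) shared encoder and (50,) per branch.
# Hypothetical keyword names; see tf_agents/networks/duel_q_network.py in the PR.
# duel_q_net = duel_q_network.DuelQNetwork(
#     env.observation_spec(),
#     env.action_spec(),
#     fc_layer_params=(100,),
#     a_fc_layer_params=(50,),
#     v_fc_layer_params=(50,))
```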
Their model summaries can be seen in the attached images.
Q-network
Duel Q-Network
A corresponding PyTorch implementation
Looking forward to your feedback and suggestions on the code and the experiment results. Thanks!