Problem Description
I propose adding a new algorithm: 'Munchausen Reinforcement Learning' (Paper link).
Checklist
poetry install (see CleanRL's installation guideline)
Current Behavior
CleanRL has no implementation of Munchausen RL.
Expected Behavior
CleanRL includes an implementation of Munchausen RL.
Possible Solution
Implement Munchausen on DQN (at least), and possibly IQN and IQN + Munchausen later?
Steps to implement
On the DQN implementation: add two terms to the TD target (the blue term and the red term, as described in the paper: Paper link).
Use a softmax policy instead of epsilon-greedy.
I already have an implementation (which needs cleaning). I'm just checking whether it meets the repo's needs before making the changes.
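For reference, here is a minimal sketch of the two changes above for a single transition, written in plain Python rather than the PyTorch code CleanRL would use. It is my reading of the paper, not the cleaned-up implementation: I am assuming the two extra terms are the scaled, clipped log-policy bonus added to the reward and the entropy-regularized (soft) bootstrap value, with the policy taken as a softmax over Q-values at temperature `tau`. The function names, argument names, and the defaults `tau=0.03`, `alpha=0.9`, `clip_min=-1.0` are my choices for illustration, and `q_s` / `q_next` are assumed to come from the target network.

```python
import math
import random


def log_softmax(q_values, tau):
    """Numerically stable log of softmax(q / tau) for a list of Q-values."""
    scaled = [q / tau for q in q_values]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def munchausen_target(reward, action, q_s, q_next, done,
                      gamma=0.99, tau=0.03, alpha=0.9, clip_min=-1.0):
    """TD target for one transition (s, action, reward, s').

    q_s and q_next are the Q-values for s and s' (assumed to come from
    the target network). All hyperparameter defaults are assumptions.
    """
    # Log-policy bonus: alpha * clip(tau * log pi(action | s), clip_min, 0)
    log_pi = log_softmax(q_s, tau)
    bonus = alpha * max(clip_min, min(0.0, tau * log_pi[action]))

    # Soft bootstrap: sum_a' pi(a'|s') * (q(s', a') - tau * log pi(a'|s'))
    log_pi_next = log_softmax(q_next, tau)
    soft_value = sum(
        math.exp(lp) * (q - tau * lp) for lp, q in zip(log_pi_next, q_next)
    )

    not_done = 0.0 if done else 1.0
    return reward + bonus + gamma * not_done * soft_value


def sample_softmax_action(q_values, tau, rng=random):
    """Sample an action from softmax(q / tau), replacing epsilon-greedy."""
    probs = [math.exp(lp) for lp in log_softmax(q_values, tau)]
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

In a batched PyTorch version the same quantities would presumably be computed with a log-softmax over the target network's output, but the arithmetic per transition should match the sketch above.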