vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev
Other
4.84k stars 560 forks

Adding Munchausen Reinforcement Learning #464

Open Paul-antoineLeTolguenec opened 1 week ago

Paul-antoineLeTolguenec commented 1 week ago

Problem Description

I propose adding a new algorithm: 'Munchausen Reinforcement Learning' (Paper link).

Checklist

Current Behavior

The repo currently has no implementation of Munchausen RL.

Expected Behavior

The repo includes an implementation of Munchausen RL.

Possible Solution

Implement Munchausen on DQN (at least), and possibly also implement IQN and IQN + Munchausen?

Steps to implement

On the DQN implementation, two changes are needed: add two terms to the TD target (the blue term and the red term, as described in Paper link), and use a softmax policy instead of epsilon-greedy.

I already have an implementation (which needs cleaning). I'm just checking whether it meets the repo's needs before making the changes.
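
For reference, the two TD-target changes can be sketched for a single transition as below. This is a minimal standard-library sketch, not the implementation mentioned above and not CleanRL code. It assumes the two colored terms are the scaled, clipped log-policy bonus and the entropy-regularized (soft) bootstrap, both computed from a softmax policy over the target network's Q-values. The function names and the `q_s` / `q_next` arguments are hypothetical, and the defaults (`alpha=0.9`, `tau=0.03`, `clip_min=-1.0`) are taken from the paper as I recall them.

```python
import math


def log_softmax(q_values, tau):
    """Log-probabilities of the softmax policy pi = softmax(q / tau)."""
    scaled = [q / tau for q in q_values]
    m = max(scaled)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def munchausen_target(reward, done, action, q_s, q_next,
                      gamma=0.99, alpha=0.9, tau=0.03, clip_min=-1.0):
    """Munchausen-DQN TD target for one transition (s, a, r, s', done).

    q_s and q_next are assumed to be target-network Q-values for the
    current state s and the next state s' (hypothetical argument names).
    """
    # Munchausen bonus: alpha * clip(tau * log pi(a|s), clip_min, 0)
    log_pi = log_softmax(q_s, tau)
    bonus = alpha * max(clip_min, min(0.0, tau * log_pi[action]))

    # Soft bootstrap: sum_a' pi(a'|s') * (q(s', a') - tau * log pi(a'|s'))
    log_pi_next = log_softmax(q_next, tau)
    soft_value = sum(math.exp(lp) * (q - tau * lp)
                     for lp, q in zip(log_pi_next, q_next))

    return reward + bonus + gamma * (1.0 - done) * soft_value


if __name__ == "__main__":
    print(munchausen_target(reward=1.0, done=0.0, action=0,
                            q_s=[1.0, 1.0], q_next=[2.0, 2.0]))
```

In a batched PyTorch version, the same quantities would presumably come from a log-softmax over the target network's outputs divided by `tau`, with the bonus gathered at the taken action.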