Adding Munchausen Reinforcement Learning

Problem Description

I propose adding a new algorithm: 'Munchausen Reinforcement Learning' (Paper link).

Checklist

[x] I have installed dependencies via poetry install (see CleanRL's installation guideline).
[x] I have checked that there is no similar issue in the repo.
[x] I have checked the documentation site and found no relevant information in GitHub issues.

Current Behavior

CleanRL has no implementation of Munchausen RL.

Expected Behavior

CleanRL provides an implementation of Munchausen RL.

Possible Solution

Implement Munchausen on top of DQN (at least), and possibly add IQN and IQN + Munchausen later?

Steps to implement

Starting from the DQN implementation, two terms need to be added to the TD target (the blue term and the red term described in the Paper link), and the epsilon-greedy policy needs to be replaced by a softmax policy.

I already have an implementation (which needs cleaning). I'm just checking whether it meets the repo's needs before making the changes.
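To make the two changes concrete, here is a minimal NumPy sketch of how I understand the Munchausen-DQN target and the softmax policy. It is not the implementation I would submit, and it is not tied to CleanRL's code: the function names (`log_softmax`, `munchausen_target`, `sample_softmax_action`) are hypothetical, and the values of `alpha`, `tau`, and `l0` are illustrative hyperparameters. The sketch assumes the policy is a softmax over the target network's Q-values with temperature `tau`, that one added term is the clipped, scaled log-policy of the taken action, and that the other replaces the max over next-state Q-values with a soft (entropy-regularized) expectation.

```python
import numpy as np


def log_softmax(q, tau):
    """Numerically stable log of softmax(q / tau) over the last axis."""
    z = q / tau
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def munchausen_target(q_target_s, q_target_next, actions, rewards, dones,
                      gamma=0.99, alpha=0.9, tau=0.03, l0=-1.0):
    """Sketch of a Munchausen-style TD target (hypothetical helper).

    q_target_s, q_target_next: target-network Q-values, shape (batch, n_actions)
    actions: integer actions taken, shape (batch,)
    rewards, dones: shape (batch,), dones as 0.0 / 1.0
    """
    batch = np.arange(len(actions))

    # Added term 1: scaled log-policy of the taken action, clipped to [l0, 0].
    log_pi = log_softmax(q_target_s, tau)
    munchausen = alpha * np.clip(tau * log_pi[batch, actions], l0, 0.0)

    # Added term 2: soft value of the next state under the softmax policy,
    # sum_a' pi(a'|s') * (q(s', a') - tau * log pi(a'|s')).
    log_pi_next = log_softmax(q_target_next, tau)
    pi_next = np.exp(log_pi_next)
    soft_v = (pi_next * (q_target_next - tau * log_pi_next)).sum(axis=-1)

    return rewards + munchausen + gamma * (1.0 - dones) * soft_v


def sample_softmax_action(q_values, tau, rng):
    """Sample one action from softmax(q_values / tau) instead of epsilon-greedy."""
    probs = np.exp(log_softmax(q_values, tau))
    return int(rng.choice(len(probs), p=probs))
```

With `alpha=0` and a small `tau`, the target reduces to something very close to the usual DQN target, which is a convenient sanity check when wiring this into an existing DQN script.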

vwxyzjn / cleanrl