Adding Munchausen Reinforcement Learning

Description

Implementation of Munchausen DQN in dqn_munchausen.py, based on dqn.py.

Paper link: https://arxiv.org/pdf/2007.14430

Three changes to vanilla DQN:

1) Blue term in the TD target: entropy regularization
2) Red term in the TD target: implicit KL (new / old policy) regularization
3) Softmax policy
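For reviewers, here is a minimal sketch of how the three changes combine into the TD target for a single transition. It is written in plain Python for readability and is not the code in dqn_munchausen.py. The function names (`log_softmax`, `munchausen_target`), the argument names, and the default values for `gamma`, `tau`, `alpha`, and `clip_min` are assumptions made for illustration, so check them against the paper and the script before relying on them.

```python
import math


def log_softmax(q_values, tau):
    """Log-probabilities of the softmax policy pi = softmax(q / tau), computed stably."""
    scaled = [q / tau for q in q_values]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def munchausen_target(reward, action, done, q_target_s, q_target_next,
                      gamma=0.99, tau=0.03, alpha=0.9, clip_min=-1.0):
    """Munchausen-style TD target for one transition (illustrative sketch).

    q_target_s / q_target_next are the target network's Q-values for the
    current and next state. All default hyperparameters are assumptions.
    """
    # 3) Softmax policy derived from the target network's Q-values.
    log_pi_s = log_softmax(q_target_s, tau)
    log_pi_next = log_softmax(q_target_next, tau)

    # 2) "Red" term: scaled log-policy of the action taken, clipped to
    #    [clip_min, 0]. This acts as the implicit KL regularization.
    bonus = alpha * min(0.0, max(clip_min, tau * log_pi_s[action]))

    # 1) "Blue" term: entropy-regularized (soft) value of the next state,
    #    sum_a' pi(a'|s') * (q(s', a') - tau * log pi(a'|s')).
    soft_value = sum(
        math.exp(lp) * (q - tau * lp)
        for lp, q in zip(log_pi_next, q_target_next)
    )

    return reward + bonus + gamma * (1.0 - float(done)) * soft_value
```

With `alpha=0` the red term vanishes and only the entropy-regularized (soft) target remains. As `tau` shrinks, the soft value approaches the usual `max` over next-state Q-values.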

- [x] I've read the CONTRIBUTION guide (required).
- [x] I have ensured pre-commit run --all-files passes (required).
- [x] I have updated the tests accordingly (if applicable).
- [ ] I have updated the documentation and previewed the changes via mkdocs serve.
  - [x] I have explained note-worthy implementation details.
  - [ ] I have explained the logged metrics.
  - [x] I have added links to the original paper and related papers.

If you need to run benchmark experiments for performance-impacting changes:

- [ ] I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team.
- [ ] I have used the benchmark utility to submit the tracked experiments to the openrlbenchmark/cleanrl W&B project, optionally with --capture_video.
- [ ] I have performed RLops with python -m openrlbenchmark.rlops.
  - For new feature or bug fix:
    - [ ] I have used the RLops utility to understand the performance impact of the changes and confirmed there is no regression.
  - For new algorithm:
    - [ ] I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
  - [ ] I have added the learning curves generated by the python -m openrlbenchmark.rlops utility to the documentation.
  - [ ] I have added links to the tracked experiments in W&B, generated by python -m openrlbenchmark.rlops ....your_args... --report, to the documentation.