Implementation of Munchausen DQN in dqn_munchausen.py, based on dqn.py. \
Paper: https://arxiv.org/pdf/2007.14430 \
Three changes relative to vanilla DQN (the term colors refer to the TD target equation as printed in the paper):
1) Blue term in the TD target: entropy regularization.
2) Red term in the TD target: implicit KL regularization between the new and old policies.
3) Softmax policy instead of the greedy (argmax) policy.
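
For reviewers, here is a minimal sketch of how the three changes combine into the TD target for a single transition. It is not the code in dqn_munchausen.py. The function names (`log_softmax`, `munchausen_target`), the plain-list inputs, and the default values for `gamma`, `tau`, `alpha`, and `clip_lo` are assumptions made for illustration. Both `q_s` and `q_next` are assumed to be Q-values from the target network for the current and next state.

```python
import math


def log_softmax(q_values, tau):
    """Log of softmax(q / tau), stabilized by subtracting the max logit."""
    scaled = [q / tau for q in q_values]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def munchausen_target(reward, action, done, q_s, q_next,
                      gamma=0.99, tau=0.03, alpha=0.9, clip_lo=-1.0):
    """Illustrative Munchausen TD target for one transition (s, a, r, s').

    All default hyperparameters are assumptions for this sketch.
    """
    # Change 3: the policy is a softmax over Q-values with temperature tau.
    log_pi_s = log_softmax(q_s, tau)
    log_pi_next = log_softmax(q_next, tau)

    # Change 2 (red term): scaled log-policy of the action actually taken,
    # clipped to [clip_lo, 0] so that it stays bounded.
    red = alpha * max(clip_lo, min(0.0, tau * log_pi_s[action]))

    # Change 1 (blue term): expected next-state value under the softmax
    # policy, with the entropy bonus -tau * log pi(a'|s').
    blue = sum(math.exp(lp) * (q - tau * lp)
               for lp, q in zip(log_pi_next, q_next))

    # `done` is 1.0 for terminal transitions, which drops the bootstrap term.
    return reward + red + gamma * (1.0 - done) * blue
```

With `alpha = 0` the red term disappears and the target reduces to a soft (entropy-regularized) DQN target. A batched implementation would normally do the same arithmetic with tensor operations instead of Python lists.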
[ ] I have performed RLops with python -m openrlbenchmark.rlops.
For new feature or bug fix:
[ ] I have used the RLops utility to understand the performance impact of the changes and confirmed there is no regression.
For new algorithm:
[ ] I have created a table comparing my results against those from reputable sources (e.g., the original paper or another reference implementation).
[ ] I have added the learning curves generated by the python -m openrlbenchmark.rlops utility to the documentation.
[ ] I have added links to the tracked experiments in W&B, generated by python -m openrlbenchmark.rlops ....your_args... --report, to the documentation.