Hi,

baselines/README.md states:

"Unsurprisingly, DQN performs much better when trained with 1% of exploratory actions instead of 10% (as used in the original Nature paper)."

Why is that unsurprising? Is there an explanation for why DQN should be trained with 1% exploratory actions instead of 10%? Or is this just an empirical result that shows up in most DQN implementations?
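To make the question concrete: I am assuming "1% of exploratory actions" refers to the ε used in ε-greedy action selection, so that roughly ε of all actions taken are random rather than greedy. The snippet below is only a minimal sketch of that reading, not code from baselines; the function names and the Q-values are made up for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action epsilon-greedily.

    Returns (action, explored), where explored is True when the action
    was drawn uniformly at random instead of taken greedily.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values)), True
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])
    return greedy, False


def exploratory_fraction(epsilon, steps=100_000, seed=0):
    """Empirical share of random (exploratory) actions over `steps` choices."""
    rng = random.Random(seed)
    q_values = [0.1, 0.5, 0.2, 0.0]  # hypothetical Q-values, illustration only
    explored = sum(epsilon_greedy(q_values, epsilon, rng)[1] for _ in range(steps))
    return explored / steps


if __name__ == "__main__":
    # Compare the two settings mentioned in the README sentence.
    for eps in (0.10, 0.01):
        print(f"epsilon={eps:.2f} -> exploratory fraction {exploratory_fraction(eps):.4f}")
```

Under this reading, the 10% setting takes a random action about ten times as often as the 1% setting. My question is whether that difference alone explains the performance gap.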