The epsilon decay in the code sits inside agent.replay(), which is called every step, so epsilon declines rapidly during the first episode. I don't know whether this was the intended behavior, but I've gotten better results by moving the epsilon decay into a separate method and calling it at the end of each episode.
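For anyone who wants to try the same change, here is a rough sketch of what I mean. It is not the original code: the Agent class is a stripped-down stand-in, the method name decay_epsilon() and the run() helper are names I made up, and the values 1.0, 0.01 and 0.995 are only assumed defaults for illustration.

```python
class Agent:
    """Hypothetical stand-in for the DQN agent; only the epsilon logic is shown."""

    def __init__(self, epsilon=1.0, epsilon_min=0.01, epsilon_decay=0.995):
        # Assumed values, chosen only for illustration.
        self.epsilon = epsilon
        self.epsilon_min = epsilon_min
        self.epsilon_decay = epsilon_decay

    def replay(self, batch_size):
        # Minibatch training would happen here. Epsilon is deliberately
        # not touched in this method anymore.
        pass

    def decay_epsilon(self):
        # Multiplicative decay, floored at epsilon_min.
        self.epsilon = max(self.epsilon_min, self.epsilon * self.epsilon_decay)


def run(agent, episodes, steps_per_episode, batch_size=32):
    """Hypothetical training loop skeleton (no environment, no network)."""
    for _ in range(episodes):
        for _ in range(steps_per_episode):
            agent.replay(batch_size)  # still called every step
        agent.decay_epsilon()         # now called once per episode
    return agent.epsilon


agent = Agent()
final_epsilon = run(agent, episodes=10, steps_per_episode=200)
print(final_epsilon)
```

With these assumed values, decaying every step would hit the 0.01 floor after roughly 920 steps, while decaying once per episode takes roughly 920 episodes to get there, so the agent keeps exploring much longer.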