Closed: kihyukh closed this 2 years ago
Currently, only a single-step transition (s, a, r, s') is used to train DDQN. There is an empirical study showing that multi-step training performs better: https://rayyoh.github.io/files/2017-Rainbow.pdf
Let's implement a multi-step version of DDQN.
Part of the code affected:
So far, multi_steps = 3 produced the best results.
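For reference, here is a minimal sketch of how a multi-step target could be formed. It is not the code from this repository: the function names `n_step_transition` and `ddqn_target`, the tuple layout `(s, a, r, s_next, done)`, and the use of plain Python lists for Q-values are all assumptions made for illustration. The window length plays the role of `multi_steps`.

```python
def n_step_transition(window, gamma):
    """Collapse consecutive (s, a, r, s_next, done) steps into one transition.

    Hypothetical helper: `window` holds up to `multi_steps` consecutive steps.
    Returns (s, a, n_step_return, last_s_next, done, bootstrap_discount).
    """
    s, a = window[0][0], window[0][1]
    ret, discount = 0.0, 1.0
    last_next, last_done = window[0][3], window[0][4]
    for _, _, r, s_next, done in window:
        ret += discount * r          # accumulate gamma^k * r_{t+k}
        discount *= gamma            # after k steps this equals gamma^k
        last_next, last_done = s_next, done
        if done:                     # stop at episode end, do not cross it
            break
    return s, a, ret, last_next, last_done, discount


def ddqn_target(ret, bootstrap_discount, done, q_online_next, q_target_next):
    """Double DQN target: the online net picks the action, the target net scores it.

    `q_online_next` and `q_target_next` are assumed to be per-action Q-values
    for the last next-state, given here as plain lists.
    """
    if done:
        return ret                   # no bootstrapping past a terminal state
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return ret + bootstrap_discount * q_target_next[best]


# Usage with multi_steps = 3 and gamma = 0.5 (values chosen for easy arithmetic)
window = [(0, 1, 1.0, 1, False), (1, 0, 2.0, 2, False), (2, 1, 3.0, 3, False)]
s, a, ret, s_next, done, disc = n_step_transition(window, gamma=0.5)
target = ddqn_target(ret, disc, done, q_online_next=[0.1, 0.9], q_target_next=[4.0, 8.0])
```

With a window of length 1 this reduces to the existing single-step (s, a, r, s') target, so the same code path could serve both settings.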