gouxiangchen / dueling-DQN-pytorch

very easy implementation of dueling DQN in pytorch
68 stars 9 forks

Rationale for Varying Epsilon #3

Closed 0xJchen closed 3 years ago

0xJchen commented 3 years ago

Thanks for the great work. One small question: Is there any reference for the varying epsilon along training?

if epsilon > FINAL_EPSILON:
    # decrease epsilon linearly from INITIAL_EPSILON to FINAL_EPSILON over EXPLORE steps
    epsilon -= (INITIAL_EPSILON - FINAL_EPSILON) / EXPLORE
gouxiangchen commented 3 years ago

Hi,

I am sorry for the late reply.

It is mentioned in the DQN paper "Human-level control through deep reinforcement learning",

The behaviour policy during training was ε-greedy with ε annealed linearly from 1.0 to 0.1 over the first million frames, and fixed at 0.1 thereafter.

In early training, the agent knows nothing about the environment and should explore more, so ε is high. As its capability improves, ε decreases accordingly.
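As a minimal sketch, the linear annealing schedule can be written as a standalone function. The constant values below are illustrative (the DQN paper anneals over the first million frames); only the constant names are taken from the snippet above.

```python
# Illustrative constants; names match the snippet in the question,
# values follow the DQN paper's schedule (1.0 -> 0.1 over 1M frames).
INITIAL_EPSILON = 1.0    # fully exploratory at the start
FINAL_EPSILON = 0.1      # exploration floor after annealing
EXPLORE = 1_000_000      # number of steps over which to anneal

def epsilon_at(step: int) -> float:
    """Epsilon after `step` environment steps, annealed linearly
    from INITIAL_EPSILON to FINAL_EPSILON, then held constant."""
    decay = (INITIAL_EPSILON - FINAL_EPSILON) * step / EXPLORE
    return max(FINAL_EPSILON, INITIAL_EPSILON - decay)
```

Subtracting a fixed amount per step, as in the snippet above, produces the same schedule incrementally; the closed-form version just makes the endpoints explicit.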

Hope this reply helps :).

0xJchen commented 3 years ago

Got it. Thanks!