Closed 0xJchen closed 3 years ago
Hi,
I am sorry for the late reply.
It is mentioned in the DQN paper "Human-level control through deep reinforcement learning",
The behaviour policy during training was ε-greedy, with ε annealed linearly from 1.0 to 0.1 over the first million frames, and fixed at 0.1 thereafter.
In early training, the agent knows nothing about the environment and should explore more, so ε is high. As its capability improves, ε decreases accordingly.
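For reference, that annealing schedule can be sketched in a few lines. This is a minimal sketch, not the paper's code, and the function and parameter names are my own:

```python
def linear_epsilon(frame_idx, eps_start=1.0, eps_end=0.1, anneal_frames=1_000_000):
    """Linearly anneal epsilon from eps_start to eps_end over anneal_frames,
    then hold it fixed at eps_end, as described in the DQN paper."""
    fraction = min(frame_idx / anneal_frames, 1.0)
    return eps_start + fraction * (eps_end - eps_start)
```

At frame 0 this returns 1.0 (pure exploration), halfway through annealing it returns 0.55, and after one million frames it stays at 0.1.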
Hope this reply helps :).
Got it. Thanks!
Thanks for the great work. One small question: is there any reference for varying epsilon during training?