Value of Epsilon Decay Period

google / dopamine

Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.

https://github.com/google/dopamine

Apache License 2.0


Value of Epsilon Decay Period #201

Open rfali opened 1 year ago

rfali commented 1 year ago

In the TF version of DQN, the value of epsilon_decay_period is set to 1M steps (see here), and for Rainbow, the value is set to 250k steps (see here).
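For context, epsilon_decay_period sets how many agent steps epsilon takes to fall linearly from 1.0 to its final value. The sketch below is modeled on the linear schedule in Dopamine's DQN agent (linearly_decaying_epsilon) and is rewritten in plain Python. The warm-up of 20,000 steps and the final epsilon of 0.01 are illustrative assumptions, not values read from the linked configs.

```python
def linearly_decaying_epsilon(decay_period, step, warmup_steps, epsilon):
    """Linear epsilon schedule (sketch modeled on Dopamine's DQN agent).

    Epsilon stays at 1.0 during warm-up, then falls linearly to `epsilon`
    over `decay_period` agent steps, and stays there afterwards.
    """
    steps_left = decay_period + warmup_steps - step
    bonus = (1.0 - epsilon) * steps_left / decay_period
    # Clamp so epsilon never exceeds 1.0 or drops below its final value.
    bonus = min(max(bonus, 0.0), 1.0 - epsilon)
    return epsilon + bonus


WARMUP = 20_000   # assumed replay warm-up, for illustration only
FINAL_EPS = 0.01  # assumed final epsilon, for illustration only

# Compare the DQN (1M steps) and Rainbow (250k steps) settings from the issue
# at the same point in training: 125k agent steps after warm-up.
for period in (1_000_000, 250_000):
    eps = linearly_decaying_epsilon(period, WARMUP + 125_000, WARMUP, FINAL_EPS)
    print(f"decay_period={period:>9,}: epsilon = {eps:.4f}")
```

Halfway through a 250k-step decay, epsilon is still around 0.5, so the choice of period directly controls how long the agent acts mostly at random.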

However, the Rainbow paper says that for DQN they anneal epsilon over 4M frames (i.e. 1M steps), as Dopamine does above. Importantly, without Noisy Nets (which is the case for TF Rainbow), they anneal over the first 250K frames, not steps; with the standard frame skip of 4, that would be 62,500 steps.
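The frame/step arithmetic can be checked directly. This assumes the standard Atari frame skip of 4 (each agent step repeats its action for 4 emulator frames); the helper name frames_to_steps is hypothetical.

```python
FRAME_SKIP = 4  # standard Atari frame skip assumed in the question


def frames_to_steps(frames, frame_skip=FRAME_SKIP):
    """Convert emulator frames to agent steps (one step = frame_skip frames)."""
    return frames // frame_skip


dqn_steps = frames_to_steps(4_000_000)    # DQN annealing period from the paper
rainbow_steps = frames_to_steps(250_000)  # Rainbow (no Noisy Nets) period from the paper
print(dqn_steps, rainbow_steps)           # 1000000 62500

# Dopamine's Rainbow setting (250k steps) relative to the paper's period read as frames.
print(250_000 / rainbow_steps)            # 4.0
```

If the paper's 250K is read as frames, the 250k-step setting anneals epsilon four times more slowly than the paper.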

Is there a discrepancy here (Rainbow should anneal within 62k steps and not 250k steps), or am I misunderstanding something (or perhaps it really doesn't matter)? Thank you for your time.

Screenshot of page 4 of Rainbow paper

rfali commented 1 year ago

Also, in the JAX Full Rainbow agent (which has Noisy Nets), epsilon-greedy exploration should be disabled when Noisy Nets are used (as in the paper snippet above, and in some other implementations such as Kaixhin's Rainbow, here and here). However, epsilon_train is still set to 0.01 in JAX Full Rainbow (here), and if Noisy is true, the identity_epsilon function is called, which simply returns that epsilon value (0.01) instead of 0.
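To make the difference concrete, here is a small sketch of epsilon-greedy action selection with two epsilon functions. identity_epsilon is named in the issue, but its signature here is an assumption (chosen to match the other epsilon schedules); zero_epsilon is a hypothetical alternative that disables epsilon-greedy, as the paper describes for Noisy Nets. This is not necessarily how the linked fix works.

```python
import random


def identity_epsilon(unused_decay_period, unused_step, unused_warmup_steps, epsilon):
    # Returns the configured epsilon as-is, e.g. epsilon_train = 0.01.
    return epsilon


def zero_epsilon(unused_decay_period, unused_step, unused_warmup_steps, unused_epsilon):
    # Hypothetical: with Noisy Nets, rely on parameter noise only (no random actions).
    return 0.0


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a uniformly random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def count_non_greedy(epsilon_fn, trials=10_000, seed=0):
    """Count how often the chosen action differs from the greedy action."""
    rng = random.Random(seed)
    q_values = [0.1, 0.5, 0.2, 0.3]  # toy Q-values; action 1 is greedy
    epsilon = epsilon_fn(250_000, 0, 20_000, 0.01)  # epsilon_train = 0.01
    return sum(epsilon_greedy(q_values, epsilon, rng) != 1 for _ in range(trials))


print("identity_epsilon:", count_non_greedy(identity_epsilon))
print("zero_epsilon:", count_non_greedy(zero_epsilon))
```

With identity_epsilon, roughly 1% of actions are still drawn at random on top of the Noisy Nets exploration; with zero_epsilon, every action is greedy with respect to the (noisy) Q-values.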

psc-g commented 1 year ago

thank you for pointing this out! this has been fixed here: https://github.com/google/dopamine/commit/ed92c57bd547db68d63aabee383d4c55756a6a0f

rfali commented 1 year ago

Thanks! As for

Is there a discrepancy here (Rainbow should anneal within 62k steps and not 250k steps), or am I misunderstanding something (or perhaps it really doesn't matter?)

Should the epsilon_decay_period value for TF Rainbow (which does not use Noisy Nets) be 250k frames as in the Rainbow paper (i.e. 62,500 steps with frame_skip=4), 250k steps as in the current implementation, or does it not matter? I have rarely seen a value as low as 62,500 steps for epsilon decay; for example, RLlib also uses 200k for its DQN variant, and turns epsilon-greedy exploration off when using Noisy Nets.