In the original Prioritized Experience Replay paper [1], the importance-sampling exponent beta is linearly annealed from its initial value β0 to 1 (Section 3.4 of the paper). The schedule is defined so that beta reaches 1 only towards the end of learning.
This pull request adds beta annealing as an option (disabled by default), enabled by setting `prioritized_replay_beta_annealing` to `True`. It creates a linear schedule for beta similar to the implementation in OpenAI Baselines, but modified to operate on tensors and variables (e.g., using the `num_training_steps` variable as the current timestep).
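For illustration, here is a minimal sketch of the annealing logic in TensorFlow. The `schedule_timesteps`, `beta0`, and `linear_beta_schedule` names are illustrative, not necessarily those used in this PR; `num_training_steps` is the variable mentioned above.

```python
import tensorflow as tf

def linear_beta_schedule(num_training_steps, schedule_timesteps, beta0=0.4):
    # beta0=0.4 is the paper's initial value for the proportional variant.
    # Fraction of the schedule completed, clipped to [0, 1] so that beta
    # holds at 1.0 once schedule_timesteps has been reached.
    fraction = tf.minimum(
        tf.cast(num_training_steps, tf.float32) / float(schedule_timesteps),
        1.0)
    # Linearly interpolate from beta0 to 1.0.
    return beta0 + fraction * (1.0 - beta0)
```

This mirrors Baselines' `LinearSchedule` with `initial_p = beta0` and `final_p = 1.0`, except that the current timestep arrives as a tensor rather than a Python integer, so the interpolation is expressed with tensor ops.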
[1] Schaul et al. 2016, "Prioritized Experience Replay", arXiv:1511.05952