In the original Prioritized Experience Replay paper [1], the importance-sampling exponent beta is linearly annealed from its initial value β0 to 1 (Section 3.4 of the paper). The schedule is defined so that beta reaches 1 only towards the end of learning.
This pull request adds beta annealing as an option (disabled by default), enabled by setting `prioritized_replay_beta_annealing` to `True`. It creates a linear schedule for beta similar to the implementation in OpenAI Baselines, but modified to operate on tensors and variables (e.g., using the `num_training_steps` variable as the current timestep).
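For illustration, here is a minimal sketch of the annealing logic in TensorFlow. The `schedule_timesteps`, `beta0`, and `linear_beta_schedule` names are illustrative, not necessarily those used in this PR; `num_training_steps` is the variable mentioned above.

```python
import tensorflow as tf

def linear_beta_schedule(num_training_steps, schedule_timesteps, beta0=0.4):
    # beta0=0.4 is the paper's initial value for the proportional variant.
    # Fraction of the schedule completed, clipped to [0, 1] so that beta
    # holds at 1.0 once schedule_timesteps has been reached.
    fraction = tf.minimum(
        tf.cast(num_training_steps, tf.float32) / float(schedule_timesteps),
        1.0)
    # Linearly interpolate from beta0 to 1.0.
    return beta0 + fraction * (1.0 - beta0)
```

This mirrors Baselines' `LinearSchedule` with `initial_p = beta0` and `final_p = 1.0`, except that the current timestep arrives as a tensor rather than a Python integer, so the interpolation is expressed with tensor ops.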
[1] Schaul et al. 2016, "Prioritized Experience Replay", arXiv:1511.05952