uber-research / ape-x

This repo replicates the results Horgan et al. obtained in "Distributed Prioritized Experience Replay".
Apache License 2.0

Beta (β) Annealing #4

Closed: abdel closed this 5 years ago

abdel commented 5 years ago

In the original Prioritized Experience Replay paper [1], the beta parameter is linearly annealed from its initial value β0 to 1 (section 3.4 in the paper). The schedule is defined so that β reaches 1 only at the end of learning.

This pull request adds beta annealing as an option. It is disabled by default; set prioritized_replay_beta_annealing to True to enable it. When enabled, it creates a linear schedule for beta similar to the implementation in OpenAI Baselines, but modified to operate on tensors and variables (e.g. using the num_training_steps variable as the current timestep).
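For readers who want the arithmetic behind the schedule, here is a minimal plain-Python sketch of linear annealing. It is not the code in this pull request, which operates on tensors and variables. The function name `linear_beta`, its parameters, and the default β0 of 0.4 are assumptions chosen for illustration.

```python
def linear_beta(step, total_steps, beta0=0.4, beta_final=1.0):
    """Linearly interpolate beta from beta0 to beta_final.

    `step` plays the role of the current training step and `total_steps`
    is the length of the schedule; both names are hypothetical.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Fraction of the schedule completed, clamped to [0, 1] so that beta
    # stays at beta_final once the schedule has finished.
    fraction = min(max(step / total_steps, 0.0), 1.0)
    return beta0 + fraction * (beta_final - beta0)


if __name__ == "__main__":
    # Print beta at a few points along a 1000-step schedule.
    for step in (0, 250, 500, 1000, 2000):
        print(step, linear_beta(step, 1000))
```

Clamping the fraction means beta stays at its final value if training runs past the scheduled number of steps, rather than growing beyond 1.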

[1] Schaul et al. 2016, "Prioritized Experience Replay", arXiv:1511.05952

CLAassistant commented 5 years ago

CLA assistant check
All committers have signed the CLA.