hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License
4.13k stars, 723 forks

[question] HER and prioritized experience replay #751

Closed: johannes-dornheim closed this issue 4 years ago

johannes-dornheim commented 4 years ago

Hi

In the stable-baselines implementation, HER does not support a prioritized replay buffer. The HER paper states: "Prioritized experience replay (....) is orthogonal to our work and both approaches can be easily combined". So my question is: are there 'deeper reasons' for the lack of support, or is it just a currently missing feature?

Best Regards, Johannes

Miffyli commented 4 years ago

I believe there is no deeper reason for the lack of support other than that nobody has implemented it. It would require coming up with priorities for the samples in the buffer, and then updating replay_buffer.py in HER. I am not familiar enough with HER to know how easy this would be. At first glance it does not sound as straightforward as with DQN.

RyanRizzo96 commented 4 years ago

Actually, PER has been shown not to improve performance over HER, so there is no real motivation to implement it. Not only does PER not improve performance, it also increases computation time substantially.

PER works by prioritising transitions with higher TD-error, which means that the TD-error must be computed for each transition, hence the higher computational cost.
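To make the cost concrete, here is a minimal sketch of proportional prioritisation. It is not the stable-baselines implementation: the class `ProportionalReplay`, the helper `td_error`, and the parameters `alpha` and `eps` are hypothetical names chosen for illustration, and the sketch leaves out the importance-sampling weights and the sum-tree that PER normally uses. It only shows that every sampled transition needs a fresh TD-error to keep its priority up to date.

```python
import random


class ProportionalReplay:
    """Toy proportional prioritised replay (hypothetical, O(n) sampling, no sum-tree)."""

    def __init__(self, alpha=0.6, eps=1e-6):
        self.alpha = alpha      # how strongly priorities skew sampling (0 = uniform)
        self.eps = eps          # keeps priorities strictly positive when eps > 0
        self.transitions = []
        self.priorities = []

    def add(self, transition):
        # New transitions get the current maximum priority so they are likely to be replayed.
        max_p = max(self.priorities, default=1.0)
        self.transitions.append(transition)
        self.priorities.append(max_p)

    def sample(self, batch_size, rng=random):
        # Sampling probability is proportional to priority ** alpha.
        weights = [p ** self.alpha for p in self.priorities]
        idxs = rng.choices(range(len(self.transitions)), weights=weights, k=batch_size)
        return idxs, [self.transitions[i] for i in idxs]

    def update_priorities(self, idxs, td_errors):
        # This is the extra work: a TD-error per sampled transition after each update.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps


def td_error(reward, gamma, q_next_max, q_current, done):
    """One-step TD-error: (r + gamma * max_a' Q(s', a')) - Q(s, a)."""
    target = reward + (0.0 if done else gamma * q_next_max)
    return target - q_current
```

In a training loop you would call `sample`, compute `td_error` for each sampled transition with the current Q-network, and pass the results to `update_priorities`. With HER, the relabelled transitions would need priorities as well.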

Prioritised Sequence Experience Replay (PSER) outperforms PER but has not been implemented with HER.

There are other methods that improve the sample efficiency of HER (such as Energy Based Prioritisation), but PER is not one of them. I have put my name forward to implement this in the new PyTorch version.

Miffyli commented 4 years ago

Ok, thanks for the info! Indeed, any such new features would be for the PyTorch version :).

araffin commented 4 years ago

I added it to possible features for Stable-Baselines3 1.1+ in https://github.com/DLR-RM/stable-baselines3/issues/1