hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

HER not sampling from replay buffer? #1110

Closed by OGordon100 3 years ago

OGordon100 commented 3 years ago

This is slightly tough to explain, so I can't easily provide a minimal code example. I'll try to explain my logic instead, along with some debug print statements.

From my understanding, when using HER (I've been using it with DQN, which I will refer to as model), the following happens during training:

1) An object of the ReplayBuffer class is created as model.replay_buffer.
2) Transitions are stored in the replay buffer with model.replay_buffer_add(). This adds the transitions to model.replay_buffer.storage.
3) The function model.replay_buffer.can_sample(model.batch_size) is called. If there are at least model.batch_size transitions in the replay buffer storage, can_sample=True.
4) If can_sample=True, a batch of transitions is taken from model.replay_buffer.storage.

So, for example, with a batch size of 32, if I change deepq/dqn.py line 262 and add print(can_sample), the first 31 transitions print False, and from the 32nd onwards it prints True. As expected.
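
To make the expected behaviour concrete, here is a minimal sketch of that flow. ToyReplayBuffer is a hypothetical stand-in written for illustration, not the stable-baselines class; the only behaviour it assumes is the one described above, namely that can_sample becomes True once the storage holds batch_size transitions.

```python
import random


class ToyReplayBuffer:
    """Hypothetical, simplified replay buffer (not the stable-baselines class)."""

    def __init__(self, size):
        self.storage = []
        self.max_size = size
        self.next_idx = 0

    def __len__(self):
        return len(self.storage)

    def add(self, obs_t, action, reward, obs_tp1, done):
        data = (obs_t, action, reward, obs_tp1, done)
        if self.next_idx >= len(self.storage):
            self.storage.append(data)
        else:
            # Overwrite the oldest entry once the buffer is full.
            self.storage[self.next_idx] = data
        self.next_idx = (self.next_idx + 1) % self.max_size

    def can_sample(self, n_samples):
        # True as soon as enough transitions are stored.
        return len(self) >= n_samples

    def sample(self, batch_size):
        return random.sample(self.storage, batch_size)


batch_size = 32
buffer = ToyReplayBuffer(size=1000)
flags = []
for step in range(40):
    # Dummy transition: (obs, action, reward, next_obs, done).
    buffer.add(step, 0, 0.0, step + 1, False)
    flags.append(buffer.can_sample(batch_size))

# Number of stored transitions at which sampling first becomes possible.
first_true = flags.index(True) + 1
print(first_true)
```

With a plain buffer like this, every call to add() immediately grows the storage, which is why the flag flips after a fixed number of steps.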

When using HER, however, model.replay_buffer is not a ReplayBuffer but a HindsightExperienceReplayWrapper. The wrapper in turn contains the actual buffer as model.replay_buffer.replay_buffer, along with the buffer storage.

In theory, when model.replay_buffer_add() is called, the transition should therefore be added to model.replay_buffer.replay_buffer.storage. HOWEVER, when using HER, her/replay_buffer.py has a HindsightExperienceReplayWrapper.add() function that gets called at some point. This stores the transition in a separate variable, model.replay_buffer.episode_transitions, and NOT in model.replay_buffer.replay_buffer.storage.

As a result, calling model.replay_buffer.can_sample() (which calls HindsightExperienceReplayWrapper.can_sample()) or even model.replay_buffer.replay_buffer.can_sample() will ALWAYS return False. I see this both from the print statement from earlier and from a print statement added to HindsightExperienceReplayWrapper.can_sample(). A model using HER then never samples from its replay buffer.

I am using the latest version 2.10.1.

OGordon100 commented 3 years ago

Wait, sorry, I just didn't understand that it stores the transitions at the end of the episode. Amazing.
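
For anyone who lands here with the same confusion, a rough sketch of the store-at-episode-end behaviour follows. ToyBuffer and ToyEpisodeWrapper are hypothetical names made up for illustration, not the stable-baselines implementation, and the sketch deliberately leaves out the goal relabelling that HER performs when an episode is stored. It only shows why can_sample stays False until the first episode finishes.

```python
class ToyBuffer:
    """Hypothetical inner buffer that stores transitions immediately."""

    def __init__(self):
        self.storage = []

    def add(self, *transition):
        self.storage.append(transition)

    def can_sample(self, n_samples):
        return len(self.storage) >= n_samples


class ToyEpisodeWrapper:
    """Hypothetical wrapper that holds transitions until the episode ends."""

    def __init__(self, replay_buffer):
        self.replay_buffer = replay_buffer
        self.episode_transitions = []

    def add(self, obs_t, action, reward, obs_tp1, done):
        # Transitions are only collected here, not yet stored in the buffer.
        self.episode_transitions.append((obs_t, action, reward, obs_tp1, done))
        if done:
            self._store_episode()
            self.episode_transitions = []

    def _store_episode(self):
        # A real HER wrapper would also add goal-relabelled copies here
        # (omitted in this sketch).
        for transition in self.episode_transitions:
            self.replay_buffer.add(*transition)

    def can_sample(self, n_samples):
        return self.replay_buffer.can_sample(n_samples)


wrapper = ToyEpisodeWrapper(ToyBuffer())
episode_length = 50
history = []
for step in range(episode_length):
    done = step == episode_length - 1
    wrapper.add(step, 0, 0.0, step + 1, done)
    history.append(wrapper.can_sample(32))

# False for every step of the episode except the last one.
print(history.count(False), history[-1])
```

So debug prints that only cover steps inside the first episode will show can_sample as False every time, even though sampling starts as soon as that episode is done.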