Slightly tough to explain, so I can't easily provide minimal code. I'll try to explain my logic instead, along with some debug print statements.
From my understanding, this is what normally happens to the replay buffer during training (I've been using DQN, which I will refer to as `model`):
1) A `ReplayBuffer` instance is created as `model.replay_buffer`.
2) Transitions are stored in the replay buffer with `model.replay_buffer_add()`, which adds them to `model.replay_buffer.storage`.
3) `model.replay_buffer.can_sample(model.batch_size)` is called. If the replay buffer storage holds at least `model.batch_size` transitions, `can_sample` is `True`.
4) If `can_sample` is `True`, a batch of transitions is sampled from `model.replay_buffer.storage`.
So, for example, with a batch size of 32, if I add `print(can_sample)` at line 262 of deepq/dqn.py, the first 31 transitions print `False` and from the 32nd onward it prints `True`, as expected.
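Steps 1-4 and the 32-transition experiment can be modelled with a minimal, self-contained buffer. `SimpleReplayBuffer` and its methods are hypothetical stand-ins, not stable-baselines' own `ReplayBuffer`; the sketch assumes `can_sample(n)` means "at least `n` transitions are stored", which is what reproduces 31 `False` results followed by `True`.

```python
import random


class SimpleReplayBuffer:
    """Hypothetical FIFO replay buffer modelled on steps 1-4 (not the library class)."""

    def __init__(self, size):
        self._storage = []
        self._maxsize = size
        self._next_idx = 0

    @property
    def storage(self):
        return self._storage

    def __len__(self):
        return len(self._storage)

    def add(self, obs_t, action, reward, obs_tp1, done):
        # Step 2: append until full, then overwrite the oldest transition
        data = (obs_t, action, reward, obs_tp1, done)
        if self._next_idx >= len(self._storage):
            self._storage.append(data)
        else:
            self._storage[self._next_idx] = data
        self._next_idx = (self._next_idx + 1) % self._maxsize

    def can_sample(self, n_samples):
        # Step 3 (assumed rule): True once at least n_samples transitions are stored
        return len(self) >= n_samples

    def sample(self, batch_size):
        # Step 4: draw batch_size transitions uniformly, with replacement
        return [random.choice(self._storage) for _ in range(batch_size)]


batch_size = 32
buffer = SimpleReplayBuffer(size=1000)
can_sample_log = []
for step in range(40):
    buffer.add(step, 0, 0.0, step + 1, False)
    can_sample = buffer.can_sample(batch_size)
    can_sample_log.append(can_sample)  # stands in for print(can_sample)
    if can_sample:
        batch = buffer.sample(batch_size)
```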
When using HER, however, `model.replay_buffer` is not a `ReplayBuffer` but a `HindsightExperienceReplayWrapper`. The wrapper holds the actual `ReplayBuffer`, and therefore the buffer storage, as `model.replay_buffer.replay_buffer`.
In theory, when `model.replay_buffer_add()` is called, the transition should therefore be added to `model.replay_buffer.replay_buffer.storage`. HOWEVER, her/replay_buffer.py defines a `HindsightExperienceReplayWrapper.add()` method, and that is what gets called when using HER. It stores the transition in a separate list, `model.replay_buffer.episode_transitions`, and NOT in `model.replay_buffer.replay_buffer.storage`.
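To make the HER case concrete, here is a simplified, hypothetical model of the wrapper behaviour described above. `InnerBuffer` and `EpisodeBufferingWrapper` are illustrative names, not stable-baselines classes, and the rule for when `episode_transitions` gets moved into the inner storage (here: only when `done` is `True`) is an assumption of the sketch, not something established in this report. Under that assumption, as long as no episode finishes, the inner storage stays empty and `can_sample()` keeps returning `False`.

```python
class InnerBuffer:
    """Hypothetical stand-in for the wrapped ReplayBuffer."""

    def __init__(self):
        self.storage = []

    def __len__(self):
        return len(self.storage)

    def add(self, obs_t, action, reward, obs_tp1, done, info):
        self.storage.append((obs_t, action, reward, obs_tp1, done, info))

    def can_sample(self, n_samples):
        return len(self) >= n_samples


class EpisodeBufferingWrapper:
    """Hypothetical, simplified stand-in for HindsightExperienceReplayWrapper."""

    def __init__(self, replay_buffer):
        self.replay_buffer = replay_buffer  # mirrors model.replay_buffer.replay_buffer
        self.episode_transitions = []

    def add(self, obs_t, action, reward, obs_tp1, done, info):
        # Transitions first go to a per-episode list, not to the inner storage
        self.episode_transitions.append((obs_t, action, reward, obs_tp1, done, info))
        if done:
            # ASSUMPTION: the episode is flushed to the inner buffer only when it ends
            for transition in self.episode_transitions:
                self.replay_buffer.add(*transition)
            self.episode_transitions = []

    def can_sample(self, n_samples):
        # Delegates to the inner buffer, so un-flushed transitions do not count
        return self.replay_buffer.can_sample(n_samples)


wrapper = EpisodeBufferingWrapper(InnerBuffer())
can_sample_log = []
for step in range(100):
    wrapper.add(step, 0, 0.0, step + 1, False, {})  # done is never True here
    can_sample_log.append(wrapper.can_sample(32))
```

In this sketch, 100 transitions sit in `episode_transitions` while `can_sample(32)` is `False` on every step; only a transition with `done=True` would move them into the inner storage.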
As a result, both `model.replay_buffer.can_sample()` (which calls `HindsightExperienceReplayWrapper.can_sample()`) and `model.replay_buffer.replay_buffer.can_sample()` ALWAYS return `False`. I confirmed this both with the `print(can_sample)` statement from earlier and by adding a print statement to `HindsightExperienceReplayWrapper.can_sample()`. A model using HER therefore never samples from its replay buffer.
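A small, hypothetical debugging helper in the spirit of those print statements: it reports how many transitions are pending in the wrapper versus actually stored in the inner buffer. The attribute names (`replay_buffer`, `episode_transitions`, `storage`) are the ones used in this report; `describe_buffer` itself is an illustrative helper, not part of stable-baselines.

```python
def describe_buffer(buffer):
    """Summarise where transitions live in a plain or wrapped replay buffer.

    getattr defaults let the same call work on a plain buffer and on a wrapper
    exposing .replay_buffer and .episode_transitions (names taken from the report).
    """
    inner = getattr(buffer, "replay_buffer", buffer)  # plain buffer: inspect itself
    pending = getattr(buffer, "episode_transitions", None)
    return {
        "type": type(buffer).__name__,
        "pending_episode_transitions": None if pending is None else len(pending),
        "inner_storage": len(getattr(inner, "storage", [])),
    }
```

For example, `print(describe_buffer(self.replay_buffer))` could sit next to the `print(can_sample)` added in deepq/dqn.py, so each step shows whether transitions are accumulating in `episode_transitions` while the inner storage stays empty.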
I am using the latest version, 2.10.1.