Hi,
What's the current status of this?
I'm currently working on it (on and off) on the following branch of my personal fork: https://github.com/prabhatnagarajan/pfrl/tree/her. I'm planning to apply HER to the bit-flip environment from the original paper that introduced it. I'm fairly confident the Hindsight Experience Replay implementation itself is sound, as we've successfully used a variant of it in other projects. However, performance on the bit-flip environment is currently poor and requires investigation.
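For anyone unfamiliar with it, the bit-flip task from the paper is simple: the state is a length-n bit vector, the goal is another bit vector, action i flips bit i, and the reward is -1 until the state matches the goal. A minimal gym-style sketch (purely illustrative; this is not necessarily what is on my branch):

```python
import numpy as np

class BitFlipEnv:
    """Bit-flipping environment from the HER paper (illustrative sketch)."""

    def __init__(self, n=10):
        self.n = n

    def reset(self):
        # Random initial bit string and random goal bit string.
        self.state = np.random.randint(0, 2, size=self.n)
        self.goal = np.random.randint(0, 2, size=self.n)
        self.t = 0
        return self._obs()

    def step(self, action):
        # Action i flips the i-th bit.
        self.state[action] = 1 - self.state[action]
        self.t += 1
        success = np.array_equal(self.state, self.goal)
        reward = 0.0 if success else -1.0
        done = success or self.t >= self.n
        return self._obs(), reward, done, {}

    def _obs(self):
        # Concatenate state and goal so a goal-conditioned Q-function can consume it.
        return np.concatenate([self.state, self.goal]).astype(np.float32)
```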
Ah cool, thanks for the update.
HER requires that we make updates to the agent's policy and Q-function at the end of the episode. But PFRL assumes that an `agent.act(s)` is followed by an `agent.observe(s', r)` (as evidenced by their use of `batch_last_action` to keep track of actions). How are you going to deal with that?
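That is, the interaction loop PFRL expects looks roughly like this (schematic, single-env case, not exact library code):

```python
# Schematic of PFRL's act/observe protocol.
obs = env.reset()
done = False
while not done:
    action = agent.act(obs)                        # agent caches the last obs/action internally
    obs, reward, done, info = env.step(action)
    agent.observe(obs, reward, done, reset=False)  # the transition is stored here
```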
Note that the `HindsightReplayBuffer` extends the `EpisodicReplayBuffer`. If you look at the data structures within the `EpisodicReplayBuffer`, you can see that it maintains a `current_episode` which is only appended to the main replay buffer when the episode is stopped. This ensures that when we perform updates, we're not using incomplete episodes.
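A stripped-down sketch of that mechanism (simplified, and not the actual pfrl code; the attribute names here are just for illustration):

```python
from collections import deque

class EpisodicBufferSketch:
    """Illustrative sketch of the idea behind an episodic replay buffer.

    Transitions accumulate in `current_episode` and only move into the main
    storage when the episode is explicitly stopped, so sampling for updates
    never touches a partially finished episode.
    """

    def __init__(self):
        self.episodic_memory = deque()  # list of completed episodes
        self.memory = deque()           # flat view of all stored transitions
        self.current_episode = []

    def append(self, state, action, reward, next_state, is_state_terminal):
        self.current_episode.append(dict(
            state=state, action=action, reward=reward,
            next_state=next_state, is_state_terminal=is_state_terminal))
        if is_state_terminal:
            self.stop_current_episode()

    def stop_current_episode(self):
        if self.current_episode:
            self.episodic_memory.append(self.current_episode)
            self.memory.extend(self.current_episode)
            self.current_episode = []
```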
About the use of `batch_last_action`, I'm not entirely sure what you're asking. If you look at this function, we do use `batch_last_action`, yes, but it's being added to the replay buffer, not used directly for updates. At the end of the function we call `self.replay_updater.update_if_necessary(self.t)`, which will perform a gradient update, but it will not use `batch_last_action`.
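Schematically, the flow I'm describing is something like this (a simplified sketch, not the exact pfrl source; the attribute and argument names are approximations of what the DQN-family agents use):

```python
def batch_observe_sketch(agent, batch_obs, batch_reward, batch_done, batch_reset):
    """Rough sketch of what happens in batch_observe during training."""
    for i in range(len(batch_obs)):
        agent.t += 1
        if agent.batch_last_obs[i] is not None:
            # batch_last_action is only used to construct the stored transition.
            agent.replay_buffer.append(
                state=agent.batch_last_obs[i],
                action=agent.batch_last_action[i],
                reward=batch_reward[i],
                next_state=batch_obs[i],
                is_state_terminal=batch_done[i],
                env_id=i,
            )
            if batch_done[i] or batch_reset[i]:
                # For an episodic/HER buffer this is where the finished episode
                # becomes available for sampling.
                agent.replay_buffer.stop_current_episode(env_id=i)
        # The gradient update samples (whole episodes, in the HER case) from the
        # replay buffer; batch_last_action itself is never passed to the update.
        agent.replay_updater.update_if_necessary(agent.t)
```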
Does this answer your question? If not, feel free to clarify and I'll do my best to answer.
Hindsight Experience Replay with bit-flipping example: https://arxiv.org/abs/1707.01495
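For context, the core of HER is relabeling stored transitions with goals that were actually achieved later in the episode. A minimal sketch of the paper's "final" strategy (hypothetical helper names, not pfrl's API):

```python
import numpy as np

def her_final_relabel(episode, reward_fn):
    """Duplicate each transition with the episode's final achieved state as its goal.

    `episode` is a list of dicts with keys: state, action, next_state, goal, reward.
    """
    achieved_goal = episode[-1]["next_state"]
    relabeled = []
    for tr in episode:
        new_tr = dict(tr)
        new_tr["goal"] = achieved_goal
        new_tr["reward"] = reward_fn(tr["next_state"], achieved_goal)
        relabeled.append(new_tr)
    return relabeled

def bitflip_reward(next_state, goal):
    # Sparse reward used in the bit-flip task: 0 on success, -1 otherwise.
    return 0.0 if np.array_equal(next_state, goal) else -1.0
```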