Hindsight Experience Replay Buffer

prabhatnagarajan commented 3 years ago

Depends on https://github.com/pfnet/pfrl/pull/80. Resolves https://github.com/pfnet/pfrl/issues/6.

Results:

her_bit_flip_dqn

prabhatnagarajan commented 3 years ago

/test

pfn-ci-bot commented 3 years ago

Successfully created a job for commit 71809a5:

Dashboard for commit 71809a5

prabhatnagarajan commented 3 years ago

/test

pfn-ci-bot commented 3 years ago

Successfully created a job for commit 9721d1a:

Dashboard for commit 9721d1a

prabhatnagarajan commented 3 years ago

/test

pfn-ci-bot commented 3 years ago

Successfully created a job for commit 8643e0d:

Dashboard for commit 8643e0d

prabhatnagarajan commented 3 years ago

/test

pfn-ci-bot commented 3 years ago

Successfully created a job for commit 035ad63:

Dashboard for commit 035ad63

muupan commented 3 years ago

Can you verify performance matches the published one or openai/baselines on Fetch tasks as well?

prabhatnagarajan commented 3 years ago

@ummavi also mentioned this in person. The main point of this PR is to add the HER feature for users, not to reproduce the Hindsight Replay paper, which has a whole different set of hyperparameters. I did the bit flip example because it's easy to implement, fast to experiment on, and a good verification that the system works. Moreover, this HER implementation is adapted from the one we used here: https://arxiv.org/abs/2007.08082, so correctness shouldn't be an issue (plus there are tests).

muupan commented 3 years ago

A verification that the system just "works" is not enough. A verification that the system works at least as well as the published/official result is needed for a new feature to keep pfrl reliable, particularly when the feature is complex. Also, there is no good reason not to compare against openai/baselines, which is kind of the official implementation of HER.

prabhatnagarajan commented 3 years ago

I'm not sure what you mean by `just "works"'. I've shown that it works on the bit-flip domain used in the paper (though they don't provide code or hyperparameters for that).

To come up with somewhat a ridiculous analogy, if a paper uses a function that computes the standard deviation, you don't need to reproduce the whole paper's results to be sure that standard deviation works correctly. In the case of Hindsight Experience Replay, there's a quite a lot of tricks and modifications that they use in addition to HER, which has a relatively simple implementation. In effect, those changes sit well outside of Hindsight experience replay. There wouldn't be changes to HER hyperparameters, for example, to achieve that level of reproducibility. In other words, the nature of HER is that being "correct" means that it also works (which I think is what you meant when you said "just works").

What would be helpful is:

If you could point out concretely what you think could be wrong with the HER implementation
What you find unconvincing about the results

But I'm unclear about generically stating that an entire system should work as well as a published paper in order to add a piece of that paper. Especially since for other features like ACER and Noisy networks (https://github.com/chainer/chainerrl/pull/169#issuecomment-344176330), we didn't reproduce the paper to determine sufficient performance. In these case, that it "works" was sufficient. I'd appreciate if we were consistent across the board about our requirements for feature additions, especially in scenarios where correctness and it working go hand in hand.

I see your point, and it's certainly true that more results are always more reassuring. But I don't think it's necessary here, especially given our precedents, and I think if you read the code you might change your mind. It seems to me that you're insinuating that for any and all features, regardless of what the code shows or how simple/complex it is, we have to reproduce the entire system to add the feature. Moreover, I should reiterate that this implementation was used in our own paper(unless you doubt the correctness of that?). And again, seems inconsistent with what we've done in the past, and as we've had no internal discussion regarding this, don't see why we should randomly introduce this requirement out of the blue.

muupan commented 3 years ago

Let's continue in the internal chat.

prabhatnagarajan commented 3 years ago

/test

pfn-ci-bot commented 3 years ago

Successfully created a job for commit 573c7a2: