pfnet / pfrl

PFRL: a PyTorch-based deep reinforcement learning library
MIT License
1.18k stars 158 forks source link

Hindsight Experience Replay Buffer #84

Open prabhatnagarajan opened 3 years ago

prabhatnagarajan commented 3 years ago

Depends on https://github.com/pfnet/pfrl/pull/80. Resolves https://github.com/pfnet/pfrl/issues/6.

Results:

her_bit_flip_dqn

prabhatnagarajan commented 3 years ago

/test

pfn-ci-bot commented 3 years ago

Successfully created a job for commit 71809a5:

prabhatnagarajan commented 3 years ago

/test

pfn-ci-bot commented 3 years ago

Successfully created a job for commit 9721d1a:

prabhatnagarajan commented 3 years ago

/test

pfn-ci-bot commented 3 years ago

Successfully created a job for commit 8643e0d:

prabhatnagarajan commented 3 years ago

/test

pfn-ci-bot commented 3 years ago

Successfully created a job for commit 035ad63:

muupan commented 3 years ago

Can you verify performance matches the published one or openai/baselines on Fetch tasks as well?

prabhatnagarajan commented 3 years ago

@ummavi also mentioned this in person. The main point of this PR is to add the HER feature for users, not to reproduce the Hindsight Replay paper, which has a whole different set of hyperparameters. I did the bit flip example because it's easy to implement, fast to experiment on, and a good verification that the system works. Moreover, this HER implementation is adapted from the one we used here: https://arxiv.org/abs/2007.08082, so correctness shouldn't be an issue (plus there are tests).

muupan commented 3 years ago

A verification that the system just "works" is not enough. A verification that the system works at least as well as the published/official result is needed for a new feature to keep pfrl reliable, particularly when the feature is complex. Also, there is no good reason not to compare against openai/baselines, which is kind of the official implementation of HER.

prabhatnagarajan commented 3 years ago

I'm not sure what you mean by `just "works"'. I've shown that it works on the bit-flip domain used in the paper (though they don't provide code or hyperparameters for that).

To come up with somewhat a ridiculous analogy, if a paper uses a function that computes the standard deviation, you don't need to reproduce the whole paper's results to be sure that standard deviation works correctly. In the case of Hindsight Experience Replay, there's a quite a lot of tricks and modifications that they use in addition to HER, which has a relatively simple implementation. In effect, those changes sit well outside of Hindsight experience replay. There wouldn't be changes to HER hyperparameters, for example, to achieve that level of reproducibility. In other words, the nature of HER is that being "correct" means that it also works (which I think is what you meant when you said "just works").

What would be helpful is:

But I'm unclear about generically stating that an entire system should work as well as a published paper in order to add a piece of that paper. Especially since for other features like ACER and Noisy networks (https://github.com/chainer/chainerrl/pull/169#issuecomment-344176330), we didn't reproduce the paper to determine sufficient performance. In these case, that it "works" was sufficient. I'd appreciate if we were consistent across the board about our requirements for feature additions, especially in scenarios where correctness and it working go hand in hand.

I see your point, and it's certainly true that more results are always more reassuring. But I don't think it's necessary here, especially given our precedents, and I think if you read the code you might change your mind. It seems to me that you're insinuating that for any and all features, regardless of what the code shows or how simple/complex it is, we have to reproduce the entire system to add the feature. Moreover, I should reiterate that this implementation was used in our own paper(unless you doubt the correctness of that?). And again, seems inconsistent with what we've done in the past, and as we've had no internal discussion regarding this, don't see why we should randomly introduce this requirement out of the blue.

muupan commented 3 years ago

Let's continue in the internal chat.

prabhatnagarajan commented 3 years ago

/test

pfn-ci-bot commented 3 years ago

Successfully created a job for commit 573c7a2:

prabhatnagarajan commented 3 years ago

/test

pfn-ci-bot commented 3 years ago

Successfully created a job for commit ed4ae2e:

prabhatnagarajan commented 3 years ago

Current results PickAndPlace Push Slide

prabhatnagarajan commented 3 years ago

/test

pfn-ci-bot commented 3 years ago

Successfully created a job for commit ef230a4:

ummavi commented 3 years ago

/test

pfn-ci-bot commented 3 years ago

Successfully created a job for commit 35f085c: