Closed: whatdhack closed this issue 10 months ago
Hi @whatdhack Thanks for this suggestion. It's something that has come up over and over; we've had many discussions with @matteobettini @albertbou92 @BY571 @btx0424 @smorad and others about how to address this problem, and the fact that it keeps coming up means we haven't done a great job dealing with it.
Let me bring a few datapoints to move the discussion forward:
examples
gives us that freedom. Most of the time we do, and we will keep doing our best to get the most fine-tuned implementation, of course. I hope that helps. If you (or anyone else) have concrete suggestions for what a clear and concise tutorial to add to the lib would look like, we'd be excited to get started working on it!
I'm closing this for lack of feedback, but if there's anything actionable we can do I'll be thrilled to consider it!
Especially in the dqn example, some of the well-established logical divisions of deep learning are not followed. It is hard to rationalize why the SyncDataCollector has a policy network attached to it. Also, it looks like the dqn example calls the MLP 3 times in one iteration!
It's very much a WIP, but here's the PR that will hopefully clarify things:
https://github.com/pytorch/rl/pull/1886
There's a link at the top to see the docs rendered with this work.
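In the meantime, here is a minimal sketch of why the collector takes a policy: the collector runs the environment loop on your behalf, so it needs the policy to pick actions and hand back ready-made batches of transitions for training. The environment, layer sizes and frame counts below are placeholders for illustration, not the exact settings of the dqn example:

```python
from torchrl.collectors import SyncDataCollector
from torchrl.envs.libs.gym import GymEnv
from torchrl.modules import MLP, QValueActor

env = GymEnv("CartPole-v1")

# Q-network: maps an observation to one Q-value per action.
value_net = MLP(
    in_features=env.observation_spec["observation"].shape[-1],
    out_features=env.action_spec.shape[-1],
    num_cells=[64, 64],  # arbitrary sizes for this sketch
)

# QValueActor wraps the Q-network and turns its Q-values into a greedy action.
policy = QValueActor(value_net, in_keys=["observation"], spec=env.action_spec)

# The collector holds the policy so it can step the environment itself
# and yield batches of transitions ready for a training loop.
collector = SyncDataCollector(
    env,
    policy,
    frames_per_batch=64,
    total_frames=256,
)

for batch in collector:
    # `batch` is a TensorDict of transitions gathered by running `policy` in `env`.
    print(batch["action"].shape)

collector.shutdown()
```

On the network being called several times: in a typical DQN setup the same Q-network shows up in more than one place, since it is wrapped in the exploration policy used by the collector, referenced again by the loss module, and a delayed copy of it serves as the target network. That may be why the MLP appears to be invoked multiple times per iteration in the example.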
Motivation
It is hard to follow and understand the examples and tutorials. As an example, comparing the two flavors of CartPole PyTorch code, the one from pytorch/tutorials is far easier to understand and follow than the one in pytorch/rl:
https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html
https://github.com/pytorch/rl/blob/main/examples/dqn/dqn_cartpole.py
Solution
Clear and concise example code.