Closed: whatdhack closed this issue 10 months ago
Hi @whatdhack Thanks for this suggestion. It's something that has come up over and over; we've had many discussions with @matteobettini @albertbou92 @BY571 @btx0424 @smorad and others about how to address this problem, and the fact that it keeps coming up means we haven't done a great job dealing with it.
Let me bring a few datapoints to move the discussion forward:
examples
gives us that freedom. Most of the time we do, and we will keep doing our best to get the most fine-tuned implementation, of course. I hope that helps. If you (or anyone else) have concrete suggestions for what a clear and concise tutorial to add to the lib would look like, we'd be excited to get started working on it!
I'm closing this for lack of feedback, but if there's anything actionable we can do I'll be thrilled to consider it!
Especially in the dqn example, some of the well-established logical divisions of deep learning are not followed. It is hard to rationalize why the SyncDataCollector has a policy network attached to it. Also, it looks like the dqn example calls the MLP 3 times in one iteration!
It's very much a WIP, but here's the PR that will hopefully clarify things:
https://github.com/pytorch/rl/pull/1886
There's a link at the top to see the docs rendered with this work.
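In the meantime, here is a minimal sketch of why the collector takes a policy: the collector runs the environment loop on your behalf, so it needs the policy to pick actions and hand back ready-made batches of transitions for training. The environment, layer sizes and frame counts below are placeholders for illustration, not the exact settings of the dqn example:

```python
from torchrl.collectors import SyncDataCollector
from torchrl.envs.libs.gym import GymEnv
from torchrl.modules import MLP, QValueActor

env = GymEnv("CartPole-v1")

# Q-network: maps an observation to one Q-value per action.
value_net = MLP(
    in_features=env.observation_spec["observation"].shape[-1],
    out_features=env.action_spec.shape[-1],
    num_cells=[64, 64],  # arbitrary sizes for this sketch
)

# QValueActor wraps the Q-network and turns its Q-values into a greedy action.
policy = QValueActor(value_net, in_keys=["observation"], spec=env.action_spec)

# The collector holds the policy so it can step the environment itself
# and yield batches of transitions ready for a training loop.
collector = SyncDataCollector(
    env,
    policy,
    frames_per_batch=64,
    total_frames=256,
)

for batch in collector:
    # `batch` is a TensorDict of transitions gathered by running `policy` in `env`.
    print(batch["action"].shape)

collector.shutdown()
```

On the network being called several times: in a typical DQN setup the same Q-network shows up in more than one place, since it is wrapped in the exploration policy used by the collector, referenced again by the loss module, and a delayed copy of it serves as the target network. That may be why the MLP appears to be invoked multiple times per iteration in the example.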
Motivation
It is hard to follow and understand the examples and tutorials. As an example, comparing the two flavors of CartPole PyTorch code, the one from pytorch/tutorials is far easier to understand and follow than the one in pytorch/rl:
https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html
https://github.com/pytorch/rl/blob/main/examples/dqn/dqn_cartpole.py
Solution
Clear and concise example code.