[Proposal] RL Common / Baselines Common Package

araffin commented 5 years ago

With @hill-a , we came up to the same conclusion about things in the common folder: a lot is not specific to this library and could be separated in a stand-alone python package.

Doing so would:

allow people to develop/build things on that package without depending on a specific backend (i.e. this removes tensorflow dependency)
separate what is related to rl in general vs stable-baselines
ease the transition between tf 1 and tf 2 (#366) as we could decide to have two separate versions of SB instead of maintaining a version with tf1 and tf2 support at the same time (which seems complicated especially because of the removal of tf.contrib)

As a personal example, I have now an internal (minimal) pytorch version of stable-baselines (I plan to release it at some point but it is used for my own research purposes for now) where I had to copy a lot of what was present in the common folder.

I would like to have your thoughts on that ;) @erniejunior @AdamGleave @Miffyli

Suggestions for the name of the package are also welcomed, for now we have:

baselines-common (by @hill-a)
rl-common (my proposition)

the repo would be under the stable-baselines team organization: https://github.com/Stable-Baselines-Team

Miffyli commented 5 years ago

Which parts of common is this referring to, exactly? Files like distributions.py and policies.py, the other ones (not distributions/policies) or all files? Would the aim be to move most/all backend stuff to this new package, and then we can implement this package with different backends? The stable-baselines portion would then use these tools to implement the final RL algorithms?

I agree this would be a useful thing, especially now with the upcoming transitions to a new backend. It would be a lot of work and debugging to get everything right (and even then probably some clashing with interfaces), but it would help keeping current stable-baselines structure clean. At the same time we can tidy up other things, e.g.

Reorganize/rename some of the code under common which is somewhat sprinkled around
Get rid of the oddballs like layer definitions being in a2c.utils and common.distributions importing from there (but a2c then imports stuff from distributions via policies ???)

As for the name: I vote for "baselines-common". Term "Baselines" is already familiar in (D)RL scene. "rl-common" is nice too, but there is still the very active part of "non-deep reinforcement learning" which is quite different from the "deep reinforcement learning" where network-like models screw up everything :)

Edit: Cpt. Obvious here but the first thing to do is to gather solid baseline results of current stable-baselines/baselines we then should be able to replicate in the new backends or when major things change. I think results in the RLZoo would do the trick?

araffin commented 5 years ago

Which parts of common is this referring to, exactly?

Everything that is not related to tensorflow, I see at least (not exhaustive list):

vec envs
noise
maths utils
monitor
debug envs
replay buffer

Reorganize/rename some of the code under common which is somewhat sprinkled around Get rid of the oddballs like layer definitions being in a2c.utils

Totally agree

When using a new backend, you will need to implement those specific parts:

base class (at least for getting/setting the parameters)
policies
distributions
buffers (rollout buffer and replay buffer), the current replay buffer implementation rely on numpy only but could be improved if using a specific backend

I think results in the RLZoo would do the trick?

That is the whole point of rl zoo ;) Note: I have successfully replicated results with the pytorch backend, so I think I now got some experience of what can go wrong ;)

Miffyli commented 5 years ago

To check I understood this right: In summary, this would move backend independent tools used by one or more RL methods, used by environments or generally used for debugging to a new package.

This sounds good. The lack of assumptions on the backend makes throughout optimization impossible (e.g. using TF tensors to store replay data or so), but that is not what this package is for and I agree this would be very useful for quick debugging and setting up algorithms. 👍

I think repo for such package should be quick to setup by cloning stable-baselines setup with all the CI things and how tests/docs are done, etc?

PyTorch experiments: Yay! Now at least there is one of us who knows how to do them properly ^^.

araffin commented 5 years ago

I think repo for such package should be quick to setup by cloning stable-baselines setup with all the CI things and how tests/docs are done, etc?

It will require at least one day I think and then some iteration to refactor things nicely. But yes, in essence, this is mostly copy-pasting. The repo will be under stable-baselines-team organization.

Miffyli commented 5 years ago

Ah, I was talking about setting it up so we can start working on it ^^. Naturally the full refactoring / better documentation / etc will take longer, but at least would have a place to start with.

Would still like to hear comments from others, though (@erniejunior @AdamGleave)

AdamGleave commented 5 years ago

I'm weakly in favour of this. I think separating the common dependencies out would be slightly cleaner, although it might also slightly complicate setting up the development environment, which could deter some new contributors.

I suspect you'll be one of the few users of this change. So if it'll save you time to restructure this then go for it. But unless others chime up I would not do it for the good of the community. Not many people write RL frameworks from scratch, and those that do will often not want to use Stable Baselines "common" implementation. Even if we clean things up there will be lots of design decisions embedded in this. For example, the VecEnv interface works well for parallelizing at small scale on a single machine, but the synchronous API prevents it scaling to distributed RL workers.

araffin commented 5 years ago

I suspect you'll be one of the few users of this change.

That's true... Now, you make me ensure about what to do...

slightly complicate setting up the development environment

On that part, I would expect the "common" package to be relatively stable (in term of amount of change) compared to the main repo.

Not many people write RL frameworks from scratch

True, I was mostly thinking of people implementing only one algorithm. For instance, I've already seen the replay buffer implementation copied over.

Do you think there could be an alternative, like a git-submodule instead of a package? Better option: I think I've seen that somewhere, but I'm not sure it's possible, we create a second package using options in the setup.py while keeping everything else the same (so everything stay in the same repo, we would need maybe to reorganize things though)

Miffyli commented 5 years ago

The suggestions sound even more obscure than having a separate package. If goal is to also offer people a set of miscellaneous and independent utilities for RL algorithms, I do not see how git-submodules or the proposed setup.py method helps.

I have personally looked for independent implementations of some algorithms like more sophisticated return/advantage computations (e.g. GAE, V-trace), since it is easy have bugs in them. In that sense a repository of independent and common tools would be nice, but I do not think this would be easy if even possible to do in independent fashion, as the definition of the function can be closely tied to the learning algorithm itself.

AdamGleave commented 5 years ago

I agree with @Miffyli that if we're going to split it then it's better to be in a separate package. Probably best to be in its own repo too, although you can have multiple Python packages in a single repository.

hill-a / stable-baselines

[Proposal] RL Common / Baselines Common Package #503