Closed araffin closed 4 years ago
Which parts of `common` is this referring to, exactly? Files like `distributions.py` and `policies.py`, the other ones (not distributions/policies), or all files? Would the aim be to move most/all backend stuff to this new package, and then we could implement this package with different backends? The stable-baselines portion would then use these tools to implement the final RL algorithms?
I agree this would be a useful thing, especially now with the upcoming transition to a new backend. It would be a lot of work and debugging to get everything right (and even then probably some clashing with interfaces), but it would help keep the current stable-baselines structure clean. At the same time we could tidy up other things, e.g. `common`, which is somewhat sprinkled around: `a2c.utils` and `common.distributions` import from there (but `a2c` then imports stuff from `distributions` via `policies`???).
As for the name: I vote for "baselines-common". The term "Baselines" is already familiar in the (D)RL scene. "rl-common" is nice too, but there is still the very active field of "non-deep reinforcement learning", which is quite different from the "deep reinforcement learning" where network-like models screw up everything :)
Edit: Cpt. Obvious here, but the first thing to do is to gather solid baseline results for the current stable-baselines/baselines, which we should then be able to replicate with the new backends or when major things change. I think results in the RL Zoo would do the trick?
Which parts of common is this referring to, exactly?
Everything that is not related to tensorflow; I see at least (non-exhaustive list):

- Reorganize/rename some of the code under `common`, which is somewhat sprinkled around
- Get rid of the oddballs like layer definitions being in `a2c.utils`
Totally agree
When using a new backend, you will need to implement those specific parts:
I think results in the RLZoo would do the trick?
That is the whole point of the rl zoo ;) Note: I have successfully replicated results with the pytorch backend, so I think I now have some experience of what can go wrong ;)
To check I understood this right: in summary, this would move backend-independent tools that are used by one or more RL methods, used by environments, or generally used for debugging into a new package.
This sounds good. The lack of assumptions about the backend makes thorough optimization impossible (e.g. using TF tensors to store replay data or so), but that is not what this package is for, and I agree it would be very useful for quick debugging and setting up algorithms. 👍
I think the repo for such a package should be quick to set up by cloning the stable-baselines setup, with all the CI things and how tests/docs are done, etc.?
PyTorch experiments: Yay! Now at least there is one of us who knows how to do them properly ^^.
I think the repo for such a package should be quick to set up by cloning the stable-baselines setup, with all the CI things and how tests/docs are done, etc.?
It will require at least one day, I think, and then some iterations to refactor things nicely.
But yes, in essence, this is mostly copy-pasting.
The repo will be under the stable-baselines-team organization.
Ah, I was talking about setting it up so we can start working on it ^^. Naturally the full refactoring / better documentation / etc. will take longer, but at least we would have a place to start.
Would still like to hear comments from others, though (@erniejunior @AdamGleave)
I'm weakly in favour of this. I think separating the common dependencies out would be slightly cleaner, although it might also slightly complicate setting up the development environment, which could deter some new contributors.
I suspect you'll be one of the few users of this change. So if it'll save you time to restructure this, then go for it. But unless others chime in, I would not do it for the good of the community. Not many people write RL frameworks from scratch, and those that do will often not want to use Stable Baselines' "common" implementation. Even if we clean things up, there will be lots of design decisions embedded in this. For example, the VecEnv interface works well for parallelizing at small scale on a single machine, but the synchronous API prevents it from scaling to distributed RL workers.
I suspect you'll be one of the few users of this change.
That's true... Now you make me unsure about what to do...
slightly complicate setting up the development environment
On that part, I would expect the "common" package to be relatively stable (in terms of the amount of change) compared to the main repo.
Not many people write RL frameworks from scratch
True, I was mostly thinking of people implementing only one algorithm. For instance, I've already seen the replay buffer implementation copied over.
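To illustrate the kind of backend-agnostic utility being discussed, here is a minimal replay buffer sketch (a hypothetical example for illustration, not the actual stable-baselines implementation; class and method names are my own):

```python
import random
from collections import deque


class ReplayBuffer:
    """Minimal backend-agnostic replay buffer sketch.

    Stores transitions as plain Python tuples, so it has no dependency
    on TensorFlow, PyTorch, or any other backend.
    """

    def __init__(self, capacity):
        # deque with maxlen drops the oldest transition when full
        self.storage = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, done):
        self.storage.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        # uniform sampling without replacement within one batch
        batch = random.sample(self.storage, batch_size)
        obs, actions, rewards, next_obs, dones = zip(*batch)
        return list(obs), list(actions), list(rewards), list(next_obs), list(dones)

    def __len__(self):
        return len(self.storage)
```

Because nothing here touches the backend, each algorithm implementation would only convert the sampled lists to its own tensor type at training time.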
Do you think there could be an alternative, like a git-submodule instead of a package?
A better option: I think I've seen this somewhere, but I'm not sure it's possible: we create a second package using options in the setup.py while keeping everything else the same (so everything stays in the same repo; we would maybe need to reorganize things, though).
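A rough sketch of what such a second setup script could look like (hypothetical file name, package name, and version; I have not verified how cleanly setuptools supports maintaining two distributions from one repo):

```python
# setup_common.py -- hypothetical second setup script living next to the
# regular setup.py, packaging only the backend-independent "common" code.
from setuptools import setup, find_packages

setup(
    name="stable-baselines-common",  # hypothetical distribution name
    version="0.1.0",                 # hypothetical version
    # Ship only the common subpackages; the rest of the repo would stay
    # in the main stable-baselines distribution.
    packages=find_packages(
        include=["stable_baselines.common", "stable_baselines.common.*"]
    ),
)
```

Releasing would then mean running each setup script separately, which is one of the things that makes this approach feel more fragile than a dedicated repo.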
These suggestions sound even more obscure than having a separate package. If the goal is also to offer people a set of miscellaneous and independent utilities for RL algorithms, I do not see how git submodules or the proposed setup.py method would help.
I have personally looked for independent implementations of some algorithms, like more sophisticated return/advantage computations (e.g. GAE, V-trace), since it is easy to have bugs in them. In that sense a repository of independent and common tools would be nice, but I do not think this would be easy, if even possible, to do in an independent fashion, as the definition of the function can be closely tied to the learning algorithm itself.
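As an example of the kind of self-contained utility meant here, a minimal GAE computation might look like the following (a sketch following the standard Schulman et al. formulation; function and parameter names are illustrative, not from stable-baselines):

```python
import numpy as np


def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout of length T.

    rewards, values, dones: arrays of length T collected during the rollout
    (dones[t] is 1.0 if the episode ended after step t, else 0.0);
    last_value: value estimate for the state following the final step.
    Returns (advantages, returns), where returns = advantages + values.
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float64)
    gae = 0.0
    # Walk backwards so each step can reuse the accumulated GAE term.
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        next_non_terminal = 1.0 - dones[t]
        # TD residual: r_t + gamma * V(s_{t+1}) - V(s_t), masked at episode ends
        delta = rewards[t] + gamma * next_value * next_non_terminal - values[t]
        gae = delta + gamma * lam * next_non_terminal * gae
        advantages[t] = gae
    returns = advantages + np.asarray(values, dtype=np.float64)
    return advantages, returns
```

The episode-boundary masking via `dones` is exactly the sort of detail that is easy to get wrong when re-implementing this from scratch, which is the argument for sharing one well-tested version.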
I agree with @Miffyli that if we're going to split it then it's better to be in a separate package. Probably best to be in its own repo too, although you can have multiple Python packages in a single repository.
With @hill-a, we came to the same conclusion about the things in the common folder: a lot of it is not specific to this library and could be separated into a stand-alone Python package.
Doing so would:
As a personal example, I now have an internal (minimal) pytorch version of stable-baselines (I plan to release it at some point, but it is used for my own research purposes for now), where I had to copy a lot of what was present in the common folder.
I would like to have your thoughts on that ;) @erniejunior @AdamGleave @Miffyli
Suggestions for the name of the package are also welcome; for now we have:
the repo would be under the stable-baselines team organization: https://github.com/Stable-Baselines-Team