vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev
Other
5.54k stars 631 forks source link

CleanRL can't be used by importing? #197

Closed cool-RR closed 2 years ago

cool-RR commented 2 years ago

Hi @vwxyzjn !

Jordan recommended that I use CleanRL when working with Petting Zoo, and it took me a bit to figure out something. Let me know whether I understand correctly. CleanRL isn't meant to be imported and used directly, like SB3. It's supposed to have reference implementations of algorithms which developers could copy off to their own program and tweak as necessary to serve their needs.

Is this correct? If so:

  1. It should be clearly stated in the readme, even in the title. Using the phrase "reference implementations" is helpful, and there needs to be at least a sentence that explains that CleanRL isn't meant to be imported, it's meant to be copied from.
  2. It would be cool if the implementations in CleanRL would have been useful in importable forms in some way. It could be a separate package. Many people want to just import PPO and have it work.
vwxyzjn commented 2 years ago

Hi, @cool-RR thanks for raising this issue. You are absolutely correct that CleanRL cannot be imported. Does https://github.com/vwxyzjn/cleanrl/blob/b0d00df8b926617651638923e622b51c0b477305/README.md better than clarify this?

It would be cool if the implementations in CleanRL would have been useful in importable forms in some way. It could be a separate package. Many people want to just import PPO and have it work.

CleanRL doesn't really support this, unfortunately. For the import PPO use cases, we'd recommend checking out Stable-baselines3 (SB3) which is a great modular library.

That said though, prototyping with SB3 can be challenging. As an example, it takes considerably more code to implement invalid action masking in SB3 (see PR) whereas in CleanRL it's just about 30 lines of code (see "invalid action masking" in this post). Unfortunately, there is no perfect solution for doing prototypes.

CleanRL can help you to understand the core of implementing invalid action masking to work with pettingzoo's chess, empowering you to build prototypes faster. Once you understand the core of the implementation, it would help you more seamlessly contribute the same feature to SB3 for wider adoption.

cool-RR commented 2 years ago

Thanks for the quick PR, I commented there.

CleanRL doesn't really support this, unfortunately. For the import PPO use cases, we'd recommend checking out Stable-baselines3 (SB3) which is a great modular library.

That said though, prototyping with SB3 can be challenging. As an example, it takes considerably more code to implement invalid action masking in SB3 (see PR) whereas in CleanRL it's just about 30 lines of code (see "invalid action masking" in this post). Unfortunately, there is no perfect solution for doing prototypes.

CleanRL can help you to understand the core of implementing invalid action masking to work with pettingzoo's chess, empowering you to build prototypes faster. Once you understand the core of the implementation, it would help you more seamlessly contribute the same feature to SB3 for wider adoption.

I understand, thanks.