salesforce / warp-drive

Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning Framework on a GPU (JMLR 2022)
BSD 3-Clause "New" or "Revised" License

Addition of Other Reinforcement Learning Algorithms (i.e., Q-Learning) #44

Closed rllyryan closed 2 years ago

rllyryan commented 2 years ago

Dear WarpDrive Team,

May I find out if it is possible to implement other reinforcement learning algorithms in WarpDrive (e.g., Q-Learning)?

If not, may I ask whether PPO and A2C are considered among the better algorithms in the field? I am not that well informed about the algorithms and their individual advantages, but this is what I have gathered from online searches:

It can be observed that PPO provides a better convergence and performance rate than other techniques but is sensitive to changes. DQN alone is unstable and gives poor convergence, hence requires several add-ons.

Reference: https://medium.datadriveninvestor.com/which-reinforcement-learning-rl-algorithm-to-use-where-when-and-in-what-scenario-e3e7617fb0b1

Emerald01 commented 2 years ago

Hi, thank you for the question. The WarpDrive environment simulator outputs its data as PyTorch tensors, so for the training part WarpDrive is no different from any other RL infrastructure using Python/PyTorch. You can implement any training algorithm written in PyTorch. For Q-learning, you can refer to our A2C example and change it to the corresponding Q-learning algorithm. Alternatively, we have a Lightning example showing how we integrate directly with PyTorch Lightning for training; that way, it should be even easier for you to use a Lightning trainer.

rllyryan commented 2 years ago

Hi @Emerald01,

Thank you for your explanation and suggestion. I will take a look at the PyTorch Lightning trainer; it seems pretty good at cutting down boilerplate code. As for the A2C example, could I ask which tutorial exactly you are referring to?

Emerald01 commented 2 years ago

I mean the trainer itself is a typical PyTorch trainer, since the output data is a torch tensor: https://github.com/salesforce/warp-drive/blob/master/warp_drive/training/algorithms/a2c.py

So if you would like to use Q-learning or any other algorithm, you can borrow it directly and use it with WarpDrive.
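
To make the point concrete, here is a minimal sketch of a one-step Q-learning (DQN-style) loss written in plain PyTorch. It is not taken from the WarpDrive codebase: `QNetwork`, `q_learning_loss`, the tensor shapes, and the use of a separate target network are all assumptions made for illustration. The only thing it relies on from the discussion above is that observations, actions, and rewards arrive as torch tensors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class QNetwork(nn.Module):
    """Hypothetical small MLP: observation -> one Q-value per discrete action."""

    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def q_learning_loss(q_net, target_net, obs, actions, rewards, next_obs, dones,
                    gamma: float = 0.99) -> torch.Tensor:
    """One-step Q-learning loss on a batch of transitions.

    Assumed shapes (illustrative, not a WarpDrive convention):
      obs, next_obs: float tensors, (batch, obs_dim)
      actions:       long tensor,   (batch,)
      rewards:       float tensor,  (batch,)
      dones:         float tensor,  (batch,), 1.0 where the episode ended
    """
    # Q(s, a) for the actions that were actually taken.
    q_taken = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Bootstrapped target r + gamma * max_a' Q_target(s', a'); no gradient flows here.
    with torch.no_grad():
        next_q_max = target_net(next_obs).max(dim=1).values
        td_target = rewards + gamma * (1.0 - dones) * next_q_max

    # Huber loss between the current estimate and the target.
    return F.smooth_l1_loss(q_taken, td_target)
```

A training step would then call `loss.backward()` followed by an optimizer step, exactly as in any other PyTorch loop. Extras such as a replay buffer, epsilon-greedy exploration, and periodic target-network updates are left out of this sketch.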

rllyryan commented 2 years ago

Understand! Thank you!