salesforce / warp-drive

Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning Framework on a GPU (JMLR 2022)
BSD 3-Clause "New" or "Revised" License
465 stars 78 forks

Correct way to wrap a gymnasium environment. #81

Closed Karlheinzniebuhr closed 1 year ago

Karlheinzniebuhr commented 1 year ago

I couldn't find a tutorial on how to wrap a Gymnasium environment. I want to do something like this:

# Import the PPO algorithm from Stable Baselines 3
from stable_baselines3 import PPO

# Import the gymnasium module
import gymnasium

# Import the EnvWrapper class from WarpDrive
from warp_drive.utils.env_wrapper import EnvWrapper

# Define the number of parallel environments
n_envs = 256

# Choose an environment from Gymnasium
env_name = 'LunarLander-v2'

# Create a list of environment constructors with custom arguments
envs = [lambda: gymnasium.make(
    env_name,
    continuous=False,
    gravity=-10.0,
    enable_wind=False,
    wind_power=15.0,
    turbulence_power=1.5,
) for _ in range(n_envs)]

# Create a wrapped environment object via the EnvWrapper
env_wrapper = EnvWrapper(envs, num_envs=n_envs, env_backend='pycuda')

model = PPO('MlpPolicy', env_wrapper)
model.learn(total_timesteps=1000000)

Karlheinzniebuhr commented 1 year ago

Updated my example to use gymnasium instead of gym, since gym is no longer maintained.

Emerald01 commented 1 year ago

There is no magic here. You cannot take a gym.make environment and bundle it to CUDA directly. You need to implement the gym environment in Numba or CUDA; WarpDrive then provides all the gateways, managers, and trainers to wrap everything up together. You can refer to our design flowchart in the README.
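
For a sense of what that means, here is a minimal sketch of a Numba step kernel in the WarpDrive style. It follows the one-block-per-environment, one-thread-per-agent layout from the docs, but the array names and signature here are illustrative, not the exact WarpDrive interface:

from numba import cuda

@cuda.jit
def NumbaLanderStep(state_arr, action_arr, reward_arr, done_arr, num_agents):
    # WarpDrive convention: one CUDA block per environment replica,
    # one thread per agent within that replica.
    env_id = cuda.blockIdx.x
    agent_id = cuda.threadIdx.x
    if agent_id < num_agents:
        action = action_arr[env_id, agent_id]
        # ... update state_arr[env_id, agent_id, :] from the action,
        # compute the physics, and write the per-agent reward ...
        reward_arr[env_id, agent_id] = 0.0  # placeholder reward
    if agent_id == 0:
        done_arr[env_id] = 0  # placeholder termination flag

All the data lives in GPU arrays indexed by (env, agent), so thousands of replicas step in parallel without host-device copies.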

noospheer commented 1 year ago

@Emerald01 is a Gymnasium wrapper planned for WarpDrive? Presumably it would increase your user base nontrivially.

Emerald01 commented 1 year ago

WarpDrive is designed for multi-agent environments, and its main contribution is an end-to-end ecosystem, from the backend CUDA sampler and environment step runner to the PyTorch trainer. Users write their own environment step() in CUDA C/C++ or Numba following a few simple rules, and everything else (sampling, resets, data transfer, PyTorch training, logging, multi-device communication, and so on) is managed by WarpDrive, which schedules the concurrent runs across thousands of GPU threads. That gives a huge speed gain, especially with many agents.

In terms of specific environments, we ship default environments, and external users already build on this code. We always welcome people to write their own step() functions so that more and more environments can run here. A rough sketch of the typical flow is below.
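
Once an environment implements its step() in Numba or CUDA C/C++, the usual flow looks roughly like this. It uses our bundled TagContinuous environment; the import paths and config keys follow the tutorials and may differ slightly across versions:

import yaml

from example_envs.tag_continuous.tag_continuous import TagContinuous
from warp_drive.utils.env_wrapper import EnvWrapper
from warp_drive.training.trainer import Trainer

# Load a training config, e.g. the tag_continuous YAML shipped with the repo.
with open("warp_drive/training/run_configs/tag_continuous.yaml") as f:
    run_config = yaml.safe_load(f)

# Wrap a CUDA-backed environment object, not a gym.make() handle.
env_wrapper = EnvWrapper(
    env_obj=TagContinuous(**run_config["env"]),
    num_envs=run_config["trainer"]["num_envs"],
    env_backend="pycuda",
)

# Map policy tags to the agent ids each policy controls.
env = env_wrapper.env
policy_tag_to_agent_id_map = {
    "tagger": list(env.taggers),
    "runner": list(env.runners),
}

# The trainer manages sampling, resets, data transfer and the PyTorch training loop.
trainer = Trainer(
    env_wrapper=env_wrapper,
    config=run_config,
    policy_tag_to_agent_id_map=policy_tag_to_agent_id_map,
)
trainer.train()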

noospheer commented 1 year ago

@Emerald01 do you have any ideas for a quick and dirty step() function, such that gymnasium may leverage warpdrive?

noospheer commented 1 year ago

-bump-