huggingface / simulate

🎢 Creating and sharing simulation environments for embodied and synthetic data research
https://huggingface.co/docs/simulate
Apache License 2.0
187 stars 13 forks

Asynchronous API for `ParallelRLEnv` #343

Open vwxyzjn opened 1 year ago

vwxyzjn commented 1 year ago

Hello, this work looks pretty cool and I'm looking forward to using it in the future.

I was wondering if you would be interested in implementing EnvPool's asynchronous API, which looks like the following:

```python
import envpool
import numpy as np

num_envs = 64
batch_size = 16
env = envpool.make("Pong-v5", env_type="gym", num_envs=num_envs, batch_size=batch_size)
action_num = env.action_space.n
env.async_reset()
obs, rew, done, info = env.recv()
print(obs.shape, info["env_id"])
action = np.random.randint(action_num, size=batch_size)
env.send(action, info["env_id"])
obs, rew, done, info = env.recv()
print(obs.shape, info["env_id"])
action = np.random.randint(action_num, size=batch_size)
env.send(action, info["env_id"])
obs, rew, done, info = env.recv()
print(obs.shape, info["env_id"])
action = np.random.randint(action_num, size=batch_size)
env.send(action, info["env_id"])
```

Output:

```
(16, 4, 84, 84) [ 1  0  8  3  5  9 11  6 13 12 16 14  4 18  2 19]
(16, 4, 84, 84) [23 24 17 21 25 26 28 20 32 31 22  7 15 29 27 30]
(16, 4, 84, 84) [34 10 38 41 40 35 33 36 39 37 42 48 51 50 52 44]
```

The general idea is to return a subset of the environments for the agent to act on while the remaining environments keep executing their previous actions. This approach should scale considerably better, especially when the engine backend communicates over sockets (#219). In CleanRL we have a fast PPO implementation prototype that leverages this async API (see code here).
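To make the idea concrete, here is a minimal sketch of the `recv`/`send` pattern described above, independent of EnvPool or simulate. Everything here is hypothetical illustration: `ToyEnv` is a stand-in environment, and `AsyncVectorEnvSketch` models the batching semantics (`recv` returns whichever `batch_size` environments finish first; `send` dispatches actions only to those env ids) using a thread per step and a shared queue, not the real implementation of either library.

```python
import queue
import threading

import numpy as np


class ToyEnv:
    """Hypothetical stand-in environment, for illustration only."""

    def reset(self):
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        obs = np.full(4, action, dtype=np.float32)
        return obs, 0.0, False, {}


class AsyncVectorEnvSketch:
    """Sketch of an EnvPool-style async API: recv() returns the first
    batch_size environments that become ready, tagged with env ids, and
    send() steps only those ids while the rest continue running."""

    def __init__(self, num_envs, batch_size):
        self.envs = [ToyEnv() for _ in range(num_envs)]
        self.batch_size = batch_size
        # Completed (env_id, obs, rew, done, info) tuples, in finish order.
        self.ready = queue.Queue()

    def async_reset(self):
        for i, env in enumerate(self.envs):
            self.ready.put((i, env.reset(), 0.0, False, {}))

    def recv(self):
        # Block until batch_size environments are ready, whichever finish first.
        batch = [self.ready.get() for _ in range(self.batch_size)]
        ids, obs, rew, done, _ = zip(*batch)
        return (np.stack(obs), np.array(rew), np.array(done),
                {"env_id": np.array(ids)})

    def send(self, actions, env_ids):
        # Step each selected env in a worker thread; results are queued as
        # they complete, so a slow environment never blocks the whole batch.
        for action, env_id in zip(actions, env_ids):
            def worker(a=action, i=env_id):
                obs, rew, done, info = self.envs[i].step(a)
                self.ready.put((i, obs, rew, done, info))
            threading.Thread(target=worker).start()


env = AsyncVectorEnvSketch(num_envs=8, batch_size=4)
env.async_reset()
obs, rew, done, info = env.recv()
env.send(np.ones(4, dtype=np.int64), info["env_id"])
obs, rew, done, info = env.recv()
print(obs.shape, info["env_id"])
```

Note that the second `recv` can return a mix of freshly reset environments and the ones just stepped, which is exactly the property that lets the agent keep sampling actions without waiting for the slowest environment.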


https://github.com/Farama-Foundation/Gymnasium/pull/98 also contains an example of implementing this type of Async API with existing vectorized environments.

natolambert commented 1 year ago

Ah, very cool @vwxyzjn! I'm learning a bit about how this differs from MultiProcessRLEnv (source). In general I agree that this would be good to support; all the best RL infrastructure I know of has been going async like this.

I think @edbeeching will comment when he's back in a couple of days.