huggingface / simulate

🎢 Creating and sharing simulation environments for embodied and synthetic data research
https://huggingface.co/docs/simulate
Apache License 2.0
187 stars 13 forks

Asynchronous API for `ParallelRLEnv` #343

Open vwxyzjn opened 1 year ago

vwxyzjn commented 1 year ago

Hello, this work looks pretty cool and I'm looking forward to using it in the future.

I was wondering if you would be interested in implementing EnvPool's asynchronous API, which looks like the following:

```python
import envpool
import numpy as np

num_envs = 64
batch_size = 16
env = envpool.make("Pong-v5", env_type="gym", num_envs=num_envs, batch_size=batch_size)
action_num = env.action_space.n
env.async_reset()
obs, rew, done, info = env.recv()
print(obs.shape, info["env_id"])
action = np.random.randint(action_num, size=batch_size)
env.send(action, info["env_id"])
obs, rew, done, info = env.recv()
print(obs.shape, info["env_id"])
action = np.random.randint(action_num, size=batch_size)
env.send(action, info["env_id"])
obs, rew, done, info = env.recv()
print(obs.shape, info["env_id"])
action = np.random.randint(action_num, size=batch_size)
env.send(action, info["env_id"])
```

Output:

```
(16, 4, 84, 84) [ 1  0  8  3  5  9 11  6 13 12 16 14  4 18  2 19]
(16, 4, 84, 84) [23 24 17 21 25 26 28 20 32 31 22  7 15 29 27 30]
(16, 4, 84, 84) [34 10 38 41 40 35 33 36 39 37 42 48 51 50 52 44]
```

The general idea is to return a subset of the environments for the agent to act on while the remaining environments keep executing their previous actions. This approach should scale considerably better, especially when the engine backend communicates over sockets (#219). In CleanRL we have a fast PPO implementation prototype that leverages this async API (see code here).
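To make the idea concrete, here is a minimal sketch of the `recv`/`send` pattern described above, independent of EnvPool or simulate. Everything here is hypothetical illustration: `ToyEnv` is a stand-in environment, and `AsyncVectorEnvSketch` models the batching semantics (`recv` returns whichever `batch_size` environments finish first; `send` dispatches actions only to those env ids) using a thread per step and a shared queue, not the real implementation of either library.

```python
import queue
import threading

import numpy as np


class ToyEnv:
    """Hypothetical stand-in environment, for illustration only."""

    def reset(self):
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        obs = np.full(4, action, dtype=np.float32)
        return obs, 0.0, False, {}


class AsyncVectorEnvSketch:
    """Sketch of an EnvPool-style async API: recv() returns the first
    batch_size environments that become ready, tagged with env ids, and
    send() steps only those ids while the rest continue running."""

    def __init__(self, num_envs, batch_size):
        self.envs = [ToyEnv() for _ in range(num_envs)]
        self.batch_size = batch_size
        # Completed (env_id, obs, rew, done, info) tuples, in finish order.
        self.ready = queue.Queue()

    def async_reset(self):
        for i, env in enumerate(self.envs):
            self.ready.put((i, env.reset(), 0.0, False, {}))

    def recv(self):
        # Block until batch_size environments are ready, whichever finish first.
        batch = [self.ready.get() for _ in range(self.batch_size)]
        ids, obs, rew, done, _ = zip(*batch)
        return (np.stack(obs), np.array(rew), np.array(done),
                {"env_id": np.array(ids)})

    def send(self, actions, env_ids):
        # Step each selected env in a worker thread; results are queued as
        # they complete, so a slow environment never blocks the whole batch.
        for action, env_id in zip(actions, env_ids):
            def worker(a=action, i=env_id):
                obs, rew, done, info = self.envs[i].step(a)
                self.ready.put((i, obs, rew, done, info))
            threading.Thread(target=worker).start()


env = AsyncVectorEnvSketch(num_envs=8, batch_size=4)
env.async_reset()
obs, rew, done, info = env.recv()
env.send(np.ones(4, dtype=np.int64), info["env_id"])
obs, rew, done, info = env.recv()
print(obs.shape, info["env_id"])
```

Note that the second `recv` can return a mix of freshly reset environments and the ones just stepped, which is exactly the property that lets the agent keep sampling actions without waiting for the slowest environment.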


https://github.com/Farama-Foundation/Gymnasium/pull/98 also contains an example of implementing this type of Async API with existing vectorized environments.

natolambert commented 1 year ago

Ah, very cool @vwxyzjn! I'm learning a bit about how this differs from MultiProcessRLEnv (source). In general I agree that this would be good to support; all the best RL infrastructure I know of has been going async like this.

I think @edbeeching will comment when he's back in a couple of days.