sail-sg / envpool

C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.
https://envpool.readthedocs.io
Apache License 2.0

[Feature Request] More APIs for Environment Parameters Updating #62

Open · Mehooz opened this issue 2 years ago

Mehooz commented 2 years ago

Motivation

It seems that this project can contribute to a wide range of robotic learning research directions :+1:

However, a core limitation of the current version is that there is no API for curriculum learning, domain randomization, or other environment-updating functionality.

Such APIs are very common in recent work on RL for legged robots, quadrotors, dexterous hands, etc.

For example, we might want the training environment to start from an easy stage and then become progressively harder.

Normally, we can parameterize the env with modifiable parameters, so that updating the parameters automatically changes the env's behavior.

Solution

A simple solution is to add some APIs to the main env classes, such as update_parameters and init_parameters, backed by corresponding C++ functions. A good reference is this module, which updates the env parameters and performs randomization for robot learning.
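A minimal sketch of what such an API could look like from the Python side (init_parameters / update_parameters and the gravity/friction parameter names are illustrative, not an existing envpool API):

```python
import numpy as np
import envpool

# Hypothetical usage; init_parameters/update_parameters and the
# parameter names below are illustrative, not an existing envpool API.
env = envpool.make("Ant-v4", env_type="gym", num_envs=64)
env.init_parameters({"gravity": -9.81, "friction": 1.0})

for friction in np.linspace(1.0, 0.2, num=5):
    # Curriculum: lower the friction stage by stage to make walking harder.
    env.update_parameters({"friction": float(friction)})
    obs = env.reset()
    # ... train for some number of steps at this difficulty ...
```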

Alternative Solution

Add subclasses of Env (e.g., "ParameterizedEnv") that include the needed APIs.
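A rough sketch of this alternative, assuming a gym-style base class (all names below are hypothetical):

```python
import gym

class ParameterizedEnv(gym.Env):
    """Hypothetical Env subclass whose dynamics are governed by a dict
    of named parameters that can be changed after construction."""

    def init_parameters(self, params: dict) -> None:
        # Set the initial parameter values; forwarded to C++ once.
        raise NotImplementedError

    def update_parameters(self, params: dict) -> None:
        # Change parameter values later, e.g. per episode or per step.
        raise NotImplementedError
```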


mavenlin commented 2 years ago

If I understand correctly, does this mean that the "parameters" can be updated dynamically after the envs are initialized? Is it required to vary or randomize the "parameters" for the different envs in the pool?

One mechanism we have used internally at Sea AI Lab is to pass a Python function in the config, so that each time the envs reset, they dynamically generate a new config value as specified by the Python function. Would this mechanism suffice for the use case here?
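If I understand the mechanism, a usage sketch could look like this (the friction config key and the callable-in-config convention are assumptions for illustration):

```python
import numpy as np

def sample_friction() -> float:
    # Invoked on every env reset to draw a fresh value, e.g. for
    # domain randomization of a friction coefficient.
    return float(np.random.uniform(0.5, 1.5))

# Hypothetical: a config entry holds a callable rather than a constant,
# and each env in the pool calls it at reset time to get its own value.
config = dict(num_envs=64, friction=sample_friction)
```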

Mehooz commented 2 years ago

Thanks for the reply! Yes, we should be able to update the parameters not only for each episode, but even at each env step! Basically, I believe the mechanism you mention can handle this case (if it is efficient enough), but in terms of system design, I suggest we refer to the dm_control base env class.

You can see functions like before_step and after_step, which handle this kind of parameter updating in a cleaner way. Besides, if we consider using a memory pool as the observation and action buffer, or defining complex reward computations, those should be updated in after_step. Robotics people split the step only because there can be many logic modules inside each step. These kinds of hooks are also used in other popular robotics simulators such as RaiSim and Isaac Gym. So, together with #59, we may consider adding some new general modules for RL + robotics usage.
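For reference, a heavily simplified sketch of the hook structure in dm_control (the real classes live in dm_control.rl.control and dm_control.suite.base; details here are abridged):

```python
class Task:
    """Sketch of a dm_control-style task exposing per-step hooks."""

    def before_step(self, action, physics):
        # Runs before the physics advances: apply the action, update
        # curriculum or randomization parameters for this step, etc.
        ...

    def after_step(self, physics):
        # Runs after the physics advances: compute complex rewards,
        # write observations into a shared memory pool, etc.
        ...


class Environment:
    """Sketch of the env loop that invokes the hooks around stepping."""

    def __init__(self, physics, task):
        self._physics, self._task = physics, task

    def step(self, action):
        self._task.before_step(action, self._physics)
        self._physics.step()
        self._task.after_step(self._physics)
        reward = self._task.get_reward(self._physics)
        obs = self._task.get_observation(self._physics)
        return obs, reward
```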

mavenlin commented 2 years ago

Thanks for clarifying. This is more involved than I initially thought. From your description, there are some complications, mainly due to performance.

Passing a custom Python function down to C++ is fine if we do it occasionally, e.g. to configure an episode, but for performance reasons it is not fine if it is done every step. C++ calling Python is expensive and not parallelizable due to the GIL. This would diminish the acceleration brought by envpool and make it no faster than other Python-based envs.

There are a couple of workarounds in mind:

  1. Instead of letting the user fully customize it, we may implement a few C++ primitives. The user can then pass parameters to select which primitive is called.
  2. Thinking out loud: we may restrict the primitives to be implemented in Numba and call the JIT-compiled code from the C++ threads (feasibility & efficiency unknown; a sketch follows below).
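
For option 2, Numba's cfunc decorator might be one way to do this: the primitive is compiled to a C function pointer, which C++ worker threads could call through its address without acquiring the GIL (a sketch only; the envpool side of the wiring is hypothetical):

```python
from numba import cfunc, types

@cfunc(types.float64(types.float64, types.int64))
def update_friction(current_friction, step_count):
    # Example per-step schedule: decay friction toward a floor of 0.2.
    new_value = current_friction * 0.9999
    return new_value if new_value > 0.2 else 0.2

# Hypothetical: pass the raw function pointer to the C++ env pool,
# which would invoke it from its worker threads at each step.
primitive_addr = update_friction.address  # integer address of compiled code
```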