jr-robotics / robo-gym

An open source toolkit for Distributed Deep Reinforcement Learning on real and simulated robots.
https://sites.google.com/view/robo-gym
MIT License

How can I start distributed parallel environments during training? #72

Open · Daviddeer2 opened this issue 1 year ago

Daviddeer2 commented 1 year ago

Hi there, the readme says that distributed parallel sampling can be implemented, but this feature does not seem to appear in the examples, e.g. td3_script.py. In issue #24 you said: "You can start the Server Manager once and then call env.make() multiple times, with the algorithm that we are using right now we have multiple workers running in parallel and each worker is calling env.make() and the Server Manager spawns a new instance of the env." Do you mean that only certain algorithms with multiple workers, such as D4PG, can sample in parallel? It is confusing because in OpenAI Gym, parallel envs can be achieved with a VecEnv based on Python multiprocessing, so any RL algorithm can use them. What do I have to do to start parallel envs with robo-gym? Are there any examples or documents to reference? I would really appreciate it if someone could help me out. Thanks in advance.
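To make sure I am reading that right, here is a minimal sketch of what I think the quoted suggestion means (my own guess, assuming the Server Manager has already been started once on the target machine):

import gym
import robo_gym

# Assumption: the robo-gym Server Manager is already running on this machine.
target_machine_ip = "127.0.0.1"

# Each gym.make() call asks the Server Manager to spawn a new instance of the
# env, so the two environments below should run independently of each other.
env_a = gym.make('NoObstacleNavigationMir100Sim-v0', ip=target_machine_ip)
env_b = gym.make('NoObstacleNavigationMir100Sim-v0', ip=target_machine_ip)

obs_a = env_a.reset()
obs_b = env_b.reset()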

jr-b-reiterer commented 1 year ago

Hey @Daviddeer2, sorry for the long delay.

D4PG is one option that we have been using internally.

With stable-baselines3 it is also possible to simply wrap several robo-gym envs in a SubprocVecEnv, as in the following snippet. Please take it just as a quick example, not a recommendation:

from multiprocessing import freeze_support
import gym
import robo_gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import SubprocVecEnv

if __name__ == '__main__':
    # freeze_support() matters on Windows when the script is frozen into an
    # executable; it is a no-op otherwise.
    freeze_support()

    num_envs = 2
    env_ids = ['NoObstacleNavigationMir100Sim-v0'] * num_envs
    target_machine_ip = "127.0.0.1"  # machine running the robo-gym Server Manager

    # Bind env_id as a default argument so each env factory keeps its own id
    # (avoids Python's late-binding pitfall with lambdas in comprehensions).
    # Every gym.make() call makes the Server Manager spawn a new env instance.
    envs = SubprocVecEnv([lambda env_id=env_id: gym.make(env_id, ip=target_machine_ip, gui=True)
                          for env_id in env_ids])

    model = PPO('MlpPolicy', envs, verbose=1)
    model.learn(total_timesteps=1000)
    model.save("PPO_mir_from_parallel")

    envs.close()
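If it helps, a rough way to check the saved policy afterwards (again just a sketch, assuming the Server Manager is reachable at 127.0.0.1) is to load it into a single env:

import gym
import robo_gym
from stable_baselines3 import PPO

# Sketch only: roll out the saved policy in one simulated environment.
env = gym.make('NoObstacleNavigationMir100Sim-v0', ip="127.0.0.1", gui=True)
model = PPO.load("PPO_mir_from_parallel")

obs = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)

env.close()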