hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

Multiprocessing not working for PPO1 #1072

Closed: eflopez1 closed this issue 3 years ago

eflopez1 commented 3 years ago

Hello,

First off, I want to thank you all for the great work that has been put into stable-baselines. This repository has carried my current project for a while now, and I would be utterly lost without it.

Now, there is a problem I am facing that I am almost certain is a result of my lack of proper MPI knowledge, but I wanted to pose it here before making that final assumption. PPO1 has worked well for a while now on a custom environment dubbed 'Strings', and recently I have attempted to implement multiprocessing to increase sample throughput (the environment is built on a physics simulator that is limited to one core at a time, so multiprocessing would help by allowing multiple copies of the same/similar environment to be spawned across cores). I believe that I have set up the code properly, creating a vectorized environment using SubprocVecEnv and passing that to the model.

import numpy as np
#Strings Environment
import Strings_Environment_Small as Strings

# Call back to save function
from stable_baselines.bench import Monitor
from saveBestTrainingCallback import SaveOnBestTrainingRewardCallback
from stable_baselines import PPO1

from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import SubprocVecEnv
import os

savefile = 'C:/parallelRLtest/'
os.makedirs(savefile, exist_ok=True)

callback = SaveOnBestTrainingRewardCallback(check_freq=100,
                                            log_dir=savefile,
                                            num_ep_save=5)

if __name__ == '__main__':
    # Spawn two copies of the environment, each wrapped in a Monitor
    env = SubprocVecEnv([lambda: Monitor(Strings.Strings(mem=2, gain=2), savefile) for _ in range(2)])

    model = PPO1(MlpPolicy, env)
    model.learn(total_timesteps=10000, callback=callback)
    model.save(savefile + 'agent_after_training')

Upon executing with MPI, the following error occurs. Execution:

mpiexec -np 1 python MultiEnvironmentTest.py

Error:

raise ValueError("Error: the model requires a non vectorized environment or a single vectorized"
ValueError: Error: the model requires a non vectorized environment or a single vectorized environment.

Apologies for this rather basic question, and thank you in advance for any guidance.

Miffyli commented 3 years ago

Relevant issue #171

Thanks for the kind words! These help us continue working on these projects :).

As the error suggests, the MPI-based algorithms do not support vectorized environments. Your options are switching to PPO2 (which is more mature code and supports vectorized environments directly) or calling mpiexec -np 8 as suggested here.
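
For reference, here is a minimal sketch of the PPO2 route, assuming the same custom Strings environment and callback as in the original script (PPO2 consumes a SubprocVecEnv directly, so the script is launched with plain python, no mpiexec):

import os

# Assumed to be the same custom environment and callback as in the script above
import Strings_Environment_Small as Strings
from saveBestTrainingCallback import SaveOnBestTrainingRewardCallback

from stable_baselines import PPO2
from stable_baselines.bench import Monitor
from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import SubprocVecEnv

savefile = 'C:/parallelRLtest/'
os.makedirs(savefile, exist_ok=True)

if __name__ == '__main__':
    # PPO2 trains on all sub-environments of the vectorized env at once
    env = SubprocVecEnv([lambda: Monitor(Strings.Strings(mem=2, gain=2), savefile)
                         for _ in range(2)])

    callback = SaveOnBestTrainingRewardCallback(check_freq=100,
                                                log_dir=savefile,
                                                num_ep_save=5)

    model = PPO2(MlpPolicy, env)
    model.learn(total_timesteps=10000, callback=callback)
    model.save(savefile + 'agent_after_training')

If you stay on PPO1 instead, the trade goes the other way: drop SubprocVecEnv, give the model a single non-vectorized environment, and let MPI provide the parallelism, e.g. mpiexec -np 8 python MultiEnvironmentTest.py.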

Sidenote: I suggest taking a look at stable-baselines3 for cleaner implementations and future support.
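
As a rough sketch of what the same setup looks like in stable-baselines3 (assuming the custom Strings environment from the original script is Gym-compatible; SB3's PPO replaces both PPO1 and PPO2 and handles vectorized environments natively):

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

# Assumes the custom Strings environment from the original script
import Strings_Environment_Small as Strings

if __name__ == '__main__':
    # make_vec_env wraps each copy in a Monitor and builds the SubprocVecEnv
    env = make_vec_env(lambda: Strings.Strings(mem=2, gain=2),
                       n_envs=2, vec_env_cls=SubprocVecEnv)

    model = PPO('MlpPolicy', env)
    model.learn(total_timesteps=10000)
    model.save('agent_after_training')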