hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

[question] Monitoring a custom environment #470

Closed Nadavborenstein1 closed 4 years ago

Nadavborenstein1 commented 5 years ago

I am training an A2C algorithm on a custom environment using multiprocessing and SubprocVecEnv as follows:

`env = SubprocVecEnv([lambda: CustomEnv(args, i) for i in range(args.cpus)])

model = A2C(MlpLnLstmPolicy, env, verbose=1, tensorboard_log=None, learning_rate=7e-3, lr_schedule="linear")

model.learn(total_timesteps=args.training_steps, log_interval=10)`

I want to monitor the learning and save model checkpoints using a Monitor and callbacks, but I can't figure out how to combine everything. I've tried:

`env = SubprocVecEnv([lambda: CustomEnv(args, i) for i in range(args.cpus)])
env = Monitor(env, log_dir, allow_early_resets=True)

model = A2C(MlpLnLstmPolicy, env, verbose=1, tensorboard_log=None, learning_rate=7e-3, lr_schedule="linear")

model.learn(total_timesteps=args.training_steps, log_interval=10, callback=callback)`

but I get the following exception:

Traceback (most recent call last):
  File "/main_file.py", line 96, in <module>
    env = train(args)
  File "/main_file.py", line 76, in train
    env = Monitor(env, log_dir, allow_early_resets=True)
  File "/miniconda3/envs/RL/lib/python3.7/site-packages/stable_baselines/bench/monitor.py", line 27, in __init__
    Wrapper.__init__(self, env=env)
  File "/miniconda3/envs/RL/lib/python3.7/site-packages/gym/core.py", line 210, in __init__
    self.reward_range = self.env.reward_range
AttributeError: 'SubprocVecEnv' object has no attribute 'reward_range'

So what is the correct way of using a monitor in this setting?

araffin commented 5 years ago

Hello,

Please follow the example in the documentation. The Monitor wrapper applies to a gym env, not a VecEnv: you need to wrap each env of the VecEnv with Monitor. You can find a complete example in the rl zoo.
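As a rough illustration of "wrap each env of the VecEnv", the sketch below builds one factory per worker and wraps the env inside it. The helper name `make_env` and its `env_factory` / `monitor_cls` parameters are hypothetical and exist only to keep the sketch library-agnostic; in practice `monitor_cls` would be `stable_baselines.bench.Monitor`. Each worker gets its own file prefix, and `rank` is bound as a function argument, which also avoids the late binding of `i` in `lambda: CustomEnv(args, i)`.

```python
import os


def make_env(env_factory, monitor_cls, log_dir, rank):
    """Return a zero-argument callable that builds one monitored env.

    env_factory and monitor_cls are hypothetical parameters used to keep
    this sketch independent of any library.
    """
    def _init():
        env = env_factory(rank)
        # Give every worker its own file prefix (<log_dir>/0, <log_dir>/1, ...)
        # so that no two monitors write to the same file.
        log_file = os.path.join(log_dir, str(rank))
        return monitor_cls(env, log_file, allow_early_resets=True)
    return _init


# Intended use with the code from the question (not executed here):
# from stable_baselines.bench import Monitor
# from stable_baselines.common.vec_env import SubprocVecEnv
# env = SubprocVecEnv([make_env(lambda rank: CustomEnv(args, rank), Monitor, log_dir, i)
#                      for i in range(args.cpus)])
```

The VecEnv itself is left unwrapped; only the individual envs created inside each worker are monitored.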

Maybe it could be a good idea to have a VecMonitorWrapper as it is done in the baselines (https://github.com/openai/baselines/blob/master/baselines/common/vec_env/vec_monitor.py), feel free to submit a PR for that.

jbulow commented 4 years ago

When following the example by applying the Monitor on the actual env and not the VecEnv it seems that the resulting log file gets corrupted. The callback fails with:

Traceback (most recent call last):
  File "RoomControl_SubProcCallback.py", line 86, in <module>
    model.learn(total_timesteps=total_timesteps, callback=callback)
  File "/big/openai/stable-baselines/stable_baselines/ppo2/ppo2.py", line 400, in learn
    if callback(locals(), globals()) is False:
  File "RoomControl_SubProcCallback.py", line 48, in callback
    x, y = ts2xy(load_results(log_dir), 'timesteps')
  File "/big/openai/stable-baselines/stable_baselines/bench/monitor.py", line 180, in load_results
    data_frame = pandas.read_csv(file_handler, index_col=None)
  File "/big/innovation/venv/lib/python3.7/site-packages/pandas/io/parsers.py", line 685, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/big/innovation/venv/lib/python3.7/site-packages/pandas/io/parsers.py", line 463, in _read
    data = parser.read(nrows)
  File "/big/innovation/venv/lib/python3.7/site-packages/pandas/io/parsers.py", line 1154, in read
    ret = self._engine.read(nrows)
  File "/big/innovation/venv/lib/python3.7/site-packages/pandas/io/parsers.py", line 2059, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 881, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 896, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 950, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 937, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2132, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 5

The contents of 'monitor.csv' is at the moment of failing:

#{"t_start": 1574695681.5302866, "env_id": "RoomControl-v2"}
11398.178784,7200,121.623053131400.975874,7200,178.948676

What should the format of monitor.csv be?
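For comparison, a file written by a single Monitor should, as far as the stable-baselines source shows, consist of a `#`-prefixed JSON header line, a CSV header `r,l,t` (episode reward, episode length, elapsed seconds), and then one row per episode. The sketch below parses such text with the standard library only; the sample values are invented for illustration and `parse_monitor_text` is a hypothetical helper, not part of the library.

```python
import csv
import io
import json

# Illustrative contents; the numbers are made up, not taken from a real run.
SAMPLE = (
    '#{"t_start": 1574695681.53, "env_id": "RoomControl-v2"}\n'
    "r,l,t\n"
    "11398.178784,7200,121.623053\n"
    "11400.975874,7200,178.948676\n"
)


def parse_monitor_text(text):
    """Split monitor-file text into (header dict, list of episode dicts)."""
    stream = io.StringIO(text)
    first_line = stream.readline()
    if not first_line.startswith("#"):
        raise ValueError("expected a '#'-prefixed JSON header line")
    header = json.loads(first_line[1:])
    episodes = []
    # The remaining lines are plain CSV with the column names r, l, t.
    for row in csv.DictReader(stream):
        episodes.append({"r": float(row["r"]), "l": int(row["l"]), "t": float(row["t"])})
    return header, episodes


header, episodes = parse_monitor_text(SAMPLE)
```

A row with five fields, like the one in the file above, does not fit this layout, which is consistent with the pandas error.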

araffin commented 4 years ago

@jbulow Please follow the rules and, if needed, open an issue with the issue template completely filled in. Note that the error seems to come from your custom env, so please double-check it. As mentioned in the README, we do not do tech support or consulting, so your issue will be closed if it does not come from Stable Baselines.

EDIT: it seems you are using one monitor file for multiple envs, this won't work

jbulow commented 4 years ago

I could not find any documentation for Monitor other than the source. As you say, Monitor does not support multiple envs sharing a common log file, but it does not fail when one tries to do that. I guess opening the log file with O_EXCL might fix the problem. Should I create an issue for this, or does it work as intended? With the current design, it's not straightforward to implement "save-new-best-model" in a vectorized setup.

Disclaimer: I'm very new to all this and thought that giving feedback was a way to improve Stable Baselines. In this context I don't have a clear understanding of what is considered "technical support". Point taken regarding opening issues using the template. (GitHub suggested this defect when I was about to open a new issue.)
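To make the O_EXCL idea concrete, here is a minimal sketch of exclusive file creation using only the standard library. It illustrates the proposal, not how Monitor actually opens its file, and `open_exclusive` is a hypothetical helper name.

```python
import os
import tempfile


def open_exclusive(path):
    """Create path for writing, failing if it already exists."""
    # O_CREAT | O_EXCL makes creation atomic: the call raises FileExistsError
    # when another writer has already created the file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    return os.fdopen(fd, "wt")


with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "monitor.csv")
    first = open_exclusive(target)
    try:
        # A second writer claiming the same file is rejected.
        open_exclusive(target)
        second_writer_rejected = False
    except FileExistsError:
        second_writer_rejected = True
    finally:
        first.close()
```

The built-in `open(path, "xt")` has the same exclusive-creation behavior. One trade-off is that a file left over from a previous run would be rejected as well, so old logs would have to be removed or renamed first.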

EDIT: after reading the actual source code for load_results I found that the documentation is not correct. load_results loads all files ending in 'monitor.csv' from the given path and not just "Load results from a given file" as the documentation states. Now everything works fine!
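The behavior described here can be approximated with the standard library as follows. This is a sketch, not the library's code: the real `load_results` returns a pandas DataFrame, and the helpers `load_all_monitor_files` and `write_monitor_file` are hypothetical. It assumes one monitor file per env, each with the `#`-prefixed JSON header containing `t_start`.

```python
import csv
import glob
import json
import os
import tempfile


def load_all_monitor_files(log_dir):
    """Merge every '*monitor.csv' file in log_dir into one episode list."""
    episodes = []
    for path in sorted(glob.glob(os.path.join(log_dir, "*monitor.csv"))):
        with open(path, "rt", newline="") as handle:
            header = json.loads(handle.readline()[1:])
            for row in csv.DictReader(handle):
                episodes.append({
                    "r": float(row["r"]),
                    "l": int(row["l"]),
                    # Convert per-file elapsed time to wall-clock time so that
                    # episodes from different workers can be ordered together.
                    "t": float(row["t"]) + header["t_start"],
                })
    episodes.sort(key=lambda episode: episode["t"])
    return episodes


def write_monitor_file(path, t_start, rows):
    """Helper for the demo: write one file in the monitor layout."""
    with open(path, "wt", newline="") as handle:
        handle.write("#" + json.dumps({"t_start": t_start}) + "\n")
        writer = csv.writer(handle)
        writer.writerow(["r", "l", "t"])
        writer.writerows(rows)


with tempfile.TemporaryDirectory() as log_dir:
    write_monitor_file(os.path.join(log_dir, "0.monitor.csv"), 100.0, [(1.0, 10, 5.0), (3.0, 10, 9.0)])
    write_monitor_file(os.path.join(log_dir, "1.monitor.csv"), 101.0, [(2.0, 10, 6.0)])
    merged = load_all_monitor_files(log_dir)
    merged_rewards = [episode["r"] for episode in merged]
```

With the episodes from all workers merged like this, a "save-new-best-model" callback can compare the mean reward of the most recent episodes against the best value seen so far.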

ajtanskanen commented 4 years ago

I also ran into similar problems with monitoring vectorized environments. It was straightforward to tweak VecMonitor from OpenAI Baselines to work with Stable Baselines, as suggested above. This is what I ended up with.

from stable_baselines.common.vec_env import VecEnvWrapper
import numpy as np
import time
from collections import deque
import os.path as osp
import json
import csv

class VecMonitor(VecEnvWrapper):
    EXT = "monitor.csv"

    def __init__(self, venv, filename=None, keep_buf=0, info_keywords=()):
        VecEnvWrapper.__init__(self, venv)
        print('init vecmonitor: ',filename)
        self.eprets = None
        self.eplens = None
        self.epcount = 0
        self.tstart = time.time()
        if filename:
            self.results_writer = ResultsWriter(filename, header={'t_start': self.tstart},
                extra_keys=info_keywords)
        else:
            self.results_writer = None
        self.info_keywords = info_keywords
        self.keep_buf = keep_buf
        if self.keep_buf:
            self.epret_buf = deque([], maxlen=keep_buf)
            self.eplen_buf = deque([], maxlen=keep_buf)

    def reset(self):
        obs = self.venv.reset()
        self.eprets = np.zeros(self.num_envs, 'f')
        self.eplens = np.zeros(self.num_envs, 'i')
        return obs

    def step_wait(self):
        obs, rews, dones, infos = self.venv.step_wait()
        self.eprets += rews
        self.eplens += 1

        newinfos = list(infos[:])
        for i in range(len(dones)):
            if dones[i]:
                info = infos[i].copy()
                ret = self.eprets[i]
                eplen = self.eplens[i]
                epinfo = {'r': ret, 'l': eplen, 't': round(time.time() - self.tstart, 6)}
                for k in self.info_keywords:
                    epinfo[k] = info[k]
                info['episode'] = epinfo
                if self.keep_buf:
                    self.epret_buf.append(ret)
                    self.eplen_buf.append(eplen)
                self.epcount += 1
                self.eprets[i] = 0
                self.eplens[i] = 0
                if self.results_writer:
                    self.results_writer.write_row(epinfo)
                newinfos[i] = info
        return obs, rews, dones, newinfos

class ResultsWriter(object):
    def __init__(self, filename, header='', extra_keys=()):
        print('init resultswriter')
        self.extra_keys = extra_keys
        assert filename is not None
        # Only a directory is expanded to <dir>/monitor.csv; any other
        # filename is used as given.
        if not filename.endswith(VecMonitor.EXT) and osp.isdir(filename):
            filename = osp.join(filename, VecMonitor.EXT)
        self.f = open(filename, "wt")
        if isinstance(header, dict):
            header = '# {} \n'.format(json.dumps(header))
        self.f.write(header)
        self.logger = csv.DictWriter(self.f, fieldnames=('r', 'l', 't')+tuple(extra_keys))
        self.logger.writeheader()
        self.f.flush()

    def write_row(self, epinfo):
        if self.logger:
            self.logger.writerow(epinfo)
            self.f.flush()

migudmigu commented 3 years ago

@ajtanskanen Will you marry me?

araffin commented 3 years ago

btw, VecMonitor is now included in SB3: https://github.com/DLR-RM/stable-baselines3

balisujohn commented 3 years ago

Many thanks @ajtanskanen.

Demetrio92 commented 2 years ago

So in SB3 simply:

from stable_baselines3.common.vec_env import VecMonitor
env = VecMonitor(env, log_dir)

instead of

from stable_baselines3.common.monitor import Monitor
env = Monitor(env, log_dir)   # won't work with vectorized environments, will throw cryptic errors

sorry for off-topic, googled the error and got here

zarifaziz commented 11 months ago

@araffin do you know which SB3 library version VecMonitor was introduced in? I'm using 2.0.0 but I still get monitoring errors.