openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
MIT License
15.83k stars 4.88k forks

No Monitor Files for TRPO and DeepQ #1008

Open Lantc26 opened 5 years ago

Lantc26 commented 5 years ago

Hello, running python -m baselines.run --alg="deepq" --env="QbertNoFrameskip-v4" --num_timesteps="1e4" --log_path="~/logs/" does not produce any monitor.csv files. Only trpo and deepq are affected. As a result, I cannot load the results with plot_util.load_results for those algorithms.

As you may know, the environments for trpo and deepq are created differently from those for acer, a2c, ppo2, etc. In the build_env function of run.py, make_env from cmd_util.py is called directly for these two algorithms. Those calls do not pass logger_dir, so the results_writer of the Monitor wrapper is never set and no monitor.csv files are written.
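To illustrate the mechanism, here is a minimal sketch with a simplified stand-in for the wrapper. The names monitor_target and TinyMonitor are hypothetical and are not the baselines code; the sketch only assumes that the monitor path is derived from logger_dir, so a missing directory means no writer is created at all.

```python
import os
import tempfile


def monitor_target(logger_dir):
    """Hypothetical helper: return the monitor file prefix, or None without a log dir."""
    # `None and ...` short-circuits to None, so no path is ever built.
    return logger_dir and os.path.join(logger_dir, "0.0")


class TinyMonitor:
    """Simplified stand-in for a monitor wrapper; it only writes when given a target."""

    def __init__(self, target):
        self.path = target + ".monitor.csv" if target else None
        if self.path:
            with open(self.path, "w") as f:
                f.write("r,l,t\n")

    def record(self, reward, length, elapsed):
        if self.path is None:
            return False  # nothing is written without a target
        with open(self.path, "a") as f:
            f.write(f"{reward},{length},{elapsed}\n")
        return True


# Without a log directory the episode is silently dropped.
without_dir = TinyMonitor(monitor_target(None))
skipped = without_dir.record(100.0, 276, 291.1)

# With a log directory the file is created and the row is appended.
with tempfile.TemporaryDirectory() as d:
    with_dir = TinyMonitor(monitor_target(d))
    wrote = with_dir.record(100.0, 276, 291.1)
    exists = os.path.exists(with_dir.path)
```

In this toy version, the run without a directory completes normally but leaves nothing on disk, which matches the symptom described above.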

Passing logger_dir=logger.get_dir() to these calls generates the missing files:

if alg == 'deepq':
    env = make_env(env_id, env_type, seed=seed, wrapper_kwargs={'frame_stack': True}, logger_dir=logger.get_dir())
elif alg == 'trpo_mpi':
    env = make_env(env_id, env_type, seed=seed, logger_dir=logger.get_dir())
Lantc26 commented 5 years ago

The suggested fix does not work for trpo when it is run under mpirun with multiple processes. All processes write to the same file, so their rows interleave and the monitor file is corrupted. An example of such a file:

# {"t_start": 1569413972.5753684, "env_id": "QbertNoFrameskip-v4"} 
r,l,t
00.0,282,12.64331
33375.0,447,28.44475
1100.0,322,31.35991211175.0,320,44.6081370000.0,279,60.217223
550.0,309,63.1802377775.0,318,75.82545210100.0,321,79.1134111100.0,318,91.6012877775.0,304,103.706595
225.0,282,106.36101111100.0,338,118.100313
00.0,279,120.77101111100.0,344,132.62252222225.0,315,144.61161111100.0,322,147.950.55550.0,325,159.19688202200.0,329,162.3927922525.0,304,174.108912225.0,284,176.95073111175.0,299,189.429527775.0,307,201.657622
0.0,278,204.8440477575.0,331,219.15183620200.0,388,223.12544122225.0,374,235.1316665050.0,268,246.554494
250.0,360,250.39406112125.0,295,262.056617
100.0,277,265.0457722225.0,313,276.88973
775.0,309,288.555084
100.0,276,291.100978
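
A quick way to check a monitor file for this kind of damage is to look for data rows that do not parse as exactly three fields. The following is a rough sketch; find_malformed_rows is a hypothetical helper, not part of baselines, and it assumes the r,l,t layout shown above. It only catches rows that were fused together; a row that still has three numeric fields may also hold corrupted values and would pass unnoticed.

```python
def find_malformed_rows(lines):
    """Return (line_number, text) for data rows that are not a valid 'r,l,t' triple."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        # Skip blanks, the JSON metadata comment, and the column header.
        if not text or text.startswith("#") or text == "r,l,t":
            continue
        fields = text.split(",")
        ok = len(fields) == 3
        if ok:
            try:
                float(fields[0])  # episode reward
                int(fields[1])    # episode length
                float(fields[2])  # elapsed time
            except ValueError:
                ok = False
        if not ok:
            bad.append((number, text))
    return bad


sample = [
    '# {"t_start": 1569413972.5753684, "env_id": "QbertNoFrameskip-v4"}',
    "r,l,t",
    "775.0,309,288.555084",
    "1100.0,322,31.35991211175.0,320,44.6081370000.0,279,60.217223",
    "100.0,276,291.100978",
]
malformed = find_malformed_rows(sample)
```

Here only the fourth line of the sample is reported, because it contains several episodes written on top of each other.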

Changing the call as follows prevents this and gives each process its own file:

elif alg == 'trpo_mpi':
    env = make_env(env_id, env_type, seed=seed, logger_dir=logger.get_dir(), mpi_rank=logger.get_rank_without_mpi_import())
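
The idea behind passing the rank can be sketched as follows. The function per_rank_monitor_path and the exact file naming are assumptions for illustration only; the point is that a path that includes the rank can never be shared by two processes.

```python
import os


def per_rank_monitor_path(log_dir, rank, subrank=0):
    """Hypothetical naming scheme: one monitor file per (rank, subrank) pair."""
    return os.path.join(log_dir, f"{rank}.{subrank}.monitor.csv")


# Four hypothetical MPI ranks writing into the same log directory.
paths = [per_rank_monitor_path("logs", rank) for rank in range(4)]
```

With four processes this yields four distinct files, so each process appends only to its own file and rows can no longer interleave.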
christopherhesse commented 5 years ago

Could you make a PR fixing this? Thanks!