Open damienlancry opened 6 years ago
I am really struggling with how to use these functions, but they would be really useful ... Does anybody know how to use them properly?
This happens to me when I log results into the same folder. Just check whether there are many .monitor files in the same folder.
Hi, thanks for your answer. Unfortunately, I don't think my problem is related to that, because I kept the default log directories, that is tmp/openai-<date+hour>, and as a consequence there is only ever one set of log results per directory. :(
Could be because we need at least a hundred episodes per agent, and training only lasts 1M timesteps across all agents, if my understanding is correct.
The 100-episode window can be changed in results_plotter; you can set it even to 1. HalfCheetah is a long-episode environment; try Hopper instead. If there are fewer episodes than the window, it gives a "negative dimensions are not allowed" error. Your error comes from the .monitor files: they cannot be parsed because you may be using the same monitor file more than once (if all 8 envs write to the same .monitor.csv), so writes override each other. Check the number of values per row in the .monitor.csv file: it should be 3, but you have 5, hence it cannot be parsed.
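For context on the window discussed above: results_plotter smooths per-episode rewards with a fixed-size rolling mean before plotting. Below is a minimal pure-Python sketch of that kind of smoothing (the `rolling_mean` function and its `window` parameter are illustrative, not the baselines API); it shows why a run with fewer episodes than the window has nothing to plot.

```python
# Sketch of the rolling-mean smoothing applied to episode rewards before
# plotting; `window` plays the role of the 100-episode constant mentioned above.
def rolling_mean(rewards, window):
    """Mean of each consecutive `window`-sized slice of `rewards`.

    Returns an empty list when there are fewer episodes than the window --
    the situation that breaks plotting on short runs.
    """
    if len(rewards) < window:
        return []
    return [sum(rewards[i:i + window]) / window
            for i in range(len(rewards) - window + 1)]
```

Setting the window to 1 just returns the raw per-episode rewards, which is why lowering it helps on environments with few, long episodes like HalfCheetah.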
Yes, it is the monitor.csv that is very messy. I think it does not play well with multiprocessing on MuJoCo environments, which is a pity. But maybe it's just me not doing it right.
Just found out why this happens on MuJoCo envs (at least Reacher, HalfCheetah) and not on Atari envs (at least Pong, Breakout).
This is because MuJoCo envs have a termination condition based on episode length. As a consequence, at least at the beginning of training, every parallel env hits its first done = True at the same time, and they all try to write to 0.monitor.csv at once. This results in something like the following (opened with vim):
#{"env_id": "Reacher-v2", "t_start": 1534953242.656185}
r,l,t^M
-90.540712,50,1.837869^M
-103.335179,50,1.937895^M
-86.699783,50,2.05136^M
-112.203586,50,2.159039^M
-112.168917,50,2.255508^M
-128.83833,50,2.354578^M
-108.774769,50,2.457605^M
-1-109.082413,50,2.686389-119.676026,50,2.654104^M
-111.755901,50,2.750567---85.499553,50,2.849706^M--93.675179,50,2.955392^M
-104.237215,50,3.063637^M
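While the underlying race is unfixed, one workaround is to drop the malformed rows when loading the file instead of letting the plotter choke on them. This is a sketch of such a filter (my own helper, not part of baselines): it skips the `#{...}` JSON header line and any row that does not have exactly the three r,l,t fields the header promises, which also catches fused numbers like the `-1-109.082413,...` row above.

```python
import csv
import io

def load_monitor_rows(text):
    """Parse a monitor.csv body, keeping only well-formed (r, l, t) rows.

    Skips the leading '#{...}' JSON metadata line, rows with the wrong
    field count (interleaved writes), and rows whose fields are not numeric
    (two numbers fused together by concurrent writes).
    """
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.reader(io.StringIO("\n".join(lines)))
    header = next(reader)          # expected: ['r', 'l', 't']
    good = []
    for row in reader:
        if len(row) != len(header):
            continue               # interleaved/garbled row: drop it
        try:
            good.append([float(x) for x in row])
        except ValueError:
            continue               # fused values like '-1-109.082413' fail here
    return good
```

This only salvages plotting; it does not recover the episodes whose rows were destroyed by the concurrent writes.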
Atari envs, on the contrary, have more variance in episode length, and done = True is probably very seldom reached at the same timestep by two different envs. This results in something like the following (again opened with vim):
#{"env_id": "HalfCheetah-v2", "t_start": 1534439643.5216653}
r,l,t^M
-255.917874,1000,9.828193^M
-240.477985,1000,17.473527^M
-602.461787,1000,25.421956^M
-412.191858,1000,33.092345^M
-449.831362,1000,40.931932^M
-357.177889,1000,48.556288^M
-210.854723,1000,56.315493^M
-347.997157,1000,64.064439^M
749.733397,1000,72.024559^M
-327.658974,1000,79.714595^M
-270.476361,1000,87.379286^M
-406.112755,1000,95.23366^M
-444.303552,1000,102.948899^M
-479.953277,1000,110.714009^M
-79.515011,1000,118.378877^M
-417.356146,1000,126.207028^M
-390.424013,1000,133.969844^M
I still do not know how to fix this but am working on it; if somebody has any suggestion, please let me know :)
EDIT: Just realized the last output is from a HalfCheetah run that did not use --num_env=8 (the default is 1 on MuJoCo envs). I could not find previous logs for Atari, so I reran one just now, and interestingly there are actually 8 monitor.csv files (due to MPI.COMM_WORLD.Get_rank()). So now I have to figure out why there is only one csv on MuJoCo envs (there should be 8 of them when using MPI.COMM_WORLD.Get_rank()).
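The per-rank file naming observed here is exactly what makes the Atari case safe: as long as each worker appends to its own <rank>.monitor.csv, no synchronization is needed. A minimal sketch of that scheme, assuming hypothetical helpers `monitor_path` and `write_episode` (illustrative, not the baselines API):

```python
import csv
import os

def monitor_path(log_dir, rank):
    """Hypothetical helper: one monitor file per worker rank, mirroring the
    <rank>.monitor.csv names produced via MPI.COMM_WORLD.Get_rank()."""
    return os.path.join(log_dir, "{}.monitor.csv".format(rank))

def write_episode(log_dir, rank, r, l, t):
    """Append one (r, l, t) row to this rank's own file -- no sharing, no race."""
    path = monitor_path(log_dir, rank)
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["r", "l", "t"])
        writer.writerow([r, l, t])
```

A plotter then just reads all *.monitor.csv files in the directory and merges them, which is why multiple monitor files per log dir are expected rather than a bug.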
Please, I am having a similar problem. Where do I get the monitor.csv file, please?
python -m baselines.results_plotter --dirs=/tmp/openai-2018-11-30-16-53-40-674939
Traceback (most recent call last):
File "/home/gbenga/Downloads/abiona1008/envs/tensorflow/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/home/gbenga/Downloads/abiona1008/envs/tensorflow/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/gbenga/baselines/baselines/results_plotter.py", line 95, in
Dear OpenAI and Community,
I am trying to figure out how to use the results_plotter function. I have run a simple test with the Atari environment Pong by running
and then I ran
And it works perfectly. But then I tried:
then:
And I get the following error:
At first I thought it was because I used the argument num_env=8, but then I realized that Atari uses num_env = multiprocessing.cpu_count() by default. So any idea where this is coming from?
Cheers!
EDIT: it works perfectly on MuJoCo with num_env = 1 (on the Reacher environment). I really think the parallel monitored environments are calling csv.DictWriter().writerow() at the same time, which is corrupting the monitor.csv file. But I can't figure out why it does this on MuJoCo environments and not on Atari environments. Anyway, the writerow call should be protected by a mutex or semaphore (I'm not a pro at multiprocessing, so I don't know if that is the right terminology). I'm going to try to do something about it.
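Mutex is indeed the right term for this. As a sketch of the idea, here is a lock-guarded csv writer; it uses threads and `threading.Lock` for brevity, whereas across real subprocesses you would need a `multiprocessing.Lock`, OS-level file locking, or simply separate files per worker. The `LockedWriter` class is my own illustration, not how baselines' Monitor is implemented.

```python
import csv
import io
import threading

class LockedWriter:
    """Wraps a csv writer so concurrent writerow calls cannot interleave.

    Illustrative only: shows the mutex idea discussed above; it does not
    work across separate processes, only across threads in one process.
    """
    def __init__(self, fileobj):
        self._writer = csv.writer(fileobj)
        self._lock = threading.Lock()

    def writerow(self, row):
        with self._lock:           # one full row at a time, never half-written
            self._writer.writerow(row)

# 8 "envs" logging 100 episodes each into one shared buffer.
buf = io.StringIO()
writer = LockedWriter(buf)
threads = [
    threading.Thread(
        target=lambda i=i: [writer.writerow([i, 50, 0.1]) for _ in range(100)]
    )
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
rows = buf.getvalue().splitlines()
```

Per-worker files (as the Atari runs already produce via the MPI rank) avoid the lock entirely, which is usually the simpler fix.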