miyosuda / unreal

Reinforcement learning with unsupervised auxiliary tasks

Parameter configuration nav_maze_static_01 #8

Closed: endymion64 closed this issue 7 years ago

endymion64 commented 7 years ago

I tried to reproduce your learning curve for nav_maze_static_01, but after 10 million steps it hasn't learned much, whereas your curve looks far better. Did you use a parameter configuration different from the one in your commits to generate that curve, and if so, would you mind sharing it?

miyosuda commented 7 years ago

Thank you for using my code.

I think the latest committed settings should work well for 'nav_maze_static_01'.

Please confirm that constants.py contains

    GAMMA = 0.99
    GAMMA_PC = 0.9

https://github.com/miyosuda/unreal/blob/master/constants.py#L21-L22

and confirm that the environment step count (num_steps) is set to 4 in lab_environment.py:

https://github.com/miyosuda/unreal/blob/master/environment/lab_environment.py#L38

(With the num_steps=1 setting, the 'seekavoid_arena_01' and 'stairway_to_melon' levels were learned OK, but 'nav_maze_static_01' was not, so I changed it to 4.)
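
For context, here is a minimal sketch of what that setting means, using DeepMind Lab's public Python API. This is an illustration, not the repo's code, and the observation name varies between Lab releases ('RGB_INTERLACED' in older ones, 'RGB_INTERLEAVED' in newer ones):

    import numpy as np
    import deepmind_lab

    # Sketch only: level and observation names are illustrative.
    env = deepmind_lab.Lab('nav_maze_static_01', ['RGB_INTERLACED'])
    env.reset()

    action = np.zeros((7,), dtype=np.intc)  # 7-component Lab action vector
    action[3] = 1                           # MOVE_BACK_FORWARD: step forward

    # num_steps=4 repeats the same action for 4 environment frames,
    # so the agent makes one decision per 4 rendered frames.
    reward = env.step(action, num_steps=4)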

And if you are using GPU rendering with the 'headless=glx' option,

 bazel run //unreal:train --define headless=glx

please confirm that display sleep is disabled. (If the display goes to sleep, the frame image does not seem to update properly in GPU rendering mode.)

(Or try CPU rendering with 'headless=osmesa'.)
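
For reference, the two rendering modes side by side (both --define values are quoted from this thread):

    # GPU rendering (needs an active, non-sleeping display)
    bazel run //unreal:train --define headless=glx

    # CPU software rendering via OSMesa (no display needed)
    bazel run //unreal:train --define headless=osmesa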

endymion64 commented 7 years ago

Thank you very much for the quick reply!

I verified that the values in constants.py were the same. I used GPU rendering, and display sleep was disabled. The process steps in lab_environment.py were also set to 4.

However, I noticed that you commented out the 'look up' and 'look down' actions in the ACTION_LIST of lab_environment.py:

https://github.com/miyosuda/unreal/blob/master/environment/lab_environment.py#L62-L63

I didn't comment out these actions in my run, so that might have been the problem.

I will let you know whether running it again with 6 actions instead of 8 does the trick.

miyosuda commented 7 years ago

However, I noticed that you commented out the 'look up' and 'look down' actions in the ACTION_LIST of lab_environment.py

Ah, sorry, I forgot about this. Yes, please comment out those two actions.
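
For reference, the trimmed action table looks roughly like this. This is a sketch assuming DeepMind Lab's 7-component action spec; the '_action' helper and the exact rotation magnitudes are illustrative, not quoted from the repo:

    import numpy as np

    def _action(*entries):
        # One Lab action vector:
        # (look_lr, look_du, strafe_lr, move_bf, fire, jump, crouch)
        return np.array(entries, dtype=np.intc)

    ACTION_LIST = [
        _action(-20,  0,  0,  0, 0, 0, 0),   # look left
        _action( 20,  0,  0,  0, 0, 0, 0),   # look right
        # _action(0,  10,  0,  0, 0, 0, 0),  # look up   (commented out)
        # _action(0, -10,  0,  0, 0, 0, 0),  # look down (commented out)
        _action(  0,  0, -1,  0, 0, 0, 0),   # strafe left
        _action(  0,  0,  1,  0, 0, 0, 0),   # strafe right
        _action(  0,  0,  0,  1, 0, 0, 0),   # move forward
        _action(  0,  0,  0, -1, 0, 0, 0),   # move backward
    ]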

Also, if you are using 8 actions, your commit is a bit old. My latest code uses multiprocessing to run the lab environments:

https://github.com/miyosuda/unreal/blob/master/environment/lab_environment.py#L18-L51

The reason for this change is that when the lab environments run as threads, every time one environment finishes an episode it reloads the map data, and that reload blocks the other threads. (Map loading runs exclusively across threads.) GPU rendering also does not work well with multiple lab environments running as threads. So I introduced multiprocessing, which makes learning faster.
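
Here is a minimal sketch of that process-per-environment pattern. It is simplified and assumes a pipe-based command protocol; the actual lab_environment.py differs in details such as observation handling:

    import multiprocessing as mp
    import numpy as np

    def _worker(conn, level):
        # Import and create Lab inside the child process, so each
        # environment (and its map reloads) is isolated from the others.
        import deepmind_lab
        env = deepmind_lab.Lab(level, ['RGB_INTERLACED'])
        env.reset()
        while True:
            command, action = conn.recv()
            if command == 'step':
                reward = env.step(action, num_steps=4)
                if not env.is_running():
                    env.reset()  # episode ended: reload the map
                conn.send(reward)
            elif command == 'close':
                conn.close()
                break

    # Parent side: talk to the environment over a pipe.
    parent_conn, child_conn = mp.Pipe()
    proc = mp.Process(target=_worker, args=(child_conn, 'nav_maze_static_01'))
    proc.start()

    noop = np.zeros((7,), dtype=np.intc)
    parent_conn.send(('step', noop))
    reward = parent_conn.recv()
    parent_conn.send(('close', None))
    proc.join()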

endymion64 commented 7 years ago

Also, if you are using 8 actions, your commit is a bit old. My latest code uses multiprocessing to run the lab environments.

Apparently I was working on a personal branch and hadn't fully merged your latest commits, which is why I was still using 8 actions.

The algorithm is running now, and I can already see that it is learning better than before with only 6 actions. The number of actions clearly has a large influence on learning performance.

Thanks for explaining your reasoning behind introducing multiprocessing. With hardware rendering properly enabled, the algorithm now runs a couple of times faster!