openai / universe-starter-agent

A starter agent that can solve a number of universe environments.
MIT License

No tangible results with pong or NeonRace within VNC environment #127

Closed dtcarls closed 6 years ago

dtcarls commented 6 years ago

I have read through #52 and #45, both of which were closed without an explanation of the resolution. Over a couple of weeks I have run many iterations of NeonRace and gym-core.PongDeterministic-v0, using many different configurations of workers and VNC containers.

(Screenshot: pong-vnc-dist-tf0 12 1)

The run in this screenshot has been going for about 8 hours with 8 workers on a processing server and 8 VNC environments on another, locally networked server (0.1 ms pings). Both machines have 16 cores, and neither has exceeded a load average of 15. I have tried tf0.12, tf0.12.1, tf1.2.0, tf1.2.1, and tf1.3.0, all with the same results, using the versions of all the dependencies listed in the README (you can see my install instructions here to verify: https://gist.github.com/dtcarls/03c23fca2fba804cfd4ac99ef4780a29).

I continue to max out at around 35k score in NeonRace (essentially just holding forward) and at an average episode reward of about 0 in Pong. I have checked that all workers are running, I get reaction times under 150 ms, and I do not see many reaction_time:None responses (see my poor attempt at a script here: https://github.com/dtcarls/universe-checkpoints/blob/master/reaction_time.sh).

I have not been able to get either of these games to work in a VNC environment, whereas PongDeterministic-v4 without VNC works fine and trains quickly. I feel I am checking all of the diagnostic boxes, yet I cannot replicate the results of this repository, even allowing for a longer timeline given that I am using 8 (or fewer) workers versus 16. If someone could point me in the right direction, I would be grateful.

dtcarls commented 6 years ago

Update: 100 million steps made it to about +7. That is 8 workers running for 2.5 days (64.5 hours), while the README claims: "If you run this experiment on a high-end MacBook Pro, the above job will take just under 2 hours to solve Pong." I understand that those numbers are for the non-VNC version and a different core count, but comparing a laptop in a non-VNC environment with 2 servers with 16 cores each in a VNC environment, which reach a score of only 7 after 32 times as long, doesn't seem to add up.

Again, any assistance in finding what I may be doing wrong, or if this repo is simply out of date, would be much appreciated.
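For reference, the figures in this update can be turned into a throughput estimate. The sketch below is only back-of-the-envelope arithmetic on the numbers quoted above. It assumes that the 100 million steps are the aggregate count across all 8 workers and that "just under 2 hours" can be rounded to 2 hours; the variable and function names are made up for illustration.

```python
# Figures quoted in the update above; nothing here is measured independently.
TOTAL_STEPS = 100_000_000  # "100 million steps" (assumed to be the total over all workers)
WORKERS = 8                # "8 workers"
HOURS_VNC = 64.5           # "64.5 hours"
HOURS_README = 2.0         # README: "just under 2 hours" (assumed to be 2 h, non-VNC)


def throughput(total_steps: int, hours: float) -> float:
    """Return aggregate environment steps per second of wall-clock time."""
    return total_steps / (hours * 3600)


steps_per_sec = throughput(TOTAL_STEPS, HOURS_VNC)
per_worker = steps_per_sec / WORKERS
slowdown = HOURS_VNC / HOURS_README

print(f"aggregate:  {steps_per_sec:.1f} steps/s")  # 430.7 steps/s
print(f"per worker: {per_worker:.1f} steps/s")     # 53.8 steps/s
print(f"wall-clock ratio vs README: {slowdown:.2f}x")  # 32.25x
```

Note that the 32x ratio compares the time to reach +7 over VNC with the README's time to solve Pong without VNC, so it understates the gap in time-to-solve.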

AdamStelmaszczyk commented 6 years ago

"but to extrapolate a laptop in a non-VNC environment vs 2 servers with 16 cores each in a VNC environment getting a score of 7 over 32 times as long doesn't seem to add up"

I don't think you can extrapolate like that: the VNC environment has much greater lag, which has a big impact.

From README:

Generally speaking, environments that are most affected by lag are games that place a lot of emphasis on reaction time. For example, this agent is able to solve VNC Pong (gym-core.PongDeterministic-v3) in under 2 hours when both the agent and the environment are co-located on the cloud, but this agent had difficulty solving VNC Pong when the environment was on the cloud while the agent was not.

Perhaps it's just not possible to solve Pong with this implementation when the lag is large enough.