Performance worse than README claims

openai / universe-starter-agent

A starter agent that can solve a number of universe environments.

MIT License


Performance worse than README claims #52

Closed tlbtlbtlb closed 7 years ago

tlbtlbtlb commented 7 years ago

README says that for NeonRace:

Getting 80% of the maximal score takes between 1 and 2 hours with 16 workers, and getting to 100% of the score takes about 12 hours. Also, flash games are run at 5fps by default, so it should be possible to productively use 16 workers on a machine with 8 (and possibly even 4) cores.

I get about 35000 in NeonRace after 6 hours with 8 workers. Someone else gets 90000 points with his branch of vnc-agents after 5M steps.

8 workers bring an 8-core machine (like tlb-0.devbox.sci) down to 0% idle, and the frame rate varies between 3.5 and 4.8 fps when we requested 5.

tlbtlbtlb commented 7 years ago

With 8 workers it wasn't keeping up with the frame rate on an 8-core c4.2xlarge. With 6 workers, it gets essentially full frame rate.

It still doesn't get beyond 35000 points with 6 workers for 24 hours.

I'll try again with distributed remotes and 16 workers.

pursuenp commented 7 years ago

I have this problem too. For NeonRace, the episode reward saturates at 35000 with 8 workers. For VNC Pong, it learns very slowly (6 workers; reaction time is 50-60 ms). However, local Pong works well: it reached ~20 at around 1.5M steps with 8 workers.

realdoug commented 7 years ago

I'm having the same issue. I hosted 4 VNC workers on a 2012 MacBook Pro with maxed-out RAM and a new SSD, and hosted the A3C environment on an Ubuntu laptop with similar specs on a wired network. Ping between the machines shows 0.1 ms network latency, but I saw closer to 100 ms "reaction time" on the workers. I also tried running everything locally on the Ubuntu machine, with similar results.

Wondering if the MBP just doesn't have the horsepower to run the VNC environments.

tlbtlbtlb commented 7 years ago

100 ms reaction time is about right. Reaction time includes 1-2 frame delays at 60 Hz, plus the time for the agent's neural net.
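As a rough illustration of that arithmetic, here is a minimal Python sketch that works out how much of an observed reaction time the 1-2 frame delays account for. The function names are hypothetical, and the thread does not break down the remainder beyond "the agent's neural net", so treat the leftover figure as everything else combined.

```python
# Rough budget for an observed reaction time, assuming only what the
# comment above states: 1-2 frame delays at 60 Hz plus everything else.
FRAME_HZ = 60.0
FRAME_MS = 1000.0 / FRAME_HZ  # duration of one 60 Hz frame in milliseconds


def frame_delay_range_ms(min_frames=1, max_frames=2):
    """Return the (low, high) delay in ms contributed by the frame delays."""
    return (min_frames * FRAME_MS, max_frames * FRAME_MS)


def remaining_budget_ms(observed_ms, min_frames=1, max_frames=2):
    """Return the (low, high) time in ms left over for everything else."""
    low_delay, high_delay = frame_delay_range_ms(min_frames, max_frames)
    return (observed_ms - high_delay, observed_ms - low_delay)


if __name__ == "__main__":
    low, high = frame_delay_range_ms()
    print("frame delays: %.1f to %.1f ms" % (low, high))
    rest_low, rest_high = remaining_budget_ms(100.0)
    print("left over at 100 ms: %.1f to %.1f ms" % (rest_low, rest_high))
```

Under these assumptions the frame delays cover roughly 17 to 33 ms, so at 100 ms observed the remaining 67 to 83 ms is spent elsewhere.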

realdoug commented 7 years ago

Got it, thanks. So you'd expect it to learn within a few hours with 100 ms reaction time? FWIW, I cut it down to two environments/agents and am getting much better results: reaction time in the 30-50 ms range and solid learning improvement.

crci commented 7 years ago

I am also observing a maximum score of around 35K points. It seems that the agent reaches this score essentially by just pressing the forward key (as noted in #45). Did anyone get beyond this score with the agent actually making turns? If so, how?

The reaction time is ~70 ms, although I see many lines like this one:

universe-G8yTw7-0 | [2017-04-05 16:18:17,458] [INFO:universe.wrappers.logger] Stats for the past 5.00s: vnc_updates_ps=34.4 n=1 reaction_time=None observation_lag=None action_lag=None reward_ps=0.0 reward_total=0.0 vnc_bytes_ps[total]=216427.6

Is this a problem, or does it mean that the agent just crashed or something like that?
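To see how often such lines occur, one option is to parse the `key=value` pairs out of each stats line and count the windows where `reaction_time` is `None`. The following is a minimal Python sketch: `parse_stats` and `has_no_reaction_time` are hypothetical helper names, and the line format is assumed only from the sample above. What a `None` window means for the agent is not settled in this thread, so the sketch only flags it.

```python
import re

# Sample stats line copied from the log output quoted above.
SAMPLE = (
    "universe-G8yTw7-0 | [2017-04-05 16:18:17,458] "
    "[INFO:universe.wrappers.logger] Stats for the past 5.00s: "
    "vnc_updates_ps=34.4 n=1 reaction_time=None observation_lag=None "
    "action_lag=None reward_ps=0.0 reward_total=0.0 "
    "vnc_bytes_ps[total]=216427.6"
)

# Matches whitespace-free key=value tokens such as "reward_ps=0.0".
PAIR = re.compile(r"(\S+?)=(\S+)")


def parse_stats(line):
    """Return a dict of the stats in one log line, or None if not a stats line."""
    start = line.find("Stats for the past")
    if start < 0:
        return None
    # Everything after the first colon following the marker holds the pairs.
    _, _, tail = line[start:].partition(":")
    stats = {}
    for key, raw in PAIR.findall(tail):
        if raw == "None":
            stats[key] = None
        else:
            try:
                stats[key] = float(raw)
            except ValueError:
                stats[key] = raw  # keep values that are not plain numbers
    return stats


def has_no_reaction_time(stats):
    """True when the window reported reaction_time=None."""
    return stats is not None and stats.get("reaction_time") is None


if __name__ == "__main__":
    parsed = parse_stats(SAMPLE)
    print(parsed["vnc_updates_ps"], has_no_reaction_time(parsed))
```

Running it over a whole log file and counting the flagged windows per worker would show whether these lines are occasional or continuous.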

Thank you

remoba commented 7 years ago

I am experiencing the same issue: the agent gets to a maximum score of about 35k-40k points and, after a while, only presses the forward key. I tried it numerous times with different configurations, the last one with only 1 worker. Has anybody successfully trained a model? Can I get a copy of the trained model for testing?

realdoug commented 7 years ago

I was able to successfully train a model. The only things I did differently were to account for these two constraints:

A standard 16 GB RAM machine like a MacBook Pro can host a maximum of 2 VNC environments.
My home wifi network had enough latency that I needed to wire all machines to the network.

I used 3 machines in total, hosting 2 VNC workers each on 2 MBP laptops and training 4 agents on an Ubuntu machine of similar specs. It took something like 4 hours. The agent was able to win 20-0 against the VNC environment consistently. Unfortunately, I no longer have the trained weights; otherwise I'd share them.
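For planning a similar setup, the "2 VNC environments per machine" limit turns into a simple ceiling division. This is a minimal Python sketch with a hypothetical helper name; the limit of 2 is only my observation on 16 GB machines, not a documented figure, so adjust `envs_per_host` for your hardware.

```python
import math


def hosts_needed(num_envs, envs_per_host=2):
    """Number of environment hosts needed when each host runs at most envs_per_host."""
    if num_envs < 0 or envs_per_host <= 0:
        raise ValueError("num_envs must be >= 0 and envs_per_host must be > 0")
    return math.ceil(num_envs / envs_per_host)


if __name__ == "__main__":
    print(hosts_needed(4))   # 2, matching the two laptops used above
    print(hosts_needed(16))  # 8, for the 16 workers mentioned in the README
```

The training machine is counted separately, which is how 4 environments became 3 machines in total here.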

dtcarls commented 6 years ago

I have read through this ticket and #45, both of which were closed without an actual explanation of the resolution. I am running 3 workers on a 16-core machine with sub-100 ms reaction time, and they still max out at a score of around 35k in NeonRace. I have checked that all workers are running properly and are not stuck in the bugged state of being outside the map with no road, going straight forever and racking up score.