openai / universe-starter-agent

A starter agent that can solve a number of universe environments.
MIT License

The performance is not as good as expected when running Pong in remote mode. #37

Closed mr-cloud closed 7 years ago

mr-cloud commented 7 years ago

Following the README, I set up the environment in remote mode and trained the Pong game with 2 parallel workers for more than ten hours. But the results show that my agent only manages to tie the game. The global statistics from TensorBoard are shown below, and the reaction_time is about 40 ms.

[TensorBoard screenshot: global statistics for the Pong run]
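For reference, a launch along the lines below corresponds to this kind of VNC/remote Pong setup. The env id and flags are taken from the repo's README example and may differ in a given checkout, so treat this as a sketch rather than the exact command used here:

```
python train.py --num-workers 2 --env-id gym-core.PongDeterministic-v3 --log-dir /tmp/vncpong
```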

I have also trained the agent on the Neon Race game as described in the README, and the result is still not good; see the figure below. The reaction_time there is about 80 ms.

[TensorBoard screenshot: Neon Race run]
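The Neon Race run would be started the same way; again, the env id below is assumed from the README's flash-game example and the worker count from the 2-core host described here:

```
python train.py --num-workers 2 --env-id flashgames.NeonRace-v0 --log-dir /tmp/neonrace
```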

My training host is an Ubuntu 16.04 LTS VM with two cores. Why can't I reach the results the README demo reports, and how can I debug or tune my agent? Any suggestion would be appreciated.

tlbtlbtlb commented 7 years ago

Although 1 core per worker is enough for the non-VNC version of Pong, for the VNC version you should provide 2 or more cores per worker, depending on their speed. Try it with 2 workers on an 8-core machine and you should get good results. It might also work with 4 workers, depending on the machine.

I'm not sure what parameters you're using for NeonRace. Maybe just 2 workers on 2 cores? Our README only claims something for 16 workers on 8 cores:

Getting 80% of the maximal score takes between 1 and 2 hours with 16 workers, and getting to 100% of the score takes about 12 hours. Also, flash games are run at 5fps by default, so it should be possible to productively use 16 workers on a machine with 8 (and possibly even 4) cores.

You should look at the htop tab of tmux and be sure it's not using more than about 80% of the CPU.
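If the htop window isn't handy, standard Linux tools (nothing specific to this repo; mpstat comes from the sysstat package) give a quick read on CPU load:

```
# per-core and total CPU utilization, sampled once per second for 5 seconds
mpstat -P ALL 1 5

# or a one-shot summary of overall usage
top -bn1 | grep "Cpu(s)"
```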

mr-cloud commented 7 years ago

Hi, I have trained Pong again with 4 workers on 8 cores. The agent is slightly better than before, but still not as good as expected. The TensorBoard statistics are shown below.

[TensorBoard screenshot: global model]

Is my hardware not sufficient to train a full-score agent? A second question: is it true that we cannot assume performance will keep improving just by training longer when training a reinforcement learning model? Thanks a lot. @tlbtlbtlb

tlbtlbtlb commented 7 years ago

The trend on reward is positive, so you might get close to full score after 20M steps.

It's hard to say whether compute power is a factor; it depends on the machine and what else is running on it. Look at reaction_time in the logs: if that's over 40 ms, the agent will struggle to get good scores.

True, performance doesn't always increase with more training. Models can overfit and performance can drop.