mr-cloud closed this issue 7 years ago
Although 1 core per worker is enough for the non-VNC version of Pong, for the VNC version you should provide 2 or more cores per worker, depending on the speed. Try it with 2 workers on an 8-core machine and you should get good results. It might work with 4 workers, depending on the machine.
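That rule of thumb can be sketched as a tiny helper. This is a minimal illustration of the ratios mentioned above (roughly 1 core per worker without VNC, 2+ cores per worker with VNC); the function name and exact ratios are mine, not part of the repo.

```python
# Heuristic from the discussion above, NOT an official formula:
# non-VNC Pong needs ~1 core/worker, the VNC version ~2+ cores/worker.
def max_workers(cores, vnc=False, cores_per_worker_vnc=2):
    per_worker = cores_per_worker_vnc if vnc else 1
    return max(1, cores // per_worker)

print(max_workers(8, vnc=True))   # 8 cores / 2 cores-per-worker -> 4 workers
print(max_workers(8, vnc=False))  # 8 cores / 1 core-per-worker  -> 8 workers
```

If the VNC environment runs slowly on your machine, raise `cores_per_worker_vnc` rather than adding workers.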
I'm not sure what parameters you're using for NeonRace. Maybe just 2 workers on 2 cores? Our README only claims something for 16 workers on 8 cores:
Getting 80% of the maximal score takes between 1 and 2 hours with 16 workers, and getting to 100% of the score takes about 12 hours. Also, flash games are run at 5fps by default, so it should be possible to productively use 16 workers on a machine with 8 (and possibly even 4) cores.
You should look at the htop tab of tmux and make sure the machine is not using more than about 80% of the CPU.
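If you want to check this programmatically instead of eyeballing htop, here is a minimal sketch that samples `/proc/stat` over a one-second window (Linux only; the 80% threshold is the figure from the comment above, not a hard limit).

```python
# Estimate overall CPU utilization by sampling /proc/stat twice (Linux only).
import time

def cpu_times():
    # First line of /proc/stat: "cpu  user nice system idle iowait irq softirq ..."
    with open("/proc/stat") as f:
        vals = list(map(int, f.readline().split()[1:]))
    idle = vals[3] + vals[4]  # idle + iowait
    return idle, sum(vals)

i1, t1 = cpu_times()
time.sleep(1.0)
i2, t2 = cpu_times()
busy_pct = 100.0 * (1 - (i2 - i1) / (t2 - t1))
print(f"CPU usage over 1s window: {busy_pct:.1f}%")
if busy_pct > 80:
    print("warning: over ~80% CPU; workers may be starved")
```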
Hi, I have trained Pong again with 4 workers on 8 cores. The agent is slightly better than before, but still not as good as expected. The TensorBoard plot is shown below.
Is my hardware environment not sufficient to train a full-score agent? Another question: when training a reinforcement learning model, can we assume that spending more time reliably improves performance? Thanks a lot. @tlbtlbtlb
The trend on reward is positive, so you might get close to full score after 20M steps.
It's hard to say if compute power is a factor. It depends on the machine, and what else is running on it. Look at reaction_time in the logs; if that's over 40 ms, the agent will struggle to get good scores.
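A quick way to check this is to scrape the reaction_time values out of the worker logs and average them. The log format and regex below are assumptions for illustration; adjust them to whatever your logs actually look like.

```python
# Hedged sketch: extract reaction_time values from log text and report the mean.
# The "reaction_time: <ms>" format here is an assumption -- match your real logs.
import re

sample_log = """\
[12:00:01] reaction_time: 32.5 ms
[12:00:02] reaction_time: 45.1 ms
[12:00:03] reaction_time: 38.0 ms
"""

times = [float(x) for x in re.findall(r"reaction_time:\s*([\d.]+)", sample_log)]
mean_rt = sum(times) / len(times)
print(f"mean reaction_time: {mean_rt:.1f} ms")
if mean_rt > 40:
    print("warning: reaction_time over 40 ms; scores will likely suffer")
```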
True, performance doesn't always increase with more training. Models can overfit and performance can drop.
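Because reward can regress late in training, one common precaution (my suggestion, not something the starter agent does for you) is to track the best evaluation reward seen so far and keep the checkpoint from that point rather than only the final one.

```python
# Illustrative sketch: remember the best-scoring step so you can restore
# that checkpoint if later training regresses. Numbers below are made up.
best_reward = float("-inf")
best_step = None

eval_history = [(1_000_000, -12.0), (5_000_000, 8.5), (9_000_000, 3.2)]
for step, reward in eval_history:
    if reward > best_reward:
        best_reward, best_step = reward, step
        # in real code: save/copy the model checkpoint here

print(f"best reward {best_reward} at step {best_step}")
```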
Following the README, I built an environment in remote mode and trained Pong with 2 parallel workers for more than ten hours. But the results show that my agent only ties the game. The global statistics on TensorBoard are depicted below, and the reaction_time is about 40 ms.
I have also trained the AI on the NeonRace game as the README demo describes, and the result is still not good; see the figure below. There the reaction_time is about 80 ms.
My training host is an Ubuntu 16.04 LTS VM with two cores. Why can't I reach the results that the README demo reports? And how can I debug or tune my agent? Any suggestions would be appreciated.