Open tokb23 opened 8 years ago
How can you get the result that says it runs 472 steps per sec with 8 threads on CPU? I just ran a3c.py without any options on a Core i7 6700 machine.
4.6 million steps in 11 hours seems too slow. Could you tell me the CPU usage (%) of each CPU when you run the gym branch? If CPU usage is very low (like 10% or 20%), there might be a problem in my program.
When @joabim tried 24 threads 3 months ago, CPU usage was around 80~85%. https://github.com/miyosuda/async_deep_reinforce/issues/1#issuecomment-217446365
I've also merged the performance logging code into the gym branch to show global steps per sec. (Thanks @Itsukara)
Thank you for your reply!
I don't know if I can check the CPU usage of each CPU on AWS, but I checked the average CPU utilization.
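For reference, per-CPU usage can be read on a Linux instance without extra tooling by sampling `/proc/stat` twice and comparing the counters. The following is a minimal sketch under that assumption (Linux only, standard library only); the function names `parse_proc_stat`, `usage_percent`, and `sample_cpu_usage` are hypothetical helpers, not part of the repository.

```python
import time


def parse_proc_stat(text):
    """Return {cpu_name: (busy_jiffies, total_jiffies)} for each per-core line."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines start with "cpu0", "cpu1", ...; the aggregate line is just "cpu".
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:]]
        # Columns: user nice system idle iowait irq softirq steal (guest columns follow).
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        # Guest time is already counted in user/nice, so only the first 8 columns are summed.
        total = sum(values[:8])
        counters[fields[0]] = (total - idle, total)
    return counters


def usage_percent(before, after):
    """Compute the busy percentage per CPU between two parsed snapshots."""
    usage = {}
    for name, (busy1, total1) in after.items():
        busy0, total0 = before.get(name, (0, 0))
        elapsed = total1 - total0
        usage[name] = 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return usage


def sample_cpu_usage(interval=1.0, path="/proc/stat"):
    """Read the counters twice, `interval` seconds apart, and return per-CPU usage."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return usage_percent(before, after)
```

Calling `sample_cpu_usage()` in a separate shell while a3c.py is training would give one percentage per logical CPU, which is the number asked about above.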
I also ran your code again with 36 threads on AWS to check the performance; one example of the output is below.
### Performance : 287344 STEPS in 1797 sec. 160 STEPS/sec. 0.58M STEPS/hour 180 STEPS/sec
It's obviously too slow.
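To put the numbers from this thread side by side, the arithmetic can be sketched as follows. The inputs are taken from the log line above (287344 steps in 1797 sec), the earlier run (about 4.6 million steps in about 11 hours), and the reported 472 steps per sec; the helper name `steps_per_sec` is only illustrative.

```python
def steps_per_sec(steps, seconds):
    """Average global steps per second over a run."""
    return steps / seconds


# Figures quoted in this thread.
log_rate = steps_per_sec(287344, 1797)        # 36 threads on AWS, from the log line
long_run_rate = steps_per_sec(4.6e6, 11 * 3600)  # ~4.6M steps in ~11 hours
reference_rate = 472.0                        # reported for 8 threads on a desktop CPU

print("log line:  %.0f steps/sec, %.2fM steps/hour" % (log_rate, log_rate * 3600 / 1e6))
print("long run:  %.0f steps/sec" % long_run_rate)
print("slowdown vs. reference: %.1fx (log), %.1fx (long run)"
      % (reference_rate / log_rate, reference_rate / long_run_rate))
```

Both AWS measurements come out at roughly 116 to 160 steps per sec, so about three to four times slower than the 472 steps per sec reference despite using far more threads.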
When @joabim tried 24 threads 3 months ago, cpu usage was around 80~85%.
Yeah, I saw it. It seems fine even though it's using threading instead of multiprocessing. That's why I'm wondering what's going on...
FYI, a comment on this post (http://stackoverflow.com/questions/34419645/asynchronous-computation-in-tensorflow) says that it's okay to use threading with TensorFlow.
@tatsuyaokubo Thank you for reporting the performance on the AWS environment. 160 steps/sec is too slow. I have been thinking about this performance drop.
In my program, all of the computational nodes are explicitly placed on "cpu:0". I'm not familiar with the AWS environment, but if AWS treats each virtual CPU as a separate CPU, unlike the virtual cores of a single CPU in a desktop PC, we might need to place each network on "cpu:0", "cpu:1", and so on. I'm not sure whether this is correct or not.
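One quick check related to how the instance exposes its virtual CPUs is to see how many logical CPUs the Python process is actually allowed to run on, and how that compares to the number of worker threads. This is a minimal sketch using only the standard library; `usable_cpu_count` and `threads_per_cpu` are hypothetical helper names, and `os.sched_getaffinity` is only available on some platforms (such as Linux), so the code falls back to `os.cpu_count()` elsewhere. It does not say anything about how TensorFlow itself maps work onto those CPUs.

```python
import os


def usable_cpu_count():
    """Number of logical CPUs this process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):
        # Respects CPU affinity masks (e.g. set by taskset or a container).
        return len(os.sched_getaffinity(0))
    # Fallback: total logical CPUs reported by the OS (may be None on odd platforms).
    return os.cpu_count() or 1


def threads_per_cpu(num_threads, num_cpus):
    """Ratio of worker threads to usable logical CPUs."""
    return num_threads / num_cpus


if __name__ == "__main__":
    cpus = usable_cpu_count()
    # 36 is the thread count used in the AWS runs described in this thread.
    print("usable logical CPUs:", cpus)
    print("threads per CPU with 36 threads: %.2f" % threads_per_cpu(36, cpus))
```

If the reported count is much lower than the thread count, the threads are competing for CPUs regardless of device placement.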
By the way, muupan's Chainer implementation seems to work fine on AWS, so it might help.
I tried doing what you suggested, but it didn't work. I also tried running the code without explicitly defining the device (cpu:0), but it didn't work.
Actually, I was also referring to his implementation, but I still haven't come up with any ideas. I'll keep trying to find a way to fix this problem.
@tatsuyaokubo I see. Sorry about that. Please let me know if you find a way to fix this.
Hi @miyosuda! Thank you for your great work!
I was implementing my own A3C-FF, and when I ran it, it seemed very slow in terms of steps per unit of time compared to your result. So I tried running your A3C-FF from the 'gym' branch with 36 threads on an AWS EC2 c4.8xlarge instance, but it still seemed very slow: running about 4.6 million steps (global steps) with 36 threads took about 11 hours. I have no idea what's going on. How can you get the result that says it runs 472 steps per sec with 8 threads on CPU?
Hope to hear from you soon.
My implementation: https://github.com/tatsuyaokubo/async-rl