Not able to progress in learning? #30 (mhauskn/dqn-hfo)

korroktheslavemaster commented 6 years ago

I'm trying to run the test job ./bin/dqn -save state/test -alsologtostderr --nogpu on the latest version of HFO, without a GPU. Even after 2000 iterations I'm not seeing any improvement in episode reward. Could this be an issue with the HFO version I'm using? Some sample output after ~1500 iterations:

I0101 11:53:57.222157 29593 dqn_main.cpp:355] [Agent0] Episode 1478 reward = -0.00117409
I0101 11:53:57.345566 29593 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm 1.23605e+09 > 10) by scale factor 8.09029e-09
I0101 11:53:57.636891 29593 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm 1.07481e+09 > 10) by scale factor 9.30401e-09
I0101 11:53:57.838814 29593 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm 1.40991e+09 > 10) by scale factor 7.09265e-09
I0101 11:53:58.040874 29593 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm 1.47654e+09 > 10) by scale factor 6.77259e-09
I0101 11:53:58.246521 29593 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm 1.52825e+09 > 10) by scale factor 6.54343e-09
I0101 11:53:58.463399 29593 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm 1.20801e+09 > 10) by scale factor 8.2781e-09
I0101 11:53:58.706378 29593 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm 1.23667e+09 > 10) by scale factor 8.08623e-09
I0101 11:53:58.941028 29593 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm 1.38584e+09 > 10) by scale factor 7.21582e-09
I0101 11:53:59.179901 29593 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm 4.34253e+08 > 10) by scale factor 2.30281e-08
I0101 11:53:59.376896 29593 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm 1.66873e+09 > 10) by scale factor 5.99256e-09
EndOfTrial: 0 / 1580 162979 OUT_OF_TIME
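Editorial note on the log above: each scale factor equals the clipping threshold divided by the reported norm (for the first line, 10 / 1.23605e+09 ≈ 8.09029e-09), which is what clipping by global L2 norm would produce. The snippet below is a minimal Python sketch of that arithmetic, not the solver's actual code; the function name `clip_by_global_norm` and the toy gradient values are hypothetical and chosen only for illustration.

```python
import math


def clip_by_global_norm(grads, clip_threshold):
    """Rescale a flat list of gradient values so their L2 norm is at most
    clip_threshold. Returns (clipped_grads, l2_norm, scale_factor).

    Hypothetical helper for illustration; not taken from dqn-hfo or Caffe.
    """
    l2_norm = math.sqrt(sum(g * g for g in grads))
    if l2_norm > clip_threshold:
        # One shared factor for every gradient, so the direction is preserved.
        scale = clip_threshold / l2_norm
        return [g * scale for g in grads], l2_norm, scale
    # Norm already within the threshold: leave the gradients untouched.
    return list(grads), l2_norm, 1.0


# Reproduce the scale factor from the first clipping line in the log:
# threshold 10, reported L2 norm 1.23605e+09.
scale_from_log = 10 / 1.23605e+09

# Toy example (made-up values): norm is 5, threshold is 1, so scale is 0.2.
clipped, norm, scale = clip_by_global_norm([3.0, 4.0], 1.0)
```

A scale factor on the order of 1e-8 means the update keeps its direction, but the unclipped gradient is roughly eight orders of magnitude larger than the threshold on every step shown in the log.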

korroktheslavemaster commented 6 years ago

Hi, I've implemented your inverted gradients approach in TensorFlow and have been able to approach the ball with parameterized actions. However, I was not able to learn kicking. I still haven't been able to make any progress with your code: I see that the gradient L2 norms are very high, as in the above comment. Is this to be expected?
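Editorial note: for readers unfamiliar with the technique mentioned above, here is a minimal Python sketch of the inverting-gradients rule for a single bounded action parameter. It is an illustration under stated assumptions, not the code from this repository or from the commenter's TensorFlow port. The function name `invert_gradient` is hypothetical, and `grad` is assumed to be the direction in which the update wants to move the parameter (positive means increase), so the sign may need flipping depending on how a given framework defines its gradients.

```python
def invert_gradient(grad, p, p_min, p_max):
    """Scale (and possibly invert) the update direction for a bounded parameter.

    grad  : desired change direction for p (positive => increase p). Assumed
            sign convention; flip the sign if your framework hands you dLoss/dp.
    p     : current value of the action parameter.
    p_min : lower bound of the parameter's valid range.
    p_max : upper bound of the parameter's valid range.
    """
    width = p_max - p_min
    if grad > 0:
        # Wants to increase p: shrink the step as p nears p_max; the factor
        # turns negative (the gradient inverts) once p exceeds p_max.
        return grad * (p_max - p) / width
    # Wants to decrease p (or grad is zero): shrink the step as p nears p_min;
    # the factor turns negative once p falls below p_min.
    return grad * (p - p_min) / width


# Illustrative bounds of [-1, 1] (made-up values).
inside = invert_gradient(1.0, 0.5, -1.0, 1.0)    # scaled down, still positive
at_max = invert_gradient(1.0, 1.0, -1.0, 1.0)    # zero at the upper bound
outside = invert_gradient(1.0, 1.5, -1.0, 1.0)   # inverted: pushes p back down
```

Because the scaling factor depends on where the parameter sits within its range, the result is sensitive to the bounds passed in, so it is worth checking that each parameter is compared against its own range rather than a shared one.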

mhauskn commented 6 years ago

Sorry to hear that the approach is giving you trouble. In general, gradient norms this high are not a good sign.

(Sent by email in reply to Arpit Tarang Saxena's comment above, dated Tue, Mar 20, 2018 at 11:54 PM.)

