**Open** — korroktheslavemaster opened this issue 6 years ago
Hi, I've implemented your inverted-gradients approach in TensorFlow and have been able to learn to approach the ball with parameterized actions. However, I was not able to learn kicking. I have also been unable to make any progress with your code; I see that the gradient L2 norms are very high, as in the comment above. Is this to be expected?
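For anyone else landing on this issue: here is a minimal NumPy sketch of the inverted-gradients rule described in the Hausknecht & Stone parameterized-action paper, for checking a reimplementation against. The function name and the explicit `p_min`/`p_max` arguments are illustrative, not from this repo's code. The idea is to scale `dQ/dp` by the remaining headroom toward whichever bound the gradient is pushing, so parameters never saturate at their limits.

```python
import numpy as np

def invert_gradients(grads, params, p_min, p_max):
    """Inverted-gradients sketch: scale dQ/dp by the remaining room
    toward the bound the gradient points at (gradient-ascent sign
    convention, so g > 0 means 'increase the parameter')."""
    grads = np.asarray(grads, dtype=float)
    params = np.asarray(params, dtype=float)
    width = p_max - p_min
    room_up = (p_max - params) / width    # fraction of range left above p
    room_down = (params - p_min) / width  # fraction of range left below p
    return np.where(grads > 0, grads * room_up, grads * room_down)
```

With a parameter at the midpoint of `[0, 1]`, gradients in either direction are halved; a positive gradient on a parameter already at the upper bound is zeroed out entirely.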
Sorry to hear that the approach is giving you trouble. In general, high gradients like this are not a good sign.
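When debugging this kind of instability, it can help to log the global L2 norm of the gradients each step and, as an experiment, clip by that norm. A self-contained NumPy sketch (the function names and the `max_norm` threshold are illustrative, not part of this repo):

```python
import numpy as np

def global_l2_norm(grads):
    """L2 norm over all gradient tensors concatenated together."""
    return np.sqrt(sum(np.sum(g ** 2) for g in grads))

def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients by a common factor so the global
    L2 norm does not exceed max_norm (no-op if already smaller)."""
    norm = global_l2_norm(grads)
    scale = min(1.0, max_norm / (norm + 1e-12))  # epsilon avoids div-by-zero
    return [g * scale for g in grads]
```

Logging this norm over training makes it easy to see whether the blow-up is gradual or sudden, which narrows down whether the problem is the critic targets or the inverted-gradients step.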
I'm trying to run the test job
./bin/dqn -save state/test -alsologtostderr --nogpu
on the latest version of HFO without a GPU. Even after 2000 iterations I'm not seeing any improvement in episode reward. Is this an issue with the HFO version I'm using? Some sample output after ~1500 iterations:
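One caveat before concluding there is no learning: raw per-episode rewards in HFO are very noisy, so a trend can hide in the per-step log lines. A small sketch of bias-corrected exponential smoothing for judging whether reward is actually flat (the function and `beta` value are illustrative, not from this repo's logging):

```python
def smooth_rewards(rewards, beta=0.99):
    """Exponential moving average with bias correction, so early
    values are not dragged toward zero by the zero-initialized average."""
    avg, smoothed = 0.0, []
    for t, r in enumerate(rewards):
        avg = beta * avg + (1 - beta) * r
        smoothed.append(avg / (1 - beta ** (t + 1)))  # bias correction
    return smoothed
```

Plotting the smoothed curve over a few thousand episodes gives a much clearer signal than eyeballing individual episode rewards.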