In shooting training, the agent gets no proxy reward until it actually hits the puck. I therefore think it would speed up training, and steer the agent towards the puck, if a negative reward also applied while the puck is not moving at all. To get this behavior I simply changed a < to a <= in the comparison.
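To illustrate the effect of the change, here is a minimal sketch, not the actual environment code. It assumes the proxy reward is a distance-based penalty gated by a comparison on the puck's x-velocity; the function name, parameters, and the `include_resting_puck` switch are all hypothetical and only serve to show the two comparison variants side by side.

```python
import math


def closeness_penalty(player_pos, puck_pos, puck_vx,
                      center_x=0.0, include_resting_puck=True, scale=-1.0):
    """Hypothetical proxy reward: penalize the distance to the puck.

    Assumption: the penalty only applies while the puck is in the
    agent's own half (x < center_x) and is not moving towards the
    opponent.
    """
    in_own_half = puck_pos[0] < center_x
    if include_resting_puck:
        # Variant with <=: a puck at rest (vx == 0) also triggers the penalty.
        velocity_ok = puck_vx <= 0
    else:
        # Variant with <: a puck at rest gives no signal at all.
        velocity_ok = puck_vx < 0
    if in_own_half and velocity_ok:
        return scale * math.dist(player_pos, puck_pos)
    return 0.0


# A resting puck in the agent's own half, two units away from the agent.
before = closeness_penalty((-3.0, 0.0), (-1.0, 0.0), 0.0, include_resting_puck=False)
after = closeness_penalty((-3.0, 0.0), (-1.0, 0.0), 0.0, include_resting_puck=True)
```

Under these assumptions, the strict comparison leaves the agent without any reward signal while the puck is at rest, whereas the non-strict one penalizes its distance to the puck from the first step.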