I noticed that in Keepaway mode the ball is always initialized in the same position (the top-left corner), which might bias the policy the players learn.
For example, I am training the takers against a team of hand-coded keepers (I'm actually using this well-known library). The gif below shows two consecutive episodes in which the takers intercept the ball before the keepers manage to complete a single pass.
To prevent this, Keepaway mode could randomize the ball's initial position whenever a new episode starts.
I worked around the problem by initializing the ball in one of the quadrants occupied by the takers (if I'm not mistaken, Keepaway assumes a minimum of 3 keepers, and so do I).
Feel free to adopt this implementation or initialize the ball in any other way you may find more convenient.
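To make the idea concrete, here is a minimal sketch of how a randomized ball placement could look. It is for illustration only and is not the implementation mentioned above: `Rect`, `quadrant`, `random_ball_position`, the `margin` parameter, and the field dimensions are all hypothetical names and values, and the choice of candidate quadrants is left to the caller.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle. Assumption: y grows downward, so
    (x_min, y_min) is the top-left corner."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def quadrant(field: Rect, name: str) -> Rect:
    """Return one of the four quadrants of the keepaway region."""
    cx = (field.x_min + field.x_max) / 2
    cy = (field.y_min + field.y_max) / 2
    quads = {
        "top_left": Rect(field.x_min, field.y_min, cx, cy),
        "top_right": Rect(cx, field.y_min, field.x_max, cy),
        "bottom_left": Rect(field.x_min, cy, cx, field.y_max),
        "bottom_right": Rect(cx, cy, field.x_max, field.y_max),
    }
    return quads[name]


def random_ball_position(field, candidates, rng, margin=1.0):
    """Pick one candidate quadrant at random, then sample a point
    uniformly inside it, staying `margin` away from the quadrant's edges."""
    quad = quadrant(field, rng.choice(candidates))
    x = rng.uniform(quad.x_min + margin, quad.x_max - margin)
    y = rng.uniform(quad.y_min + margin, quad.y_max - margin)
    return x, y


# Hypothetical 20 x 20 keepaway region centered on the origin.
field = Rect(-10.0, -10.0, 10.0, 10.0)
rng = random.Random()  # pass a seed here for reproducible episodes
ball_x, ball_y = random_ball_position(
    field, ["top_right", "bottom_left", "bottom_right"], rng
)
```

Calling `random_ball_position` once per episode reset would give a different starting position each time. Passing a seeded `random.Random` keeps training runs reproducible, and restricting `candidates` lets you control which players start closest to the ball.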