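Right now, we train for N iterations. However, we also set the epsilon-greedy decay schedule on the assumption that each agent will actually see all N iterations. In a 2-player game, each agent only ever sees N/2 of them, so epsilon decays far less than the user expects by the end of training. It gets even worse with more than 2 players: with k players, each agent sees only about N/k iterations.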
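To make the mismatch concrete, here is a minimal sketch of the problem and one possible fix. It assumes a linear decay schedule and strict turn-taking, so that each agent sees roughly N / num_players iterations. The names `linear_epsilon` and `per_agent_decay_steps`, and the default start/end values, are hypothetical and not taken from the existing code.

```python
def linear_epsilon(step, decay_steps, eps_start=1.0, eps_end=0.05):
    """Linearly anneal epsilon from eps_start to eps_end over decay_steps.

    Assumption: the real schedule may not be linear; this is only illustrative.
    """
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)


def per_agent_decay_steps(total_iterations, num_players):
    """Hypothetical helper: number of iterations a single agent actually sees,
    assuming players take turns and share the iterations equally."""
    return max(1, total_iterations // num_players)


total_iterations = 10_000
num_players = 2

# Iterations one agent has seen by the end of training.
steps_seen = total_iterations // num_players

# Current behaviour: the schedule is sized for all N iterations,
# so the agent finishes training only halfway through the decay.
eps_naive = linear_epsilon(steps_seen, total_iterations)

# Possible fix: size the schedule by the iterations the agent really sees.
eps_fixed = linear_epsilon(
    steps_seen, per_agent_decay_steps(total_iterations, num_players)
)

print(f"end-of-training epsilon, schedule sized for N:   {eps_naive:.3f}")
print(f"end-of-training epsilon, schedule sized for N/k: {eps_fixed:.3f}")
```

Under these assumptions, the agent with the N-sized schedule ends training with epsilon still around the midpoint of its range, while the N/k-sized schedule reaches the intended final value. An alternative would be to keep the schedule as is and train for N * num_players total iterations; which of the two matches user expectations is a design decision.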