suragnair / alpha-zero-general

A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more
MIT License
3.92k stars 1.04k forks source link

Coach accepting and rejecting new model #88

Closed JernejHabjan closed 5 years ago

JernejHabjan commented 6 years ago

Hi, I might have discovered possible error at the end of coach learn episode when its deciding, whether to discard or keep new model.

if pwins+nwins > 0 and float(nwins)/(pwins+nwins) < self.args.updateThreshold:
    print('REJECTING NEW MODEL')
else:
    print('ACCEPTING NEW MODEL')

There might be possible error, when there are no pwins+nwins, resulting only in draws. Then new model is accepted. I dont think its good to accept new model, because we do not know actually how good this contesting model, that resulted in all draws, actually is. Possible solution would be the following:

if pwins+nwins == 0 or float(nwins)/(pwins+nwins) < self.args.updateThreshold:
    print('REJECTING NEW MODEL')
else:
    print('ACCEPTING NEW MODEL')

This will result in not-dividing by zero and will discard model resulting in draws.

suragnair commented 6 years ago

I agree with that. Would you like to submit a PR or should I fix it at my end?

JernejHabjan commented 6 years ago

Ok cool. Nah, no need for PR. You can push new commit with changes.

JernejHabjan commented 6 years ago

Hey, its me again with updates about checking if new model should be rejected or accepted.

I have found on my game that if number of draws arent checked when accepting new model, model started to overfit itself on draws. I could better explain that with an example:

NEW/PREV WINS : 2 / 0 ; DRAWS : 8 ACCEPTING NEW MODEL

In this case newer model generated more wins than previous, but quite a few more draws occurred. This resulted in new model being accepted, that resulted in more draws than previous model.

I would update that formula with:

if pwins+nwins == 0 or float(nwins)/(pwins+nwins+draws) < self.args.updateThreshold:
    print('REJECTING NEW MODEL')
else:
    print('ACCEPTING NEW MODEL')

Which also adds draws to divider. This will prevent model from accepting draws as ok results in cases where draw scenarios are frequent.

xuehy commented 5 years ago

@JernejHabjan How many iterations do you run before reaching a high wining rate? I modified the accepting rule by adding draws to the denominator following your advice. However, most iterations result in rejections. Can you tell me how many iterations you run and what the best winning rate for NEW/PREV WINS is?

JernejHabjan commented 5 years ago

@xuehy Hey After running some tests on my rts game implementation, I also rarely received model acceptance because of high draw count. This might also occur for well-learnt models, as they might both be playing quite similarly, resulting in high number of draws. I believe this divider pwins+nwins+draws is rejecting too many models, but then again, without checking for draws when choosing model is resulting (in my case) in overfitting on draws. Some more mild condition should be chosen.

xuehy commented 5 years ago

@JernejHabjan In my case the model is also overfitting on draws. I'm thinking about removing some board positions with draws to make the training data balance. We can just discard some of the episodes with draws. Do you think will it work?

xuehy commented 5 years ago

@JernejHabjan Another approach may be changing

if pwins+nwins == 0 or float(nwins)/(pwins+nwins+draws) < self.args.updateThreshold:

into

if pwins+nwins < NOT_DRAW_NUM or float(nwins)/(pwins+nwins) < self.args.updateThreshold:,

where NOT_DRAW_NUM is the minimum number of the games not ending with draws. Perhaps set NOT_DRAW_NUM = 20?

JernejHabjan commented 5 years ago

Hmm good call. This might work, but is too hardcoded. Some kind of percentage of draws should be allowed instead of hardcoded number. But then again, what if draws are more often occuring with more developed players, because they both are playing 'almost perfect' matches?

suragnair commented 5 years ago

@JernejHabjan can you please make a PR with the latest updates on this?

JernejHabjan commented 5 years ago

This issue might get closed, as pull request was merged. If someone finds better solution, where draws are also included, it can be reopened.