suragnair / alpha-zero-general

A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more
MIT License

I don't understand why trainExamplesHistory is not cleared between iterations #251

Open Racines opened 3 years ago

Racines commented 3 years ago

Hello,

I see that trainExamplesHistory in Coach.py is never cleared, even when a new model is accepted after the pit (line 126). I don't understand why we keep the previous training data: the stored policy (pi) and result value (v) were produced by older models and would not be the same if the positions were evaluated by the new model. It looks like we continue to train the new model on stale data.

Can someone explain the reason why?

yunjiangster commented 2 years ago

Using data from earlier iterations can help smooth the training progress and add diversity to the training set, since the earlier models may be only slightly suboptimal compared to the most recent one.
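
To make the idea concrete, here is a minimal sketch of a sliding window over per-iteration examples. It is not the repository's code: `build_training_set`, the window size of 3, and the `(iteration, index)` stand-in examples are all hypothetical, chosen only for illustration. As far as I can tell, Coach.py does something similar by dropping the oldest entry once the history exceeds `numItersForTrainExamplesHistory`, so the old data is not kept forever.

```python
import random
from collections import deque


def build_training_set(history, rng=None):
    """Flatten per-iteration example lists into one shuffled list."""
    rng = rng or random.Random(0)
    examples = [ex for iteration in history for ex in iteration]
    rng.shuffle(examples)
    return examples


# Hypothetical window size: keep only the 3 most recent iterations.
history = deque(maxlen=3)

for iteration in range(5):
    # Stand-in for self-play. Real examples would be (board, pi, v) tuples;
    # here (iteration, index) makes the origin of each example visible.
    new_examples = [(iteration, i) for i in range(4)]
    history.append(new_examples)  # the oldest iteration drops off automatically
    train_set = build_training_set(history)

# After 5 iterations only the last 3 contribute to training.
iterations_in_window = sorted({it for it, _ in train_set})
print(iterations_in_window)  # [2, 3, 4]
print(len(train_set))        # 12
```

With a window like this, each new model still sees some examples generated by slightly older models, but anything older than the window is discarded, which limits how stale the targets can get.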