Hello,
I see that trainExamplesHistory in Coach.py is never cleared, even when we accept a new model after the pit (line 126).
I don't understand why we keep the previous training data: the stored policy (pi) and result value (v) would not be the same if those positions were evaluated by the new model.
It looks like we keep training the new model on outdated data. Can someone explain the reason why?
Using data from earlier iterations could help smooth training progress and add diversity, since the earlier models may be only slightly suboptimal compared to the most recent one. Their examples are therefore still reasonable, if slightly noisy, training targets.
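To make the idea of reusing earlier iterations concrete, here is a minimal sketch of a sliding-window example history. It is not the code from Coach.py: the class name `ExampleHistory`, the parameter `max_iterations`, and the `(board, pi, v)` tuples are hypothetical, chosen only for illustration. The point is that old data is not kept forever. Only the most recent few iterations are retained, and they are mixed before training.

```python
import random
from collections import deque


class ExampleHistory:
    """Hypothetical helper: keeps self-play examples from the most recent
    `max_iterations` iterations and drops older ones."""

    def __init__(self, max_iterations):
        # A deque with maxlen discards the oldest entry automatically
        # once a new one is appended to a full deque.
        self.iterations = deque(maxlen=max_iterations)

    def add_iteration(self, examples):
        # Store one iteration's examples as a single unit, so that
        # eviction removes a whole iteration at a time.
        self.iterations.append(list(examples))

    def training_set(self, rng=random):
        # Flatten all retained iterations and shuffle, so that each
        # training batch mixes newer and older examples.
        flat = [ex for iteration in self.iterations for ex in iteration]
        rng.shuffle(flat)
        return flat


# Usage: with a window of 2, the third iteration evicts the first.
history = ExampleHistory(max_iterations=2)
history.add_iteration([("board_a", [0.5, 0.5], 1)])
history.add_iteration([("board_b", [0.9, 0.1], -1)])
history.add_iteration([("board_c", [0.2, 0.8], 1)])
batch = history.training_set()
```

With a small window the training set tracks the current model closely. A larger window gives more diverse but older targets. Which trade-off works better likely depends on the game and on how quickly the model improves.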