Multiple models? Remove model compare?

barakugav commented 1 year ago

When training a Hex5 model something interesting happened: I run two training runs, first with [model_compare][games_num]=30, in which the model learned the game slowly, increasing its policy accuracy from 0.11 to 0.26, switching models each few iterations until iteration 11. From iteration 11 up to iteration 50, the trained model wasn't able to beat the best model (>55%), and the same model continued to produce the training data. The trained model increased its accuraty up to 0.39 but that may be just overfit to the best model which didnt change. I abandoned the run as it seems to me it stopped learning. I run a second training run, and changed [model_compare][games_num]=0, meaning no model comparison is performed and the trained model is always considered to be the best model. The run was very good, and the model achieved policy accuracy of 0.63 after 75 iterations and it plays against me very good. Its interesting as itself that without comparison the learning was successful, but more interesting was that already after 3 iterations the accuracy was 0.48. The reason is, the new run was using the training data of the previous run to learn, and by using already decent quality training data it was able to learn very quickly.

This rise few questions: Does the model comparison is necessary or it blocks us from advancing in some cases? Maybe we should increase the number of games played during comparison to overcome this. Does training new models from scratch in the middle of the run has its benefits? Assuming the real effort is to produce good quality training data, and the training itself is not the heavy compute stage, maybe we should consider multiple models.

logs attached 1122_1459.txt 1122_1740.txt 221122_150030.csv 221122_174034.csv

poja commented 1 year ago

Wow!! It's amazing we solved 5, and in general these experiments are really cool.

What do you think will happen if we increase the interval between comparisons? Not every training. Bc sometimes you need to go down before you go up, or more formally sometines the loss isn't a perfect estimator of the network quality.
About training multiple models, this could be a really good idea, but it introduces a potentially heavy exploration vs exploitation parameter. So I would keep this idea in mind while we go up to larger boards.

barakugav commented 1 year ago

Not sure i understand your suggestion about increasing the interval between comparison. With the current implementation, the trained model is allowed to go down for multiple iterations and comparison: the best model will continue to generate data and the trained model will be fed more more and data until it outplay the best one with at least 55% winning rate. I agree the loss isn't the perfect estimator for the net quallity, but i can't think of any other generic way to measure it. We can do da tournament of models, which will give us a better metric, but that is too complicated and heavy compute demand.

About training multiple models, for generating training data using self play, we always expoit as much as possible, namely we are using only the best model to generate data. Than we have to decide how much models do we maintain simultaneously, when do we throw one to the bin, and when do we create a new one from scratch. Its more complexity, but i think its managable. And interesting!

barakugav commented 1 year ago

I trained Hex7 models, using a new feature where I always maintain multi models. New models are not created during the run, only in iteration 0. Self play games are generated by the best model only. At the end of each iteration the best model plays 32 games against each of the newly trained models, and a new best is declared.

Interestingly, although the policy accuracy was somewhat equal along all models, a significant value accuracy difference was achieve between model 5 and the reset of the models after about 20 iterations. From iteration 20 up to 57, model 5 was always best and no other model ever generated the self play data, see the 'trained model winning rate".

It seems as continuing training a bad model is not the great idea, i think we should try training new models from scratch during the run.

221205_165710.xlsx

poja commented 1 year ago

Wow, really cool!!

I am surprised that there is such a big difference (e.g. in value accuracy) between the best model and the 7 others... with the same training data and the same number of iterations. That means there's a lot of luck in training.
On the other hand, an interesting note - the model that created the training data has a slight advantage over the other models. But more in Policy than Value... And I don't know how big this advantage is.
Something that confuses me about the data you sent - model 5 doesn't always have the highest win rate (e.g. model 4 is better in row 50 of the Excel). So, in what sense was it best?
I agree that re-training from scratch seems like a good idea - the models that were low after ~10 iterations stayed low for the rest of the training.
- One way to do this is after every X iterations, we wipe the weights of the worst model. Do you think this is a good method?
- In this case, what do you think should be the Learning Rate after a wipe? We can just start counting from iteration 0, e.g. learning rate 0.01 then 0.001 then 1e-4 (by default), I think this is simple and should work. But maybe there is a better way to choose Learning Rate, given that the training data is not garbage like in the real first iterations.

barakugav commented 1 year ago

Im not sure why the other models are not performing ad well. Is it some luck of initial weights / first few iterations which determine some 'structure' of the net which sets a limit to its performance? Aka local minimum? If it is, we should increase LR and decrease batch size. Is it too low number of epochs/ LR? Meaning we just dont train enough and dont extract all the juice we can from the traing data.

'Best model' clarification: models = [createmodel() for _ in 0..modelnum] best = models[0] for in 0..iter_num: { selfplay(best) models = [trainmodel(m) for m in models] for m in models: { winning_rate = compare(best, m) if winning_rate >0.55: { best = m } } } I hope the code is clear, but as you can see winning_rate_5 is the winning rate of the TRAINED model 5 compared to best. The 'best' is some older version of the current trained model 5. Also, a model must win 55% of the games to become the best, and I stopped the run when the best wasnt chanaging anymore.

With this whole multi model debate, i think we should understand if the hard part is creating good training data, or is it training the models. Training from scrach is a good idea, but do we want to give it a boost of training to catch up with the other models?

poja commented 1 year ago

Ahh okay I didn't take the >0.55 into account. So now if I understand correctly model 4 became best at around iteration 39 (row number in the Excel) but then model 5 took it back in the next iteration and otherwise 5 was always best.

Good question regarding the boost... I assume some boost will be necessary otherwise it probably won't be good enough before the next competition (and it will be wiped again). An expensive but powerful boost could be - train (1) until loss (or some accuracy measure?) is better than at least one of the other models by some factor, OR (2) until you trained the same number of iterations as the rest of the models, whichever comes first.

poja / Cattus

Multiple models? Remove model compare? #120