Hi @danphan, glad you are enjoying the book. Unfortunately, there is a mistake in the listings in chapter 7: the sample output doesn't match the settings in the code listing. This page has a full explanation: https://kferg.dev/posts/2021/deep-learning-and-the-game-of-go-training-results-from-chapter-7/
Let me know if that helps
Thanks for your response, Kevin!
I'm glad to hear that I didn't massively screw things up somehow. After bumping up num_games and training for more epochs, I found that the loss and accuracy improved dramatically. For anyone else reading this, the most important factor for me was switching the optimizer from SGD to Adam. While Adagrad is better than plain SGD, I find both to be extremely slow to converge compared to Adam.
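For reference, here is a minimal sketch of the optimizer swap, assuming a Keras model built as in the chapter 7 listings (the loss and metrics are unchanged; only the optimizer argument differs):

```python
from tensorflow.keras.optimizers import SGD, Adam

# Original setup from the listings (plain SGD):
# model.compile(loss='categorical_crossentropy',
#               optimizer=SGD(), metrics=['accuracy'])

# Swapping in Adam -- per-parameter adaptive learning rates,
# which converged far faster for me than SGD or Adagrad:
model.compile(
    loss='categorical_crossentropy',
    optimizer=Adam(),
    metrics=['accuracy'],
)
```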
Hi @macfergus and @maxpumperla,
Thanks for writing such a lovely book! I have some (hopefully) quick questions about the discussion in Ch. 7, where you walk us through training a CNN to play Go. Specifically, on page 168 of Ch. 7, you present the results of training a CNN (using a one-plane encoder) on a sample of 100 games, which I reproduce below:
My first question is: why are there so many steps per epoch? If we use 100 games, and each game lasts around 100 to 200 moves, then there should be on the order of 10,000 moves/training instances in our set. With the batch size of 128 from the code, we'd expect roughly 10,000 / 128 ≈ 100 steps per epoch, right? When I try to reproduce these results, TensorFlow shows me 77 steps per epoch rather than the book's 12,288, which is much more in line with my expectations. The book's step count seems to be on the order of the number of moves in the training set (which I can only see happening if one doesn't explicitly set the steps_per_epoch argument in model.fit()).
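To make the arithmetic explicit, here is the back-of-the-envelope calculation (the numbers are the rough ones from my question above, not measured values):

```python
# Rough estimate of the expected steps per epoch.
num_games = 100
moves_per_game = 128      # rough average; real games run ~100-200 moves
batch_size = 128          # as in the chapter 7 listing

num_samples = num_games * moves_per_game       # ~12,800 training positions
steps_per_epoch = num_samples // batch_size    # = 100 steps per epoch

print(steps_per_epoch)    # 100 -- close to the 77 I see, nowhere near 12,288
```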
My second question is related to this epoch question. On page 167, it says
However, when training my network, I find that one epoch takes less than a minute. I assume this has to do with the discrepancy mentioned above (77 vs. 12,288 steps per epoch).
Lastly, I find that my accuracy is substantially lower than what is reported in the book. For reference, here is the output of my code, where I've trained (what should be) the same model:
The main difference is that I've modified the code to work with TensorFlow 2; however, all changes were minimal, and I don't believe they are the source of the discrepancy.
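To give a sense of how small those changes are, they were mostly import updates of this kind (an illustrative example, not the full diff):

```python
# Standalone Keras imports as used in the book's listings:
# from keras.models import Sequential
# from keras.layers import Dense

# TensorFlow 2 equivalents, using the Keras bundled with TensorFlow:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
```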
Thanks for the help! Dan