schwab closed this issue 3 years ago
Note, after the above error, every Loss value is NaN, so the training is effectively dead at that point.
This problem went away when I reduced the learning rate to something more reasonable (like 0.005).
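For anyone hitting the same thing, here is a minimal sketch of how to find the step where the loss first goes non-finite, so the run can be stopped there instead of continuing with NaN values. The function name and the loss history are made up for illustration and are not part of any MuZero implementation.

```python
import math


def first_non_finite_step(losses):
    """Return the index of the first NaN/inf loss, or None if all are finite."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None


# Hypothetical loss history: the loss blows up, then turns into NaN at step 3.
history = [2.31, 1.87, 5.4e3, float("nan"), float("nan")]
print(first_non_finite_step(history))  # prints 3
```

In a real training loop the same `math.isfinite` check could run on each step's loss value and break out of the loop as soon as it fails.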
Follow up: I've also learned that in my particular problem space, the game MuZero is learning runs in real time, and I only have 2 instances of the game engine to learn from at a time. Thus, in the beginning, the replay buffer only has a few games to analyze. The buffer fills over time, so setting the batch_size to something like 100 or more helps prevent the training loop from overtraining on the small number of games available in the early training epochs.
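A different way to handle the nearly empty buffer (not what I did above, just a sketch) is to hold off training until the buffer contains some minimum number of games. The function name and the `min_games` threshold below are hypothetical and would need tuning for the problem.

```python
def ready_to_train(num_games_in_buffer, min_games=20):
    """Hypothetical gate: True once the buffer holds at least min_games games."""
    return num_games_in_buffer >= min_games


# With only 2 game engine instances the buffer grows slowly,
# so early checks fail and training waits.
for games in (2, 10, 20):
    print(games, ready_to_train(games))
```

The training loop would call this before each step and sleep or keep collecting games while it returns False.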
Recently while training I started getting the following errors.
My current replay buffer settings are....