Closed. mpnunez closed this 2 months ago.
We don't need to record smoothed reward after each episode. Also, make sure all histograms (e.g. target network weight updates) are sampled sparsely.
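A minimal sketch of sparse histogram logging, assuming a TensorBoard-style writer that exposes `add_histogram(tag, values, global_step)`. The names `SparseHistogramLogger`, `every_n_steps`, and `RecordingWriter` are hypothetical and not taken from this repository; the stand-in writer only exists so the example runs without any dependencies.

```python
class SparseHistogramLogger:
    """Forwards histograms to a writer only every `every_n_steps` steps."""

    def __init__(self, writer, every_n_steps=1000):
        if every_n_steps <= 0:
            raise ValueError("every_n_steps must be positive")
        self.writer = writer
        self.every_n_steps = every_n_steps

    def maybe_log(self, tag, values, step):
        """Log the histogram if `step` falls on the sampling interval.

        Returns True when the histogram was written, False when skipped.
        """
        if step % self.every_n_steps != 0:
            return False
        self.writer.add_histogram(tag, values, step)
        return True


class RecordingWriter:
    """Stand-in writer (assumption) that records calls instead of writing files."""

    def __init__(self):
        self.calls = []

    def add_histogram(self, tag, values, global_step):
        self.calls.append((tag, global_step))


writer = RecordingWriter()
logger = SparseHistogramLogger(writer, every_n_steps=1000)
for step in range(2501):
    # Placeholder values; in training these would be the weight updates.
    logger.maybe_log("target_net/weight_updates", [0.0, 0.1], step)
```

In a real training loop the writer would be whatever logging object the project already uses, and the same interval check could gate the smoothed reward scalar so it is not written after every episode.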
The win/loss/tie logging is also wrong: non-terminal states with TD-discounted rewards, where the final state is None, are diluting the reward buffer and are not being counted as 0, 1, or -1.
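One possible way to keep the outcome statistics clean, sketched under assumptions: record each episode's result once, from its undiscounted final reward, rather than reading it back from the experience buffer. `OutcomeTracker` and `classify_outcome` are hypothetical names, and the convention that a positive final reward means a win, a negative one a loss, and zero a tie is assumed from the 0, 1, -1 values mentioned above. This is not necessarily the fix applied in the linked pull request.

```python
from collections import Counter


def classify_outcome(final_reward):
    """Map an undiscounted end-of-episode reward to a result label."""
    if final_reward > 0:
        return "win"
    if final_reward < 0:
        return "loss"
    return "tie"


class OutcomeTracker:
    """Counts one result per finished episode (hypothetical helper)."""

    def __init__(self):
        self.counts = Counter()

    def record_episode(self, final_reward):
        # Called once when an episode ends, with the raw game result,
        # so discounted intermediate rewards never reach the counters.
        self.counts[classify_outcome(final_reward)] += 1

    def rates(self):
        """Return the fraction of episodes per result; empty if none recorded."""
        total = sum(self.counts.values())
        if total == 0:
            return {}
        return {label: self.counts[label] / total
                for label in ("win", "loss", "tie")}


tracker = OutcomeTracker()
for result in (1, -1, 0, 1):
    tracker.record_episode(result)
```

Keeping this counter separate from the replay or reward buffer means n-step discounting can change how transitions are stored without affecting the logged win, loss, and tie rates.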
https://github.com/mpnunez/Connect4-AI/pull/13