mpnunez / Connect4-AI

Training an AI Player to play Connect4

Reduce tensorboard sampling #11

Closed mpnunez closed 2 months ago

mpnunez commented 2 months ago

We don't need to record the smoothed reward after every episode. Also, make sure all histograms (e.g. target network weight updates) are sampled sparsely.
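
One possible way to do this is a thin wrapper that forwards logging calls only every N steps. This is a minimal sketch, not the code in this repo: `SparseWriter`, `scalar_every`, and `histogram_every` are hypothetical names, and it assumes the underlying writer exposes `add_scalar(tag, value, step)` and `add_histogram(tag, values, step)`, as `torch.utils.tensorboard.SummaryWriter` does.

```python
class SparseWriter:
    """Forward logging calls to `writer` only on every N-th step.

    Hypothetical helper: `writer` is assumed to expose
    add_scalar(tag, value, step) and add_histogram(tag, values, step).
    """

    def __init__(self, writer, scalar_every=100, histogram_every=1000):
        self.writer = writer
        self.scalar_every = scalar_every
        self.histogram_every = histogram_every

    def add_scalar(self, tag, value, step):
        # Scalars are cheap, but logging one per episode still bloats the event file.
        if step % self.scalar_every == 0:
            self.writer.add_scalar(tag, value, step)

    def add_histogram(self, tag, values, step):
        # Histograms are much larger than scalars, so sample them more sparsely.
        if step % self.histogram_every == 0:
            self.writer.add_histogram(tag, values, step)
```

The training loop could then keep calling `add_scalar` and `add_histogram` on every episode or update, and the wrapper decides what actually gets written.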

mpnunez commented 2 months ago

The win/loss/tie logging is also wrong: non-terminal states that carry TD-discounted rewards and have a final state of None are diluting the reward buffer, and their rewards are not counted as 0, 1, or -1.
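
A rough sketch of one way to keep the outcome statistics separate from the replay data: record only the raw, undiscounted result once per finished episode. `OutcomeTracker` and its methods are hypothetical names, and the mapping of 1 to win, 0 to tie, and -1 to loss is an assumption about the reward convention.

```python
from collections import Counter, deque


class OutcomeTracker:
    """Track win/tie/loss rates over the last `window` finished episodes.

    Hypothetical helper. Assumes the raw final reward of an episode is
    1 (win), 0 (tie), or -1 (loss); discounted values are rejected.
    """

    def __init__(self, window=100):
        self.outcomes = deque(maxlen=window)

    def record_episode(self, final_reward):
        # Call once per finished episode with the undiscounted result,
        # not with the n-step return stored in the replay buffer.
        if final_reward not in (-1, 0, 1):
            raise ValueError(f"expected -1, 0, or 1, got {final_reward!r}")
        self.outcomes.append(int(final_reward))

    def fractions(self):
        counts = Counter(self.outcomes)
        n = len(self.outcomes) or 1  # avoid division by zero before any episode
        return {
            "win": counts[1] / n,
            "tie": counts[0] / n,
            "loss": counts[-1] / n,
        }
```

Rejecting anything other than -1, 0, or 1 means a discounted reward reaching the tracker fails loudly instead of silently skewing the rates.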

mpnunez commented 2 months ago

https://github.com/mpnunez/Connect4-AI/pull/13