mpnunez / Connect4-AI

Training an AI Player to play Connect4

Implement temporal difference learning with adjustable steps #4

Closed: mpnunez closed this 2 months ago

mpnunez commented 3 months ago

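$value_i = \sum_{k=0}^{n-1} \gamma^k r_{i+k} + \gamma^n \, value_{i+n}$, i.e. the sum of discounted rewards for the next $n$ steps plus the discounted value of the state $n$ steps from now. Here $r_{i+k}$ is the reward for the step taken from state $i+k$ and $\gamma$ is the discount factor. $n$ can be any value between 1 and the number of steps until the end of the episode. With $n = 1$ this is the one-step TD target; when $n$ reaches the end of the episode, the bootstrapped term is the value of a terminal state (zero), so the target is the full discounted return.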
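For illustration, the target above could be computed roughly as follows. This is a minimal sketch and not the code from the repository: the function name `n_step_targets`, the argument layout, and the convention that `rewards[i]` is the reward for the step taken from state `i` are all assumptions made for the example. It also assumes the terminal state has value zero, so targets near the end of the episode simply use fewer steps.

```python
from typing import List, Sequence


def n_step_targets(
    rewards: Sequence[float],
    values: Sequence[float],
    n: int,
    gamma: float = 0.99,
) -> List[float]:
    """Compute n-step TD targets for every state of one finished episode.

    Assumptions (hypothetical, for illustration only):
      - rewards[i] is the reward for the step taken from state i
      - values[i] is the current value estimate of state i
      - the state after the last step is terminal and has value 0
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    if n < 1:
        raise ValueError("n must be at least 1")

    episode_len = len(rewards)
    targets = []
    for i in range(episode_len):
        # Use at most n steps, truncated at the end of the episode.
        horizon = min(n, episode_len - i)

        # Discounted sum of the next `horizon` rewards.
        target = sum(gamma ** k * rewards[i + k] for k in range(horizon))

        # Bootstrap from the value estimate n steps ahead, unless that
        # state is terminal (value 0), in which case nothing is added.
        if i + horizon < episode_len:
            target += gamma ** horizon * values[i + horizon]

        targets.append(target)
    return targets
```

Setting `n=1` gives one-step TD targets, while any `n` at least as large as the episode length gives Monte Carlo returns; values in between trade bias from bootstrapping against variance from long reward sums.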

mpnunez commented 2 months ago

https://github.com/mpnunez/Connect4-AI/commit/f848e5166e814553b53787a4a3bc035f2ffef315