thekingofkings opened this issue 6 years ago
Deep Q-learning model (DQN) denoted as f.
I think the reward should be in the range [-1, 1] instead of [0, 1], so maybe tanh should be used instead of sigmoid?
https://github.com/kkspeed/chess/blob/master/model_v2.py#L34
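For illustration, here is a small sketch of why the output activation matters. It does not reproduce the code in `model_v2.py`. It only compares the two activations in plain Python, and `tanh_head` is a hypothetical name. A sigmoid output can never produce a negative value, so a loss reward of -1 would be unreachable. tanh covers (-1, 1) and is just a rescaled, shifted sigmoid.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_head(x: float) -> float:
    """Hypothetical value-head activation using tanh: output lies in (-1, 1)."""
    return math.tanh(x)


# tanh is a rescaled, shifted sigmoid: tanh(x) = 2 * sigmoid(2x) - 1.
# So switching activations changes the output range, not the basic shape.
for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    s = sigmoid(x)
    t = tanh_head(x)
    assert abs(t - (2.0 * sigmoid(2.0 * x) - 1.0)) < 1e-12
    print(f"x={x:+.1f}  sigmoid={s:.4f}  tanh={t:+.4f}")
```

Because of that identity, another option is to keep the sigmoid and rescale its output as `2 * sigmoid(z) - 1` (with `z` doubled). Using tanh directly is simpler when the targets are already in [-1, 1].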