kkspeed / chess

Chinese chess with bot
MIT License

New model structure #1

Open thekingofkings opened 5 years ago

thekingofkings commented 5 years ago

Deep Q-learning model (DQN), denoted as f.

  1. Learn f(state, action) = potential future reward.
  2. Given state s_i and the set of possible actions A = {a_k}, use f to evaluate every action in A and take the one with the highest value.
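
The two steps above can be sketched as greedy action selection over a learned value function. This is only an illustration, not the repository's code: `select_best_action` and `toy_f` are hypothetical names, and `toy_f` is a hand-written stand-in for a trained network f.

```python
from typing import Callable, Iterable, TypeVar

State = TypeVar("State")
Action = TypeVar("Action")


def select_best_action(
    f: Callable[[State, Action], float],
    state: State,
    actions: Iterable[Action],
) -> Action:
    """Return the action a in `actions` that maximizes f(state, a)."""
    candidates = list(actions)
    if not candidates:
        raise ValueError("no actions to choose from")
    # Step 2: score every candidate with f and keep the highest-scoring one.
    return max(candidates, key=lambda a: f(state, a))


def toy_f(state: int, action: int) -> float:
    # Hypothetical stand-in for the learned model (assumption, not from the repo):
    # actions closer to the state value get a higher score.
    return -abs(state - action)


best = select_best_action(toy_f, 3, [0, 2, 5])
print(best)
```

In a real setup `f` would be the trained network and `actions` the legal moves in position s_i; during training, some exploration (for example epsilon-greedy) is usually mixed in rather than always taking the best-scoring move.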

kkspeed commented 5 years ago

I think the reward should be in the range [-1, 1] instead of [0, 1], so maybe tanh should be used instead of sigmoid?

https://github.com/kkspeed/chess/blob/master/model_v2.py#L34
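
To make the range difference concrete, here is a small standalone sketch comparing the two activations. It uses only the standard library and does not assume anything about the linked line in `model_v2.py`; the `sigmoid` helper is defined here for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid; output lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


# tanh output lies strictly between -1 and 1, so it can represent
# negative rewards (e.g. a loss), which sigmoid cannot.
for x in (-5.0, 0.0, 5.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.4f}  tanh={math.tanh(x):+.4f}")

# The two are related by tanh(x) = 2 * sigmoid(2x) - 1, so switching
# activations rescales and shifts the output rather than changing its shape.
x = 1.0
identity_gap = abs(math.tanh(x) - (2.0 * sigmoid(2.0 * x) - 1.0))
```

If the training targets are in [-1, 1] (say, -1 for a loss and +1 for a win), a tanh output can reach them, whereas a sigmoid output would need the targets rescaled to [0, 1].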