suragnair / alpha-zero-general

A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more
MIT License

Pi output in tensorflow is incorrect: missing softmax layer. #214

Open josey4869 opened 4 years ago

josey4869 commented 4 years ago

In othello.tensorflow.OthelloNNet.py:

The pi output of the model is

self.pi = Dense(s_fc2, self.action_size) (line 36)

which is incorrect. In fact, it should be

self.pi = tf.nn.softmax(Dense(s_fc2, self.action_size)).

By contrast, the implementation in Keras is correct. In othello.keras.OthelloNNet.py:

self.pi = Dense(self.action_size, activation='softmax', name='pi')(s_fc2) (line 28).

Pull request: https://github.com/suragnair/alpha-zero-general/pull/215#issue-481657069
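To make the distinction concrete, here is a minimal NumPy sketch (illustrative only; the repo code uses TensorFlow) of the transformation that the Keras `activation='softmax'` argument adds on top of the raw Dense output:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax: shift by the max so exp() cannot overflow."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

# A Dense layer's raw outputs (logits) are unbounded scores.
logits = np.array([2.0, 1.0, 0.1])

# Softmax turns them into a probability distribution over actions.
probs = softmax(logits)
print(probs)
print(probs.sum())  # sums to 1.0
```

Without the softmax, `self.pi` holds the unbounded logits; with it, the output is a valid move-probability distribution, which is what the issue title is pointing at.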

goshawk22 commented 4 years ago

The softmax is applied in the line after, and the layer is assigned to the variable self.prob, which is then used instead of self.pi in the NNet file.

josey4869 commented 4 years ago

> The softmax is applied in the line after, and the layer is assigned to the variable self.prob, which is then used instead of self.pi in the NNet file.

Thanks for the reply! According to lines 48 and 131:

self.loss_pi = tf.losses.softmax_cross_entropy(self.target_pis, self.pi)

It seems self.pi rather than self.prob is used for computing pi_loss? So I guess we should either apply a softmax layer before outputting self.pi, or feed self.prob instead of self.pi into pi_loss?
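The two options above are not equivalent. Assuming tf.losses.softmax_cross_entropy behaves as documented (it applies softmax to its logits argument internally before taking the cross-entropy), feeding it an already-softmaxed tensor would softmax twice and distort the loss. A NumPy sketch of that internal computation makes the difference visible:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def softmax_cross_entropy(target, logits):
    """Sketch of what a softmax cross-entropy loss does: it softmaxes
    `logits` internally, then takes cross-entropy against `target`."""
    return -np.sum(target * np.log(softmax(logits)))

logits = np.array([3.0, 1.0, 0.0])   # raw network output
target = np.array([1.0, 0.0, 0.0])   # target policy (one-hot here)

loss_on_logits = softmax_cross_entropy(target, logits)
loss_on_probs = softmax_cross_entropy(target, softmax(logits))  # double softmax

print(loss_on_logits, loss_on_probs)  # the two losses disagree
```

So passing self.prob (a probability distribution) into a loss that expects logits would not just be a naming change; it would compute a different, flatter loss.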

goshawk22 commented 4 years ago

Yes, I think it might make more sense to just add the softmax layer to self.pi, to avoid confusion.

rlronan commented 3 years ago

Softmax cross-entropy expects logits in TensorFlow, so softmax should not be applied to pi directly. https://www.tensorflow.org/api_docs/python/tf/compat/v1/losses/softmax_cross_entropy
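This is why the repo's split into two variables is arguably the right design: keep the logits for the loss, and expose the softmaxed distribution separately for inference. A minimal NumPy sketch of that pattern (the Dense layer and shapes here are hypothetical stand-ins, not the repo's actual graph):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # hypothetical weights of the final Dense layer
b = np.zeros(3)
s_fc2 = rng.normal(size=4)    # hypothetical penultimate-layer features

# self.pi: raw logits; this is what the softmax cross-entropy loss consumes.
pi = s_fc2 @ W + b

# self.prob: softmax(pi); this is what MCTS / prediction consumes.
e = np.exp(pi - pi.max())
prob = e / e.sum()
```

Under this reading, the code in the issue is correct as written, and the only real problem is the confusing variable naming that prompted the report.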