junxiaosong / AlphaZero_Gomoku

An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)
MIT License
3.27k stars, 964 forks

Purpose of subtracting the maximum prob value in softmax #17

Closed. xueyingliu closed this issue 6 years ago

xueyingliu commented 6 years ago

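Hello, the probs output by the Monte Carlo tree search part are passed through a softmax, and the max value is subtracted from every element before exponentiating: probs = np.exp(x - np.max(x)). Is the max value subtracted to prevent overflow?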

junxiaosong commented 6 years ago

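Because this is followed by probs /= np.sum(probs), subtracting the max or not is mathematically equivalent: the common factor exp(-max) cancels in the normalization. The max value is subtracted here for numerical stability.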
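The equivalence can be checked numerically. The following is a minimal sketch, assuming a 1-D NumPy array as input; the names softmax_stable and softmax_naive are hypothetical and chosen only for this illustration. They wrap the two lines quoted in this thread, with and without the max subtraction.

```python
import numpy as np

def softmax_stable(x):
    # Hypothetical helper: shift by the max so the largest exponent is exp(0) = 1.
    x = np.asarray(x, dtype=np.float64)
    probs = np.exp(x - np.max(x))
    probs /= np.sum(probs)
    return probs

def softmax_naive(x):
    # Hypothetical helper: same computation without the shift.
    x = np.asarray(x, dtype=np.float64)
    probs = np.exp(x)
    probs /= np.sum(probs)
    return probs

# For moderate inputs both versions agree up to floating-point rounding,
# because exp(-max) appears in both numerator and denominator.
x = np.array([1.0, 2.0, 3.0])
same_for_small_inputs = np.allclose(softmax_stable(x), softmax_naive(x))
```

For inputs of this size the two functions should return the same distribution; they only diverge once np.exp(x) leaves the representable float range.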

xueyingliu commented 6 years ago

@junxiaosong OK, thanks. When I trained without subtracting the max value, I did indeed run into overflow.
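The kind of overflow described here can be reproduced in isolation. This is a sketch under assumptions: the input values below are made up for illustration and are not taken from an actual training run. In float64, np.exp overflows to inf for arguments above roughly 709, so the unshifted version ends up computing inf / inf.

```python
import numpy as np

# Illustrative inputs (an assumption, not values from the repository).
big = np.array([1000.0, 1001.0, 1002.0])

# Without the shift: exp overflows to inf, and inf / inf yields nan.
# errstate only silences the RuntimeWarnings; it does not change the result.
with np.errstate(over="ignore", invalid="ignore"):
    unshifted = np.exp(big)
    naive = unshifted / np.sum(unshifted)

# With the shift: the largest exponent is exp(0) = 1, so nothing overflows.
shifted = np.exp(big - np.max(big))
stable = shifted / np.sum(shifted)
```

Here naive contains only nan, while stable is a valid probability distribution over the same three inputs.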