junxiaosong / AlphaZero_Gomoku

An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)
MIT License
3.25k stars · 965 forks

Training for 1000 games with TensorFlow still fails to converge #48

Open yuan9778 opened 6 years ago

yuan9778 commented 6 years ago

I used the default configuration: 6x6 board and 4 in a row, running on macOS.

batch i:1100, episode_len:21 kl:0.00058,lr_multiplier:11.391,loss:4.518421649932861,entropy:3.5188446044921875,explained_var_old:0.000,explained_var_new:0.000
current self-play batch: 1100
num_playouts:1000, win: 2, lose: 8, tie:0

Any advice would be appreciated.

yuan9778 commented 6 years ago

I ran it again and this time it converged quickly. The first training run went off track right from the start: loss and entropy were actually increasing. Below are the first few batches of data from that first run:

batch i:4, episode_len:15 kl:0.00467,lr_multiplier:1.500,loss:3.8342599868774414,entropy:3.5811827182769775,explained_var_old:0.001,explained_var_new:0.936
batch i:5, episode_len:15 kl:0.00498,lr_multiplier:2.250,loss:3.617708444595337,entropy:3.574676036834717,explained_var_old:0.934,explained_var_new:0.998
batch i:6, episode_len:18 kl:0.00955,lr_multiplier:3.375,loss:4.275326251983643,entropy:3.553321361541748,explained_var_old:0.324,explained_var_new:0.310
batch i:7, episode_len:14 kl:0.00829,lr_multiplier:5.062,loss:4.7379608154296875,entropy:3.5652589797973633,explained_var_old:-0.160,explained_var_new:-0.160
batch i:8, episode_len:16 kl:0.01200,lr_multiplier:5.062,loss:5.149411201477051,entropy:3.5494179725646973,explained_var_old:-0.554,explained_var_new:-0.554
batch i:9, episode_len:13 kl:0.01346,lr_multiplier:5.062,loss:4.923798561096191,entropy:3.576117753982544,explained_var_old:-0.338,explained_var_new:-0.338
batch i:10, episode_len:25 kl:0.00691,lr_multiplier:7.594,loss:4.791123867034912,entropy:3.573212146759033,explained_var_old:-0.217,explained_var_new:-0.217
batch i:11, episode_len:15 kl:0.00171,lr_multiplier:11.391,loss:4.606541633605957,entropy:3.5778207778930664,explained_var_old:-0.025,explained_var_new:-0.025
batch i:12, episode_len:12 kl:0.00178,lr_multiplier:11.391,loss:4.894505023956299,entropy:3.5709848403930664,explained_var_old:-0.310,explained_var_new:-0.310
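For context on the `lr_multiplier` values climbing in these logs: the training loop adapts the effective learning rate based on the KL divergence between the policy before and after each update. A minimal sketch of that feedback rule, following the logic in this repo's `train.py` (the `kl_targ` default of 0.02 is an assumption here):

```python
# Adaptive learning-rate multiplier, sketched from the policy-update loop.
# kl_targ (target KL divergence per update) is assumed to be 0.02.
KL_TARG = 0.02

def adjust_lr_multiplier(kl, lr_multiplier):
    """Shrink the multiplier when an update overshoots the KL target,
    grow it when the update is too conservative; both directions are capped."""
    if kl > KL_TARG * 2 and lr_multiplier > 0.1:
        lr_multiplier /= 1.5   # update too aggressive: back off
    elif kl < KL_TARG / 2 and lr_multiplier < 10:
        lr_multiplier *= 1.5   # update too timid: speed up
    return lr_multiplier
```

With the tiny KL values in the log above (e.g. 0.00171), the multiplier keeps growing until it passes the cap, which is why it plateaus at 11.391 (about 1.5^6).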

junxiaosong commented 6 years ago

I've also seen runs go off track like this. The telltale sign is explained_var_old:0.000, explained_var_new:0.000, both stuck at zero; usually simply restarting the run fixes it. Also, reducing learn_rate seems to help prevent this.
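For readers wondering what `explained_var` measures: it is the fraction of the variance in game outcomes that the value head accounts for, computed as 1 - Var(target - prediction) / Var(target). A minimal sketch (consistent with how `train.py` computes it, though the exact code there may differ):

```python
import numpy as np

def explained_variance(values, winners):
    """1 - Var(winner - value) / Var(winner).
    Close to 1: the value head predicts outcomes well.
    0: it explains nothing -- e.g. any constant prediction."""
    values = np.asarray(values, dtype=float)
    winners = np.asarray(winners, dtype=float)
    return 1.0 - np.var(winners - values) / np.var(winners)

winners = np.array([1, -1, 1, -1, 1])  # outcomes from the current player's view
collapsed = np.zeros(5)                # a collapsed value head: same output everywhere
print(explained_variance(collapsed, winners))  # -> 0.0
```

This is why a stuck run shows exactly 0.000: if the value output is the same for every state, the residual variance equals the target variance, so the ratio is 1 and the metric is 0 regardless of what the constant is.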

initial-h commented 5 years ago

I saved the model, then loaded it back in and continued training, and explained_var_old:0.000, explained_var_new:0.000 keeps appearing. Why is that? I checked: the data collected in memory looks fine, and the network's prob output looks fine; only the value output is identical everywhere, all 0. Why would that be?

batch i:1, episode_len:16
batch i:2, episode_len:19 kl:0.00000,lr_multiplier:1.500,loss:4.583380222320557,entropy:3.5835189819335938,explained_var_old:-0.000,explained_var_new:0.000
batch i:3, episode_len:19 kl:0.00000,lr_multiplier:2.250,loss:4.583160877227783,entropy:3.5835165977478027,explained_var_old:0.000,explained_var_new:0.000
batch i:4, episode_len:18 kl:0.00000,lr_multiplier:3.375,loss:4.582594871520996,entropy:3.5835118293762207,explained_var_old:0.000,explained_var_new:0.000
batch i:5, episode_len:9 kl:0.00001,lr_multiplier:5.062,loss:4.581911563873291,entropy:3.5834879875183105,explained_var_old:0.000,explained_var_new:0.000
batch i:6, episode_len:16 kl:0.00009,lr_multiplier:7.594,loss:4.583174228668213,entropy:3.5833580493927,explained_var_old:-0.000,explained_var_new:-0.000
batch i:7, episode_len:15 kl:0.00214,lr_multiplier:11.391,loss:4.579413890838623,entropy:3.58164644241333,explained_var_old:0.000,explained_var_new:-0.000
batch i:8, episode_len:13 kl:0.00513,lr_multiplier:11.391,loss:4.573235988616943,entropy:3.5728251934051514,explained_var_old:-0.000,explained_var_new:0.000
batch i:9, episode_len:17 kl:0.00903,lr_multiplier:11.391,loss:4.583174705505371,entropy:3.582362174987793,explained_var_old:0.000,explained_var_new:0.000
batch i:10, episode_len:12 kl:0.00008,lr_multiplier:11.391,loss:4.582366943359375,entropy:3.583045482635498,explained_var_old:0.000,explained_var_new:-0.000
batch i:11, episode_len:8 kl:0.00798,lr_multiplier:11.391,loss:4.571702480316162,entropy:3.5798118114471436,explained_var_old:-0.000,explained_var_new:0.000
batch i:12, episode_len:12 kl:0.00246,lr_multiplier:11.391,loss:4.577787399291992,entropy:3.578699827194214,explained_var_old:0.000,explained_var_new:0.000
batch i:13, episode_len:18 kl:0.00150,lr_multiplier:11.391,loss:4.578939914703369,entropy:3.582991361618042,explained_var_old:0.000,explained_var_new:0.000