yichen914 / MyAlphaGoZeroOnConnect4

My Simple Implementation of AlphaGo Zero on Connect4

Question about the network used #3

Open lijas opened 5 years ago

lijas commented 5 years ago

Hello

Can you tell me about the model you're using? How many conv layers, residual layers, etc.?

yichen914 commented 5 years ago

If I remember correctly, the network has 1 input layer followed by 5 mid layers. Each of these mid layers uses both convolution and a residual (skip) connection. The network then splits into 2 branches: a value head and a policy head.
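To make the layout described above concrete, here is a rough sketch that only lists the layers in order with their output shapes. It is not taken from the repository: the 6x7 board, the filter count `FILTERS`, the one-logit-per-column policy output, and the names `Layer` and `describe_network` are all assumptions for illustration. Only the count of 5 mid layers and the two heads come from the reply.

```python
from dataclasses import dataclass
from typing import List, Tuple

ROWS, COLS = 6, 7      # assumption: standard Connect4 board size
NUM_RES_BLOCKS = 5     # the "5 mid layers" mentioned in the reply
FILTERS = 64           # assumption: the filter count is not stated in the thread


@dataclass
class Layer:
    name: str
    kind: str                    # "conv", "residual", "policy_head" or "value_head"
    out_shape: Tuple[int, ...]   # shape of this layer's output


def describe_network() -> List[Layer]:
    """List the layers in order: input conv, residual tower, then two heads."""
    layers = [Layer("input", "conv", (FILTERS, ROWS, COLS))]
    for i in range(NUM_RES_BLOCKS):
        # A residual block adds its input back to its convolution output,
        # so its output shape equals its input shape.
        layers.append(Layer(f"res{i + 1}", "residual", (FILTERS, ROWS, COLS)))
    # Both heads read the same tower output.
    layers.append(Layer("policy", "policy_head", (COLS,)))  # assumed: one logit per column
    layers.append(Layer("value", "value_head", (1,)))       # assumed: one scalar evaluation
    return layers


if __name__ == "__main__":
    for layer in describe_network():
        print(layer.name, layer.kind, layer.out_shape)
```

The actual kernel sizes, normalization, and head layers would have to be checked against the repository code.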

lijas commented 5 years ago

OK, thank you. And a question about the temperature parameter: for how many moves do you use tau=1 before changing to tau=0? Is the temperature only applied during self-play, or also when playing real games?

yichen914 commented 5 years ago

Re: how many moves before changing tau from 1 to 0: I think that depends on your own experience. In my code, I set it to 10 steps. The temperature balances exploration and exploitation. During training (self-play), the network needs to explore enough before it commits to a particular path. But when we compare or test networks, we assume that the path the network takes is the "best" (so far), so we don't need to set the temperature to 1.
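The temperature schedule described above can be sketched as follows. This is a minimal illustration rather than the repository's code: it assumes the move is sampled from MCTS visit counts with probability proportional to `N(a) ** (1 / tau)`, as in the AlphaGo Zero paper, and that the 10 steps are counted as 0-based moves from the start of the game. The names `temperature_for_move`, `select_move`, and `TAU_SWITCH_MOVE` are hypothetical.

```python
import random
from typing import List, Optional

TAU_SWITCH_MOVE = 10  # from the reply: tau = 1 for the first 10 steps


def temperature_for_move(move_number: int, self_play: bool) -> float:
    """Return tau for a 0-based move number (assumed counting convention)."""
    if self_play and move_number < TAU_SWITCH_MOVE:
        return 1.0
    # Evaluation games, and self-play after the switch, play greedily.
    return 0.0


def select_move(visit_counts: List[int], tau: float,
                rng: Optional[random.Random] = None) -> int:
    """Pick a move index from MCTS visit counts under temperature tau."""
    if tau == 0.0:
        # Greedy: the most visited move (the first one wins a tie).
        return max(range(len(visit_counts)), key=lambda a: visit_counts[a])
    rng = rng if rng is not None else random.Random()
    # Sample with probability proportional to N(a) ** (1 / tau).
    # Moves with zero visits (e.g. full columns) get weight 0.
    # Assumes at least one visit count is positive.
    weights = [n ** (1.0 / tau) for n in visit_counts]
    return rng.choices(range(len(visit_counts)), weights=weights, k=1)[0]


if __name__ == "__main__":
    counts = [1, 50, 3, 0, 0, 2, 4]  # made-up visit counts for 7 columns
    print(select_move(counts, temperature_for_move(3, self_play=True)))
    print(select_move(counts, temperature_for_move(3, self_play=False)))
```

With tau=1 the less-visited columns are still chosen occasionally, which diversifies the opening positions seen during self-play. With tau=0 the most visited column is always played.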