During training, once data_buffer is full, is the oldest data automatically evicted to make room for new data?

junxiaosong / AlphaZero_Gomoku

An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)

MIT License


During training, once data_buffer is full, is the oldest data automatically evicted to make room for new data? #35

Closed: huyp182 closed this issue 6 years ago

huyp182 commented 6 years ago

Also, I don't quite understand why the policy head uses 4 convolution filters here:
self.action_conv = tf.layers.conv2d(inputs=self.conv12, filters=4, kernel_size=[1, 1], padding="same", activation=tf.nn.relu)
and why the value head uses 2 convolution filters here:
self.evaluation_conv = tf.layers.conv2d(inputs=self.conv12, filters=2, kernel_size=[1, 1], padding="same", activation=tf.nn.relu)

junxiaosong commented 6 years ago

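data_buffer is a deque created with maxlen set, so once it is full, each new item that comes in pushes out the oldest one.
The number of convolution filters here is simply the model architecture I chose. There is no definitive rule behind it, and you are free to change the structure of the network.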
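
For readers who want to see the eviction behavior directly, here is a minimal sketch using Python's `collections.deque`. The buffer size of 5 and the integer "samples" are assumptions made up for illustration; a real buffer would be far larger and hold self-play training samples.

```python
import random
from collections import deque

# Hypothetical tiny capacity so the eviction is easy to see.
buffer_size = 5
data_buffer = deque(maxlen=buffer_size)

# Pretend the first self-play game produced samples 0..3.
data_buffer.extend(range(4))

# A second game produces samples 4..7. The deque is bounded, so the
# oldest entries (0, 1, 2) are dropped from the left automatically.
data_buffer.extend(range(4, 8))

snapshot = list(data_buffer)
print(snapshot)  # [3, 4, 5, 6, 7]

# Drawing a mini-batch only ever sees the most recent buffer_size samples.
mini_batch = random.sample(snapshot, 3)
```

No manual cleanup is needed: appending or extending on the right discards from the left as soon as the length would exceed `maxlen`.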
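
To illustrate why the filter count is a free design choice, here is a small pure-Python sketch of what a 1x1 convolution with ReLU does: at every board cell it mixes the input channels into `filters` output channels, so the count only sets how wide the flattened vector fed to the next layer is. The helper names `conv1x1` and `flattened_size` and the 8x8 board are hypothetical, chosen for illustration, and this is not the repository's TensorFlow code.

```python
def conv1x1(feature_map, weights):
    """Apply a 1x1 convolution followed by ReLU.

    feature_map: nested list shaped [height][width][in_channels]
    weights:     nested list shaped [in_channels][filters]
    returns:     nested list shaped [height][width][filters]
    """
    filters = len(weights[0])
    out = []
    for row in feature_map:
        out_row = []
        for cell in row:
            # Each output channel is a weighted sum over the input
            # channels of this single cell, clipped at zero (ReLU).
            out_row.append([
                max(0.0, sum(cell[c] * weights[c][k] for c in range(len(cell))))
                for k in range(filters)
            ])
        out.append(out_row)
    return out


def flattened_size(height, width, filters):
    """Length of the vector obtained by flattening the head's output."""
    return height * width * filters


# Assumed 8x8 board: 4 filters give 256 features, 2 filters give 128.
policy_features = flattened_size(8, 8, 4)
value_features = flattened_size(8, 8, 2)
```

Changing `filters` changes only these sizes (and the parameter count of the following dense layer), which is consistent with the point that there is no fixed rule for the choice.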

huyp182 commented 6 years ago

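Oh, thanks! I'd like to parallelize data generation, and also run data generation and training in parallel. If I get it working I can post it here. Do you have any suggestions?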

junxiaosong commented 6 years ago

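In issue #13, someone mentioned "using one process for both self-play and training, and another 4 processes for self-play only", which may be a useful reference. Also, parallelizing MCTS itself may require the virtual loss trick, which is described in the paper.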
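
As a rough illustration of the virtual loss idea, here is a single-threaded sketch. It assumes a simplified PUCT-style selection rule; the `Node` class, the `VIRTUAL_LOSS` constant, and the `c_puct` value are hypothetical, and the per-player sign flip of values is omitted for brevity. It is not the repository's MCTS code. While a simulation is in flight, the nodes on its path are temporarily treated as if they had lost several extra games, which steers other concurrent simulations toward different branches. The penalty is undone when the real value is backed up.

```python
import math

VIRTUAL_LOSS = 3  # hypothetical penalty per in-flight simulation


class Node:
    def __init__(self, prior):
        self.prior = prior
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}

    def q(self):
        # Mean value; unvisited nodes default to 0.
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node, c_puct=5.0):
    """Return the (action, child) pair with the highest PUCT-style score."""
    total = sum(child.visits for child in node.children.values())

    def score(child):
        u = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return child.q() + u

    return max(node.children.items(), key=lambda item: score(item[1]))


def apply_virtual_loss(path):
    """Mark the path as in flight: extra visits that all count as losses."""
    for node in path:
        node.visits += VIRTUAL_LOSS
        node.value_sum -= VIRTUAL_LOSS


def revert_and_backup(path, value):
    """Remove the virtual loss and record the real evaluation instead."""
    for node in path:
        node.visits += 1 - VIRTUAL_LOSS
        node.value_sum += VIRTUAL_LOSS + value


root = Node(prior=1.0)
root.children = {"a": Node(prior=0.5), "b": Node(prior=0.5)}

# The first simulation picks "a" (ties go to the first child).
first_action, first_child = select_child(root)
apply_virtual_loss([first_child])

# A second simulation started before the first finishes is pushed to "b".
second_action, _ = select_child(root)

# The first simulation finishes with a real evaluation of 0.5.
revert_and_backup([first_child], 0.5)
```

In a real parallel search the two selections would run in different threads or processes, and the tree updates would need locking. The sketch only shows the bookkeeping.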

huyp182 commented 6 years ago

OK, thanks!