junxiaosong / AlphaZero_Gomoku

An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)
MIT License
3.23k stars 962 forks

Why is the negative leaf_value for update_recursive function? #25

Open KelleyYin opened 6 years ago

KelleyYin commented 6 years ago

https://github.com/junxiaosong/AlphaZero_Gomoku/blob/68603c0d8e5a0ef9273bacc7d281abe27493da1b/mcts_alphaZero.py#L137

I can't fully understand the negative leaf_value, which differs from the formula in the paper (AlphaGo Zero): $Q(s, a) = \frac{1}{N(s, a)} \sum_{s'|s,a\rightarrow s'} V(s')$. Could you give an explanation for this? Thank you very much.

xiaoyangzai commented 6 years ago

I think the right answer is node.update_recursive(leaf_value)?!!

GeneZC commented 6 years ago

Because the parent node and the current node belong to different players, and the value assigned to each node is from the perspective of that node's player.

junxiaosong commented 6 years ago

We use the negative value of the state because alternate levels in the search tree are from the perspective of different players, and a node's Q-value is in fact used by its parent node in the select stage.

gmftbyGMFTBY commented 6 years ago

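But shouldn't the call pass leaf_value rather than -leaf_value? The meaning of the minus sign inside the update_recursive function is clear, but here it feels like leaf_value is what should be passed in. I hope you can explain this; I don't quite understand this part. @junxiaosong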

junxiaosong commented 5 years ago

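@gmftbyGMFTBY leaf_value is from the perspective of the leaf node. Once passed in, leaf_value is used to update the Q value, and the leaf node's Q value is what its parent node uses when selecting a branch, so this Q value is from the parent node's perspective. The leaf node's own leaf_value and its own Q value are therefore from opposite perspectives, which is why the minus sign is added when it is passed in.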
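The sign convention described above can be traced with a small sketch. This is not the repository's code: the `Node` class below is a hypothetical, stripped-down stand-in (no priors, no children dictionary), and the value 0.8 is made up. It only assumes what the thread states, namely that `update_recursive` flips the sign at each level and that the call site passes `-leaf_value`.

```python
class Node:
    """Hypothetical minimal MCTS node, for illustration only."""

    def __init__(self, parent=None):
        self.parent = parent
        self.n_visits = 0
        # Mean backed-up value, seen by the player who chose the move
        # leading INTO this node (i.e. the parent's player).
        self.Q = 0.0

    def update(self, value):
        # Incremental running mean of the values backed up through this node.
        self.n_visits += 1
        self.Q += (value - self.Q) / self.n_visits

    def update_recursive(self, value):
        # Update ancestors first, flipping the sign at every level,
        # because adjacent levels belong to opposing players.
        if self.parent is not None:
            self.parent.update_recursive(-value)
        self.update(value)


root = Node()
child = Node(parent=root)
leaf = Node(parent=child)

# Evaluation from the viewpoint of the player to move AT the leaf (made-up number).
leaf_value = 0.8

# The leaf's Q is consumed by its parent, so the leaf's own value is negated first.
leaf.update_recursive(-leaf_value)

print(leaf.Q, child.Q, root.Q)  # -0.8 0.8 -0.8
```

Here the leaf position is good (0.8) for the player to move there, so the move into the leaf is stored as bad (-0.8) for the player who made it, and the move one level up is stored as good (0.8) for the player who made that one. The root's Q is never read by a parent, so its sign does not matter for selection.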

lvsh2012 commented 1 year ago

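@gmftbyGMFTBY leaf_value is from the perspective of the leaf node. Once passed in, leaf_value is used to update the Q value, and the leaf node's Q value is what its parent node uses when selecting a branch, so this Q value is from the parent node's perspective. The leaf node's own leaf_value and its own Q value are therefore from opposite perspectives, which is why the minus sign is added when it is passed in.

Why do the parent node's perspective and the leaf's own perspective need to have opposite signs?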

nicehzj commented 1 year ago

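@gmftbyGMFTBY leaf_value is from the perspective of the leaf node. Once passed in, leaf_value is used to update the Q value, and the leaf node's Q value is what its parent node uses when selecting a branch, so this Q value is from the parent node's perspective. The leaf node's own leaf_value and its own Q value are therefore from opposite perspectives, which is why the minus sign is added when it is passed in.

Why do the parent node's perspective and the leaf's own perspective need to have opposite signs?

Because the parent node and the child node are two opposing players: any gain from one player's action is a loss for the other, so a state that is good for the child node is bad from the parent node's point of view. It is a zero-sum game.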
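A toy illustration of the zero-sum point, assuming a parent that simply picks the move with the highest Q (the exploration term is ignored). The function name `best_move`, the move labels, and the numbers are all hypothetical.

```python
def best_move(child_state_values, negate=True):
    """Pick the move with the highest Q.

    child_state_values maps each move to the value of the resulting state,
    measured from the viewpoint of the player to move in that state
    (the opponent of the player who is choosing).
    """
    q = {move: (-v if negate else v) for move, v in child_state_values.items()}
    return max(q, key=q.get)


# Move "a" leads to a state that is great for the opponent (0.9);
# move "b" leads to a state that is poor for the opponent (-0.3).
values = {"a": 0.9, "b": -0.3}

print(best_move(values))                # b
print(best_move(values, negate=False))  # a
```

With the sign flip the parent chooses "b", the move that leaves the opponent worse off. Without it the parent would choose "a", the move that is best for its opponent.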