Open Kiv opened 6 years ago
I also not understand transpose(2,3) and the comment “use a swap placeholder to change {{1,2,3}, {4,5,6}} into {{1,2}, {3,4}, {5,6}”
There are two things to note:
Every node in d
layer is indexed by [action_id, parent_id, gp_id]
. When transitioning to d+1
layer, action_id
becomes the new parent_id
and [parent_id, gp_id]
are combined to generate the new gp_id
.
https://github.com/lifrordi/DeepStack-Leduc/blob/da416f9646725def43e668851593de13ead8b607/Source/Lookahead/lookahead_builder.lua#L235-L236
For new gp_id
, the code (gp_id - 1) * gp_nonallinbets_count + (parent_id)
means that parent_id
axis are higher than gp_id
axis. That is to say [parent_id, gp_id]
should be transposed to [gp_id, parent_id]
.
From d
layer to d+1
layer,
*[parent_id, gp_id]
--(transpose)--> [gp_id, parent_id]
--(change view)--> `[gp_idparent_id]** From
d+1layer to
dlayer, **
[gp_id]--(change view)-->
[gp_id, parent_id]--(transpose)-->
[parent_id, gp_id]`**
cfvs_data[d][{{}, {}, {}, 1, {}}]
stores the cfv data for “player 2” in the lookahead tree, and cfvs_data[d][{{}, {}, {}, 2, {}}]
stores the cfv data for “player 1”. That is to say the player indexs are swapped. Note that “player 1” is the first player in lookahead tree, and not really refer to player 1 in the real game.
I understand in normal CFR how a parent node computes the CFV, but I'm struggling to understand how this is done in the lookahead's vectorized implementation.
Specifically, in
Lookahead:_compute_cfvs()
:So I understand we multiply empty (illegal) actions by the mask so their CFV is zero, but I'm lost about what the swap and transpose is doing, or how slicing the parent's
cfvs_data
copies things to the right place.Can anyone explain more clearly?