lifrordi / DeepStack-Leduc

Example implementation of the DeepStack algorithm for no-limit Leduc poker
https://www.deepstack.ai/
891 stars 211 forks source link

Understanding Lookahead:_compute_cfvs() #23

Open Kiv opened 6 years ago

Kiv commented 6 years ago

I understand in normal CFR how a parent node computes the CFV, but I'm struggling to understand how this is done in the lookahead's vectorized implementation.

Specifically, in Lookahead:_compute_cfvs():

function Lookahead:_compute_cfvs()
  for d=self.depth,2,-1 do
    local gp_layer_terminal_actions_count = self.terminal_actions_count[d-2]
    local ggp_layer_nonallin_bets_count = self.nonallinbets_count[d-3]

    self.cfvs_data[d][{{}, {}, {}, {1}, {}}]:cmul(self.empty_action_mask[d])
    self.cfvs_data[d][{{}, {}, {}, {2}, {}}]:cmul(self.empty_action_mask[d])

    self.placeholder_data[d]:copy(self.cfvs_data[d])

    --player indexing is swapped for cfvs
    self.placeholder_data[d][{{}, {}, {}, self.acting_player[d], {}}]:cmul(self.current_strategy_data[d])

    torch.sum(self.regrets_sum[d], self.placeholder_data[d], 1)

    --use a swap placeholder to change {{1,2,3}, {4,5,6}} into {{1,2}, {3,4}, {5,6}}
    local swap = self.swap_data[d-1]
    swap:copy(self.regrets_sum[d])
    self.cfvs_data[d-1][{{gp_layer_terminal_actions_count+1, -1}, {1, ggp_layer_nonallin_bets_count}, {}, {}, {}}]:copy(swap:transpose(2,3))
  end

end

So I understand we multiply empty (illegal) actions by the mask so their CFV is zero, but I'm lost about what the swap and transpose is doing, or how slicing the parent's cfvs_data copies things to the right place.

Can anyone explain more clearly?

DWingHKL commented 6 years ago

I also not understand transpose(2,3) and the comment “use a swap placeholder to change {{1,2,3}, {4,5,6}} into {{1,2}, {3,4}, {5,6}”

yffbit commented 3 years ago

There are two things to note:

  1. Every node in d layer is indexed by [action_id, parent_id, gp_id]. When transitioning to d+1 layer, action_id becomes the new parent_id and [parent_id, gp_id] are combined to generate the new gp_id. https://github.com/lifrordi/DeepStack-Leduc/blob/da416f9646725def43e668851593de13ead8b607/Source/Lookahead/lookahead_builder.lua#L235-L236 For new gp_id, the code (gp_id - 1) * gp_nonallinbets_count + (parent_id) means that parent_id axis are higher than gp_id axis. That is to say [parent_id, gp_id] should be transposed to [gp_id, parent_id]. From d layer to d+1 layer, *[parent_id, gp_id] --(transpose)--> [gp_id, parent_id] --(change view)--> `[gp_idparent_id]** Fromd+1layer todlayer, **[gp_id]--(change view)-->[gp_id, parent_id]--(transpose)-->[parent_id, gp_id]`**

  2. cfvs_data[d][{{}, {}, {}, 1, {}}] stores the cfv data for “player 2” in the lookahead tree, and cfvs_data[d][{{}, {}, {}, 2, {}}] stores the cfv data for “player 1”. That is to say the player indexs are swapped. Note that “player 1” is the first player in lookahead tree, and not really refer to player 1 in the real game.