indylab / nxdo

Deep RL Code for XDO: A Double Oracle Algorithm for Extensive-Form Games
https://arxiv.org/abs/2103.06426
MIT License

Porting to No Limit Texas Hold 'em #2

Closed TensorHusker closed 2 years ago

TensorHusker commented 2 years ago

Hello. I was toying with this code recently and have found its performance impressive. I was curious about how it might perform on No Limit Texas Hold 'em. Could this code tractably scale from Leduc and Kuhn to this much larger game? If so, how difficult would such a port be from the codebase as written?

JBLanier commented 2 years ago

Hey there! We haven't attempted it yet, but our guess is that NXDO should scale well to No-Limit Texas Hold'em compared to other model-free methods.

Because of the large stack size, there is a huge number of relatively similar pure strategies. If we can get away with mixing among actions taken by a small subset of pure strategies throughout the game tree, NXDO should be able to create an easier-to-solve abstraction of the game with its extensive-form restricted game.

The high-level steps to set up NXDO with a new environment would be the following (making some new debugging/hyperparameter-search scripts would likely also be necessary):
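As a rough, hypothetical sketch of the kind of environment interface a port would involve (class and method names here are illustrative, not the repo's actual API; matching pennies stands in for the real game):

```python
# Hypothetical sketch only: the rough environment interface a new game
# would need to expose. Names are illustrative, not the repo's actual API.
class TwoPlayerEnv:
    """Turn-based two-player zero-sum environment skeleton."""

    def reset(self):
        self.actions = {}
        self.current_player = 0
        # A real env would return an infostate encoding for the acting player.
        return {self.current_player: self._obs()}

    def step(self, action):
        self.actions[self.current_player] = action
        if len(self.actions) < 2:
            self.current_player = 1
            return {self.current_player: self._obs()}, {0: 0.0, 1: 0.0}, False
        # Terminal: matching-pennies payoff, zero-sum between the players.
        r0 = 1.0 if self.actions[0] == self.actions[1] else -1.0
        return {}, {0: r0, 1: -r0}, True

    def _obs(self):
        return (self.current_player,)

env = TwoPlayerEnv()
env.reset()
_, rew, done = env.step(0)   # player 0 acts
_, rew, done = env.step(0)   # player 1 acts; actions match
print(rew, done)             # {0: 1.0, 1: -1.0} True
```

A real poker port would of course carry card/betting state in the observations, but the reset/step shape is the part the algorithm code interacts with.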

TensorHusker commented 2 years ago

Sounds great. I may fork it and start doing this. A couple of things I was wondering about:

JBLanier commented 2 years ago

Sorry for the late reply again. Addressing each point:

TensorHusker commented 2 years ago

Interesting. Thanks for the reply again. I still plan on experimenting with the repo some more, but I was wondering: could you tell me more about n-player games? I'm strong on the RL/AI side, but I'm still brushing up on my game theory. I remember Pluribus from a couple of years ago, and I found this paper by tracing advancements building on Deep CFR. I was at first hoping that this paper could allow it to be improved, but now I wonder how that might be done.

JBLanier commented 2 years ago

Hey! Been a bit. Getting back to you:

- What kind of equilibria exist for n-player games in contrast to zero-sum two-player games?

It's very much an open problem to find a universal solution concept for general-sum n-player games.

Nash equilibria still exist in n-player games. However, unlike in 2-player zero-sum games, Nash equilibrium strategies in n-player games aren't interchangeable in terms of expected payoff. This means that if a group of players who each independently calculated their own NE strategies were matched up, their joint strategy would not necessarily be an NE, and performance could be arbitrarily bad. If you want to learn more, this is known as the equilibrium selection problem.
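To make the non-interchangeability concrete, here's a minimal illustration (not from the NXDO codebase) using a two-player coordination game: both coordinated profiles are Nash equilibria, but players who independently pick different equilibria end up at a non-equilibrium profile with zero payoff.

```python
# Illustrative only: independently computed Nash equilibria need not be
# interchangeable, even in this tiny two-player coordination game.
import numpy as np

# Row player's payoffs; the game is symmetric, so the column player's
# payoff matrix is the transpose.
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
B = A.T

def is_pure_nash(r, c):
    """Check whether the pure profile (row r, col c) is a Nash equilibrium."""
    row_best = A[:, c].max() <= A[r, c] + 1e-9
    col_best = B[r, :].max() <= B[r, c] + 1e-9
    return row_best and col_best

# Both coordinated profiles are Nash equilibria...
assert is_pure_nash(0, 0) and is_pure_nash(1, 1)

# ...but if each player independently adopts a different equilibrium,
# the resulting joint profile is not an equilibrium and pays zero.
assert not is_pure_nash(0, 1)
print(A[0, 1])  # 0.0 -- payoff when the chosen equilibria are mismatched
```

In an n-player game the same effect appears among any subset of players, which is why "everyone computes their own NE" breaks down.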

Alternative, more general solution concepts that you might want to compute for an n-player game include Correlated Equilibrium (CE) and Coarse Correlated Equilibrium (CCE), in which a randomization device suggests a joint action profile for players to follow each episode. The J-PSRO work we mentioned solves for these; it's a useful solution concept, though not applicable to every use case.
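As a toy illustration of such a randomization device (again, not repo code), here's a check of the correlated-equilibrium incentive constraints in the classic game of Chicken: a device recommending each non-crash profile with probability 1/3 is a CE, because no player gains by deviating from its recommendation.

```python
# Illustrative only: verifying a correlated equilibrium in the game of
# Chicken, where a shared random device recommends a joint action.
import numpy as np

# payoff[player][row_action][col_action]; action 0 = Dare, 1 = Chicken.
payoff = np.array([
    [[0, 7], [2, 6]],   # row player
    [[0, 2], [7, 6]],   # column player
])

# The device draws one of three joint recommendations uniformly at random.
device = {(0, 1): 1/3, (1, 0): 1/3, (1, 1): 1/3}

def obeys_ce(device, payoff):
    """Check the CE incentive constraints: conditioned on its own
    recommendation, no player gains in expectation by deviating."""
    for player in range(2):
        for rec in range(2):                 # recommended action
            for dev in range(2):             # candidate deviation
                gain = 0.0
                for (a0, a1), p in device.items():
                    if (a0, a1)[player] != rec:
                        continue
                    followed = payoff[player][a0][a1]
                    swapped = [a0, a1]
                    swapped[player] = dev
                    gain += p * (payoff[player][swapped[0]][swapped[1]] - followed)
                if gain > 1e-9:
                    return False
    return True

print(obeys_ce(device, payoff))  # True
```

The CCE constraint is the same idea but with deviations committed to before seeing the recommendation, which makes it a weaker (larger) solution set.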

- What kinds of methods are used in n-player imperfect information games? In particular, the types of games that relate to multiplayer Texas Hold 'em.

So aside from the heuristically well-performing CFR method that Pluribus used (all 6 players run counterfactual regret minimization) and (C)CE solvers, there aren't very many general methods with useful guarantees for larger games. Self-play has no guarantees but can heuristically do well depending on the game.
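For a feel of the core update inside CFR, here's a minimal regret-matching sketch on one-shot rock-paper-scissors (illustrative only; full CFR applies this update per infostate using counterfactual values):

```python
# Illustrative only: regret matching, the per-infostate update at the heart
# of counterfactual regret minimization, on one-shot rock-paper-scissors.
import numpy as np

payoff = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]], dtype=float)  # row player's payoffs

regrets = np.zeros((2, 3))
regrets[0, 0] = 1.0  # tiny asymmetric seed so the dynamics are visible
strategy_sums = np.zeros((2, 3))

def regret_matching(r):
    """Play actions in proportion to positive cumulative regret."""
    pos = np.maximum(r, 0.0)
    return pos / pos.sum() if pos.sum() > 0 else np.full(3, 1 / 3)

T = 5000
for _ in range(T):
    s0, s1 = regret_matching(regrets[0]), regret_matching(regrets[1])
    strategy_sums[0] += s0
    strategy_sums[1] += s1
    u0 = payoff @ s1          # each row action's value vs. opponent's mix
    u1 = -(s0 @ payoff)       # column player's action values (zero-sum)
    regrets[0] += u0 - s0 @ u0
    regrets[1] += u1 - s1 @ u1

avg = strategy_sums / T       # average strategies approach the uniform NE
print(np.round(avg, 2))
```

In 2-player zero-sum games the average strategies converge to a Nash equilibrium; in n-player games like 6-max poker that guarantee disappears, which is exactly the "heuristically well-performing" caveat above.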

- Do you have any more literature on this area I could peruse?

There's not a huge amount of literature that I'm aware of, as most recent progress has been limited to 2-player zero-sum games. Another DeepMind work used a Fictitious Play-based approach to solve for an approximate coarse correlated equilibrium in 7-player No-Press Diplomacy (Anthony, Eccles, et al. 2020): https://arxiv.org/pdf/2006.04635.pdf

Two great game theory reference books that I'd recommend:

- Multiagent Systems, Shoham & Leyton-Brown
- Game Theory, Fudenberg & Tirole

- Are there potential avenues for this work to be used to improve upon n-player imperfect-information game learning? EDIT: I realize now that you discussed this in your reply. A better way of phrasing it: what does this work, overall, aim to create? My own read-through suggested that it's a general framework for finding certain equilibria more efficiently in larger games (particularly those with larger action spaces), and that what it optimizes for is dictated by the solvers it's used with. Is that an overall correct bird's-eye description? If so, I find it a helpful definition for delineating the framework from, say, the subgame solvers themselves, which helps refine my search and prototyping.

The main idea of our work is that PSRO (generally used for 2-player games) has a worst-case time complexity that's exponential in the number of game infostates. We fix this by using an extensive-form metasolver instead of a normal-form metasolver, giving a worst-case time complexity that's linear in the number of infostates. This matters for large games, since we should scale better in the worst case. Technically you could use metasolvers that solve for something other than a Nash equilibrium; that would be future work.
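As a sketch of what the normal-form metasolver step looks like in PSRO-style methods (illustrative, not the repo's actual solver; assumes a zero-sum empirical payoff matrix of pure-policy matchups), fictitious play on the meta-game:

```python
# Illustrative sketch, not the repo's solver: a normal-form metasolver of
# the kind PSRO runs on its empirical payoff matrix of policy matchups.
# Fictitious play approximates a Nash equilibrium of the zero-sum meta-game.
import numpy as np

def fictitious_play(payoffs, iters=2000):
    """Row player maximizes `payoffs`; column player receives -`payoffs`."""
    n_rows, n_cols = payoffs.shape
    row_counts, col_counts = np.zeros(n_rows), np.zeros(n_cols)
    row_counts[0] = col_counts[0] = 1.0
    for _ in range(iters):
        # Each player best-responds to the opponent's empirical mixture.
        row_counts[np.argmax(payoffs @ (col_counts / col_counts.sum()))] += 1
        col_counts[np.argmin((row_counts / row_counts.sum()) @ payoffs)] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

# Rock-paper-scissors stands in for an empirical matchup matrix; its
# unique equilibrium is the uniform mixture over the three policies.
rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
row_mix, col_mix = fictitious_play(rps)
print(np.round(row_mix, 2))   # approximately [0.33, 0.33, 0.33]
```

XDO/NXDO's change is that instead of mixing over whole policies with such a normal-form solver, the metasolver operates on an extensive-form restricted game, mixing over the policies' actions at each infostate.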

- Combining JPSRO and their meta-game solver with NXDO could be interesting.

Agreed, although you'd need an extensive-form (C)CE metasolver for XDO/NXDO. Might be a neat future direction!

JBLanier commented 2 years ago

I'm going to go ahead and close this issue assuming that's ok.