Closed jneckar closed 4 years ago
The two hole cards the player is dealt
This sounds totally doable. In order to differentiate the hole and the community cards we'd need to add 52 entries making 104 total entries devoted to cards. The first 52 representing the deck in the hole and the second 52 for the community cards. For a short deck we'd need only 72 entries instead.
The cards on the board, if any, with a distinction between flop, turn, and river
I believe this distinction can be inferred by the number of community cards available.
The history of bets made so far, broken down by round (other state information, such as the size of the pot, which player's turn it currently is, etc, can be inferred from this)
The fact that this is a fixed sized binary input vector really limits the capability of recording this state. Is such a vector a requirement for the CFR algorithm? The vector's size would vary based on the number of players, we would have to have an upper limit of players. Recording the amount that a player bets in binary would force a limit on how big the bets can be.
In the code, this vector is used for other algorithms (like DQN) so if we were to break one of these constraints it's entirely possible to make a new vector for CFR specifically.
There shouldn't be a specific requirement for the state representation using CFR. Essentially, CFR just uses the state as a lookup key for regret and policy tables. But while it shouldn't have that requirement, I'll need to dig back into the implementation to make sure that's actually the case.
While there shouldn't be any hard restriction on limiting the state representation to 72 bits, making it as efficient as possible is still something that's important. We are likely to be running into memory issues eventually since we need to be able to store 3 values with every reachable infoset in the game. But don't worry too much about that for now, first the focus should be on consolidating strategically equivalent states into the same representation. We can worry about the representation efficiency after.
I believe this distinction can be inferred by the number of community cards available.
It's important to be able to distinguish which card was dealt on which round. So yes you could infer what round it is by the number of board cards, but you couldn't, for example, look at 4 cards and determine which of them was dealt on the turn.
BTW, I did some back-of-the-envelope math today and figured out the total number of infosets in the game, using full hole/flop card order and suit isomorphism but no further abstraction. The number I got was 175 billion (probably not exactly right, but within an order of magnitude). That's a really big number obviously, and one we will likely need to eventually shrink further, but without accounting for order/suit isomorphism the infoset space would increase to 276 trillion. So just figuring out how to consolidate strategically identical states into the same representation will save 3 orders of magnitude in the memory requirements and training time for running CFR on the game. Very important stuff!
So would the state not need to be represented as a binary vector? Could we then utilize a dictionary or class that holds the data we need instead? Say an object that contained the following attributes: hole_cards, flop_cards, turn_card, river_card, etc. It could also have a method that determined if another state was identical to itself.
This seems to be possible, but I'm not sure. The resources I've been searching through (page 12 of an introduction to CFR and the code in this medium article) don't seem to mention this state representation that we're talking about. At least, not that I'm able to discern out of the confusing algorithms.
Out of curiosity, what's the strategic value in knowing which card was revealed on which stage of the game?
It would be fine to represent it in a dict as an intermediate step but I think we would want to output a string representation to the algorithm
Sorry for rushed response, here are more complete answers: So, using the word 'state' can get kind of confusing since it can refer to 2 different things in this context: the complete game-state (all cards/actions) or the information set (only that information which is visible to the player). Each information set contains many different game-states (one for each possible combination of opponent's hidden cards) but is treated as a single state by the algorithm.
The 'state' of interest, when discussing the representation that you are working on, is actually the infoset. I will try to just always refer to that instead of state in the future to hopefully minimize confusion. In the rlcard CFR implementation, the infoset is returned from the _extract_state() function. In the implementation in that medium article, it is returned from get_information_set(). The exact representation of the infoset isn't really important, but we do need to be able to have some sort of string representation of it for lookups.
Out of curiosity, what's the strategic value in knowing which card was revealed on which stage of the game?
Say a player puts a lot of raises in on the flop. That means they are more likely to have a hand which is coordinated with the flop cards. But you can't really directly infer anything about whether the turn card helped or hurt their hand based on those bets. But if they instead start betting and raising on the turn, it's more likely the turn card was relevant to their hand.
Here's a concrete example to hopefully clarify: Scenario 1: Player has KK, opponent makes 3rd raise on 872 flop. Turn ace and opponent bets Scenario 2: Player has KK, opponent makes 3rd raise on A72 flop. Turn 8 and opponent bets
In the 2nd scenario it is MUCH more likely that the player's KK is losing
I've poked through a lot of the code and took notes, and now I have a much better understanding of "obs" and how it's used throughout RLCard. I'll write down what I found, and propose how we can solve this issue. Sorry if this post is a bit messy!
As you said, "obs" in CFRAgent is used as an infoset, and quite a limited one at that. You're right in that it does not need to be a binary vector at all and in fact any string can be used to represent the infoset. To be more general, any object that is hashable and has equality can be used. This infoset is just a string used in CFRAgent's policy dictionary, where the probabilities for an action given an infoset are stored.
We can't just change "obs" though, as it's used in DQNAgent and perhaps a few other algorithms that require it be a binary vector. If we could achieve the infoset with more strategic info in the form of a fixed size binary vector, we could make all the algorithms that use "obs" better, but given what I said here I believe this isn't possible.
But we can make CFRAgent better by switching the string representation to something else. I propose we make a new object "Infoset" inside cfr_agent.py that contains the following attributes:
It also contains the following methods:
How this is actually encoded into a string, I'm not sure yet. Here's what I've brainstormed:
4,4,4,8,-1,8,8,-1,8,8,-1,8
What we gather here is that there are 3 players, the first 3 rounds they bet a total of 4, the second round player 2 folds while players 1 and 3 keep betting 8 chips for the remainder of the rounds. Please correct me if this isn't a legal round of poker!
After that, we would need to update the areas in CFRAgent that utilize "obs" and change it to utilize this object instead.
Quick clarification: we are going to be operating with the game fixed at only two players, so there is no need to worry about tracking the number of players in the game in the state representation.
Good stuff, I need to look at the code base a little more closely but a few thoughts:
If some of the other agents require reading obs as a binary vector, maybe it is best to just plan to just keep some version of that encoding as the output from extract_state. Maintaining consistency in the way agents receive signal from the environment is probably better design and less likely to complicate things down the road. I don't think that should actually really restrict us at all in the representation... It should be relatively straightforward to convert whatever representation we come up with to and from a binary vector. I mean, as it actually exists in memory, any data structure is already essentially a binary vector.
For an example representation, there are 81 possible distinct starting hands (9 pairs + 36 unpaired suited + 36 unpaired unsuited) so we could just map those 81 hands to the first 7 bits of the vector. The board cards and betting sequences could each have their own mapping to a specific slice of bits. Doing this intelligently would definitely be a lot more efficient than the existing representation, by my estimate the entire lossless state could be stored in something like 35 bits.
Another clarfication, since it's a fixed limit betting structure, there is no need to track the size/amount of the bets. And with only 2 players, there isn't actually a huge number of potential betting sequences. I'll enumerate all of them here, with c meaning 'check' or 'call' and b meaning 'bet' or 'raise', and the '-' post-fix implying that the betting round is not resolved:
- c- cc cb- cbc cbb- cbbc cbbb- cbbbc cbbbb- cbbbbc b- bc bb- bbc bbb- bbbc bbbb- bbbbc
So 19 sequences could be mapped to 5 bits.
If it's easier to conceptualize, come up with a string representation first and we can worry about the bit mapping afterwards. I think something like 'AhKc,bbc,JcTs7s,cbc,8s......' is pretty efficient and intuitive for representing a game where the player is dealt Ah Kc, first betting round goes raise, raise, call, flop is Jc Ts 7s, flop betting is check, bet, call, etc. Do you think that makes sense? There is still some complexity to work out here with suit isomorphism...
Here's a crack at an encoding that is reasonably space efficient, outputs the same encoding for most strategically identical infosets (rank isomorphism), and doesn't use mappings that degrade performance when it's used as NN input:
bit0-8: ranks of hole cards 6-A. If only only 1 bit is active, it implies hole cards are a pair. ex: 000101000 = J9, 010000000 = 77 bit9: active if both hole cards are the same suit, otherwise 0 bit10-18: ranks of flop cards. If 2 bits are active, we can infer there is a pair on the board. If only 1 bit is active, trips are on the board. bit19: only used if the flop has a pair. 0 means the pair is the lower rank of the 2 active bits, 1 means it is higher. bit20-28: rank of turn card bit29-37: rank of river card If the flop/turn river haven't yet been dealt, bits are 0'd out bit38-40: number of bets in the preflop round 0-4 bit41: last player to make a bet or raise in the preflop round, 0 for player, 1 for opponent (number of bets + last player to bet is a good enough representation of the round by round betting sequence and simplifies things, should be better for a NN to interpret as well) bit42+: repeat pattern of bits38-41 for flop, turn, river round.
I think that expresses all the state information we need except for suit information, but I need to think about it some more to come up with an elegant way of expressing that with isomorphism...
Here's the finalized encoding scheme for this issue: r = Number of ranks. r = 13 for a normal holdem game, r = 9 for a short holdem game. Index | Meaning |
---|---|
0 | Hole cards are same suit. |
1 | (Bits 1-8 concern them self with community cards) If only 2 cards are same suit EX 'SSDCH' |
2 | If those 2 card's suit match one in your hand |
3 | If only 2 cards are same suit of a different suit than before mentioned EX 'SSDDC'. |
4 | If those 2 card's suit match one in your hand and bit 2 is 1. |
5 | If only 3 cards are same suit EX 'SSSDC' |
6 | If those 3 card's suit match one in your hand |
7 | If only 4 card's suit are same suit |
8 | If those four cards match one in your hand |
9-11 | Binary representation of integer amount of bets in pre-flop |
12 | Last player to make a bet. 0 is self or no bets were made, 1 is opponent. |
13-24 | Repeat bits 9-12 for flop, turn, and river. |
25 ~ 25 + (r-1) | Ranks of hole cards. If only 1 bit is active, that implies hole cards are a pair. All 0s if not dealt. |
25 + (r-1) ~25 + 2(r-1) | Ranks of flop cards. If 2 bits are active, there is a pair on the board. If 1 bit, there are trips. |
25 + 2r | 0 means the pair is the lower rank of the 2 active bits, 1 means it's the higher. Meaningless if there are no pairs. |
~25 + 3r | Rank of turn card. All 0s if not dealt. |
~25 + 4r | Rank of river card. All 0s if not dealt. |
If the network doesn't like bits 9-10, then I'll make those into four bits as the previous encoding scheme did.
The encoding used in the limit holdem model has some major problems. Here is what is currently implemented:
"The state is encoded as a vector of length 72. The first 52 elements represent cards, where each element corresponds to one card. The hand is represented as the two hole cards plus the observed community cards so far. The last 20 elements are the betting history. The correspondence between the index and the card is as below.
The biggest problem is that this representation doesn't differentiate between cards in the player's hand and cards on the board. The agent loses a huge amount of important information by not making the distinction, and it's something we definitely need to include with our agent (it treats having AA on a 732 flop exactly the same as having 72 on a AA3 flop, while strategically these situations could not be much more distinct). There is also a loss of important information about the betting history: it records the number of bets in each round but not the order they were made. From a strategy perspective this is a very important info, though not quite as big a deal as the card problem.
A full representation of state can be specified with the following features: -The two hole cards the player is dealt -The cards on the board, if any, with a distinction between flop, turn, and river -The history of bets made so far, broken down by round (other state information, such as the size of the pot, which player's turn it currently is, etc, can be inferred from this)
The state encoding doesn't necessarily need to contain ALL of that information, but it should hold as much as is efficiently possible. Figuring out a good encoding is probably going to take some creativity and experimentation. Here are the critical features:
-There should be no distinction between states that contain the exact same public information, and player's private information. That is to say, the cards in the opponents hand must NOT be present in the encoding.
-All states that are strategically identical SHOULD be encoded the same, to the extent it is possible. For example, the agent being dealt A, 5 hole cards should be encoded the same as the agent being dealt 5, A (though there should maintain a distinction between being dealt two cards of the same suit vs unsuited). Likewise the order that the 3 cards on the flop are dealt does not matter. One way of handing this would be to sort the hole cards and flop cards in high-to-low order before encoding. Suit isomorphism will be a little trickier to handle. One way of implementing it would be to reassign the suits based on rules that maintain the important info, for example you could encode the players highest rank hole card as 'suit 1', and 2nd hole card (if different suit) as 'suit 2'. The first card to be revealed on the board with a suit not present in the agent hand would be 'suit 3' (or 'suit 2' if the player cards are the same suit), etc.
-States that are strategically similar MAY potentially be grouped into buckets. The smaller the state-space, or total number of possible states, the more efficiently the agent will learn how to play (bucketing: good). But if there are states with important strategical differences that are grouped into buckets, the agent's strategy might not be very effective when translating it back into the the full game (bucketing: bad). Not all info is equally important to maintain in a lossless manner. There are two aspects of the state where bucketing can be considered: Hand strength and betting pattern.
-Hand strength should be bucketed in a way that preserves most of the complete information about relative hand strength. The way it is currently implemented, by grouping private and public cards into a hand, is wayyyyyy too lossy. An example of bucketing that is much less lossy would be grouping hands like (hole cards) Qh Th and 9h 8h on a Ah Jh 6h flop into the same bucket - there are only a few hands that beat one but not the other. In general, the stronger a hand is, the less important the distinction between similar hands: There is very little difference in how likely 4 of a kind 6's is to lose when compared to 4 of a kind aces, but there is a huge difference in strength between a pair of 6s and a pair of aces. Another way of thinking about this is that the buckets should be differentiated in such a way that each bucket for a given round will be reached with approximately the same probability. There is some important nuance here that we should try to avoid overlooking: the board cards can have a big effect on the likelihood of hand rankings, eg it is much easier for a player to have a flush on a board with 4 suited cards than one with 3. Overall, efficient hand strength bucketing is a really hard problem so I think it's okay if our initial version is a bit crude.
-The betting pattern can potentially be abstracted, but the current implementation of only tracking the number of bets in each round is probably too lossy. The most important info lost with that encoding is which player has the initiative: p1 check/p2 bet/p1 call is very distinct from p1 bet/p2 call. This distinction probably gets a bit less important when there are 3 or 4 bets in a round, but I think the number of possible betting sequences is probably small enough in a limit game that we can just avoid betting abstraction completely and encode the full bet history. Potentially we can experiment with further abstraction here in the future.