lucas98774 / rl-blackjack

Implementing rl on blackjack

Ideas for easier future development #12

Open Ndiedrick21 opened 2 years ago

Ndiedrick21 commented 2 years ago

Currently, when an action is requested of a player, the only information they have is their own hand and the value of the dealer's visible card. In a full game of blackjack, more information is available when there are more players. I understand that this was done initially to start with finding a simple policy, but if this is to be extended, it would be nice for the player to have as much information as possible to make a decision. If this were to be expanded, a simple policy could still be developed, but it would allow for more complex RL methods to be used in the future.

These changes would mainly include giving the players all of the visible cards in play, as well as the specific card that the dealer has. I think a mechanic could be added either to the card/hand class or the game class to control which cards are visible to which players.
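One way the visibility mechanic could look, as a rough sketch only. The `Card`, `Hand`, and `table_view` names and the `face_up` flag are hypothetical and are not taken from the repository; the sketch assumes visibility is tracked per card and that an owner can always see their own cards.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Card:
    rank: str              # e.g. "A", "7", "K"
    face_up: bool = True   # hypothetical flag: False for the dealer's hole card


@dataclass
class Hand:
    owner: str
    cards: List[Card] = field(default_factory=list)

    def visible_to(self, viewer: str) -> List[str]:
        """Ranks the viewer may see: the owner sees everything, others only face-up cards."""
        return [c.rank for c in self.cards if c.face_up or viewer == self.owner]


def table_view(viewer: str, hands: List[Hand]) -> Dict[str, List[str]]:
    """Everything on the table that a given viewer is allowed to see."""
    return {h.owner: h.visible_to(viewer) for h in hands}


dealer = Hand("dealer", [Card("K"), Card("6", face_up=False)])
alice = Hand("alice", [Card("9"), Card("7")])
print(table_view("alice", [dealer, alice]))
```

Putting the filter on the hand keeps the game class free of per-player bookkeeping; putting it on the game class instead would work equally well if the rules for who sees what get more involved.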

I think it would be worth putting time into developing this out further before even looking at simple RL methods. This would help create a more complete blackjack game before exploring RL, rather than having to make these changes later.

Share your thoughts in the comments.

Ndiedrick21 commented 2 years ago

If this is agreed upon, more planning can be done

lucas98774 commented 2 years ago

Nathan I think your concern is twofold:

  1. In the short term we can improve the printout so a player (not an rl agent) can make a more informed decision if they would like to.
  2. In the longer term, allowing an agent (rl agent) to take other players' hands into account should be an extension to this project. Specifically, this extension would force us to use more complex rl methods, since this addition to the game (depending on the number of players in the game) will drastically increase the number of possible states. So I think this extension will be interesting but will require more on the rl side, which makes it unsuitable to tackle yet.
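To give a feel for the growth in point 2, here is a back-of-the-envelope count. It rests on assumptions that are not from the project: the state is (own total, dealer's visible card value, one total per other player), totals run from 4 to 21, and the dealer's card takes 10 distinct values. The function name is hypothetical.

```python
# Assumption: a hand a player can still act on has a total from 4 to 21.
PLAYER_TOTALS = 18
# Assumption: the dealer's visible card has 10 distinct values (2-10 and ace).
DEALER_UP_VALUES = 10


def state_count(num_other_players: int) -> int:
    """Count (own total, dealer value, other totals...) combinations."""
    return PLAYER_TOTALS * DEALER_UP_VALUES * PLAYER_TOTALS ** num_other_players


for n in range(4):
    print(f"{n} other players: {state_count(n)} states")
```

Under these assumptions each extra player multiplies the table size by 18, which is why a tabular method becomes impractical after a few players.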

Does anyone else have any other ideas or comments? Definitely looking for input on this --- these are just my initial thoughts.

Ndiedrick21 commented 2 years ago

I think you could make the extensions on the game and policy input side now, and just do some feature engineering to derive the simple state values for the simple rl stuff to begin with. This would keep the agents backwards compatible when we do want to use more advanced rl methods. Both will still work. Does that make sense? Overall, this may include another function that the game calls instead of policy, so that policy can be specific to whatever features you want to use.

lucas98774 commented 2 years ago

Nathan, that is a good point. Here is my initial thought: an agent's policy (agent means rl --- a player is not rl, but is also not the dealer) will take in the dealer's value and also args, which will represent the value of any other player in the game. This will allow for the backwards compatibility you suggested. As far as actually feeding that into rl, we can tackle that later as we progress on the rl side; we just have to make sure our rl algorithm is set up flexibly enough to handle args. What do you think of this? Did you have something else or something more in mind?
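A minimal sketch of that signature, assuming the policy is a plain function and the extra positional arguments are the other players' hand totals. Both function names and both rules are hypothetical; the second rule is arbitrary and only shows that the extra values are reachable, it is not a suggested strategy.

```python
def threshold_policy(own_total, dealer_value, *other_values):
    """Simple policy: accepts the other players' values but ignores them."""
    return "hit" if own_total < 17 else "stand"


def crowd_aware_policy(own_total, dealer_value, *other_values):
    """Same signature, but the (arbitrary) threshold depends on the other values."""
    threshold = 17
    # Arbitrary illustrative rule: be more cautious when every other
    # player is showing a total of 18 or more.
    if other_values and min(other_values) >= 18:
        threshold = 16
    return "hit" if own_total < threshold else "stand"


# Both policies can be called with or without the extra values.
print(threshold_policy(16, 10))
print(threshold_policy(16, 10, 18, 20))
print(crowd_aware_policy(16, 10, 18, 20))
```

Because the extra values arrive through `*other_values`, a caller written for the two-argument form keeps working unchanged.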

lucas98774 commented 2 years ago

To address the other stuff directly, I do not think feature engineering is required (we can definitely do this, but I think this starts to get into deep rl). Again, though, if others are interested we can definitely start going in this direction. I am not tracking you on adding another method to the game class, though. What exactly is the purpose of this, or what situation makes you think it will be necessary?

Ndiedrick21 commented 2 years ago

I think the idea for another function would be something like play_turn and the inputs would be all available visible information from the game, including all of the cards that each player and the dealer has. From there you could call policy within the class that uses dealer value, or whatever you want your policy to be based on.

play_turn would be called by the game instead of policy. Since eventually all of this information may be used by the agent, I don't think it would hurt to model the game mechanics around that from the beginning. Unless you don't think it would be worth the time, based on what you want to get out of this project.
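Roughly, the split between play_turn and policy could look like the sketch below. Only the names `play_turn` and `policy` come from this thread; the `Agent` class, the observation dictionary and its keys, and the helper functions are hypothetical.

```python
def card_value(rank):
    """Blackjack value of a single rank, counting an ace as 11."""
    if rank == "A":
        return 11
    if rank in ("K", "Q", "J"):
        return 10
    return int(rank)


def hand_total(ranks):
    """Best total for a hand, demoting aces from 11 to 1 while the hand is bust."""
    total = sum(card_value(r) for r in ranks)
    aces = ranks.count("A")
    while total > 21 and aces:
        total -= 10
        aces -= 1
    return total


class Agent:
    def __init__(self, name):
        self.name = name

    def play_turn(self, observation):
        """Called by the game with everything visible; derives the features policy needs."""
        own_total = hand_total(observation["hands"][self.name])
        dealer_value = card_value(observation["dealer_up_card"])
        return self.policy(own_total, dealer_value)

    def policy(self, own_total, dealer_value):
        """Simple policy that only looks at the two derived values."""
        return "hit" if own_total < 17 else "stand"


observation = {
    "hands": {"alice": ["9", "7"], "bob": ["A", "K"]},
    "dealer_up_card": "K",
}
print(Agent("alice").play_turn(observation))
print(Agent("bob").play_turn(observation))
```

A later agent could override `play_turn` to pull more features out of the same observation without the game having to change how it calls agents.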

lucas98774 commented 2 years ago

I am not sure exactly why the game would need to be structured as you are suggesting ... let's get on a call to discuss this. Do you have any preference on time?

lucas98774 commented 2 years ago

@Ndiedrick21 I added functionality to take in the other players' hands in play_round, single_player_hand (I spelled this function wrong), as well as in the policy of a Player. As we discussed, the implementation is not the hard part.

@hall4jm @ruetten @Ndiedrick21 guys the difficult parts of this adjustment are:

  1. How do we encode the other players' cards into the state (currently the value of the dealer's face-up card and the total of the current player's hand) so the current player can make a more informed decision?
  2. This is related to 1.: what type should we pass? Right now I am passing the other players --- but it may be more appropriate to pass their hands instead ...

Let's take some time to think about this since I think this is an important part of the game mechanics but if you guys have any ideas definitely let the group know.
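As one possible starting point for question 1 (a sketch, not something decided in this thread): count the visible cards by rank, so the state has a fixed length no matter how many players are at the table. The names `encode_state`, `bucket`, and `RANK_BUCKETS` are hypothetical, and the sketch assumes hands are passed as lists of rank strings, which is one answer to question 2.

```python
from collections import Counter

# Face cards are folded into "10" because they share a value in blackjack.
RANK_BUCKETS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10"]


def bucket(rank):
    """Map a rank to its counting bucket."""
    return "10" if rank in ("J", "Q", "K") else rank


def encode_state(own_total, dealer_value, other_hands):
    """Return (own total, dealer value, count of visible cards per rank bucket)."""
    counts = Counter(bucket(r) for hand in other_hands for r in hand)
    return (own_total, dealer_value) + tuple(counts[b] for b in RANK_BUCKETS)


# Two other players showing K,5 and A,5.
print(encode_state(16, 10, [["K", "5"], ["A", "5"]]))
# No other players: the extra features are all zero.
print(encode_state(16, 10, []))
```

The encoding ignores which player holds which card and the order of play, which keeps it compact; whether that information matters for the decision is exactly the open question above.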

ruetten commented 2 years ago

Hey I'll take a look at this sometime this weekend, sorry I'm a little late to the party hahaha