hanabi / hanabi.github.io

A list of Hanabi strategies
https://hanabi.github.io/
Creative Commons Attribution Share Alike 4.0 International

modify the min eff formula #104

Closed jakestiles closed 4 years ago

jakestiles commented 4 years ago

Can we put this in the doc?

Zamiell commented 4 years ago

there are hundreds of different variants that have a black suit in a 6-player game. what exactly are you proposing?

jakestiles commented 4 years ago

Libster stated that 6-player black is a hard variant. It sounded like known information, something that just needed documenting rather than something that needed to be proposed. If an argument needs to be made for it, I don't have one.

Zamiell commented 4 years ago

> 6 player black is a hard variant

that is a nonsensical statement. "black" is not a variant, it is a suit

jakestiles commented 4 years ago

Black (6 suits) is the variant during which it was brought up.

Zamiell commented 4 years ago

the minimum required efficiency in 6-player for black (6 suits) is 1.20, so it is not a hard variant, because 1.20 is less than 1.25

alercah commented 4 years ago

Is this a proposal to define it as one of the exceptional hard variants?

aliblong commented 4 years ago

The required efficiency for a given deal (ignoring draw modulation), η_req, depends on the deal. Having cards out of order in a particular way will increase η_req. That figure of 1.20 actually represents the minimum required efficiency for the game, η_min, i.e. η_req given an ideal card configuration. The distribution of η_req across all possible deals is better captured by its mean, ⟨η_req⟩, which is necessarily higher than η_min. The difference between these terms, ⟨η_req⟩ - η_min, will be greater for 6-player than for 3-player, due to the increased clue cost of cards being out of order. What size is this difference in differences? I would guess ⟨η_req⟩ for 6-player black 6-suit (η_min = 1.20) is higher than ⟨η_req⟩ for 4-player black 5-suit (η_min = 1.25).
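
For reference, the same relationship in symbols (a restatement only; here D denotes the set of possible deals for a fixed player count p and variant v):

```latex
\eta_{\min} = \min_{d \in D} \eta_{\mathrm{req}}(d), \qquad
\langle \eta_{\mathrm{req}} \rangle = \frac{1}{|D|} \sum_{d \in D} \eta_{\mathrm{req}}(d), \qquad
\eta_{\min} \le \langle \eta_{\mathrm{req}} \rangle
```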

Computing ⟨η_req⟩ requires computing η_req for every possible gamestate; I'm not sure doing so is impossible given symmetries that one can exploit in the computation, but it's certainly much harder than computing η_min. Therefore, η_min seems like a reasonable first-order approximation of game difficulty, despite the obvious flaw of representing only the easiest possible deals. The value of 1.25 is nothing more than a convenient choice for drawing the line between easy and hard games, since it is a common value for η_min in the space of (p, v).

With all that said, if we want to maintain a list of "exceptionally hard games" (do we already have one? I haven't seen it), my opinion is that 6-player black 6-suit should obviously be on it. And if you want to include the above as a written motivation for its inclusion therein, that's fine. But I'm also fine with just declaring in text chat at the start of the game that we're playing with "hard variant" rules, which is one reason I explicitly declined to open an issue on the matter.

Zamiell commented 4 years ago

> I'm also fine with just declaring in text chat at the start of the game that we're playing with "hard variant" rules

this seems like a good solution to me

alercah commented 4 years ago

I think your analysis is wrong, aliblong.

The efficiency requirement is not deal-dependent. It's a simple mathematical calculation derived from the fact that any winning game in a given variant must play all the cards before the game ends. I also don't think it makes much sense to talk about the efficiency required for a given deal. You can calculate the efficiency of a given won game from only the number of bombs, number of clues left at the end, and number of cards or turns left. If we assume that there won't be bombs, then the players can always ensure that the deck is empty and they have no clues left. The only thing that the players can't directly and easily control is who has the last 5 to play.

I think you're trying to measure something completely different: how difficult it is to get those highly efficient clues. This is not only deal-dependent but convention- and player-dependent. I don't think we have a good way of quantifying this, though there's definitely some intuition available, like that one-of suits are worse because they require more saving, fewer clue options are worse because you convey less information with each clue, and more ambiguous clues are worse for the same reason. It might be more reasonable to measure it empirically by looking at the efficiency of won games in the variant.

All this said, I am a fan of just agreeing that all decisions on whether to follow easy or hard conventions be explicit, as I argued in #99. We currently have a list of special hard variants that are hard despite the efficiency being less than 1.25, namely special mix 5 or 6 suits, and dual-color mix 6 suits. Why do we even have that list at all if we're not going to update it? We could just add "any variant with 6 players and a one-of-each suit" to the list of hard variants, particularly if everyone is going to agree to play it as hard consistently.

aliblong commented 4 years ago

> I think you're trying to measure something completely different: how difficult it is to get those highly efficient clues.

There's a subtle difference between this phenomenon and the one I'm talking about. The former concerns the efficiency of productive clues (clues whose purpose is to convey information about important cards) as a function highly dependent on variant, v, and somewhat dependent on the number of players, p; the latter concerns the mean number of required unproductive clues (clues whose only purpose is to delay the final round of the game) as a function highly dependent on p and somewhat dependent on v: specifically, on the rank distributions of the suits being played (i.e. normal vs. one-of-each vs. up-or-down).

One can create a model of efficiency that treats productive clues and unproductive clues on even footing (i.e. only look at η_min), but it isn't as instructive for evaluating game difficulty as one that effectively subtracts unproductive clues from the total pool. That was the thrust of my argument: the former model obscures that in practice, some 1.20 variants ("variant" meaning a (p, v) combination) are actually harder than some 1.25 variants, because the mean number of clues fixed to 0 efficiency depends on (p, v). Losing even one clue to a stall in 6-player black 6-suit effectively raises η_req from 1.20 to 1.25; one more would raise it again to 1.30. It's not hard to imagine that this variant would require on average one more stall than 4-player black 5-suit.

Maybe the simplest way of putting it is this: "the more players in the game, the more stalls must be given on average; this isn't captured in the estimation of difficulty by η_min".
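
To put rough numbers on that (assuming the 1.20 figure corresponds to 30 required plays against 25 available clues, so every clue burned on a stall shrinks the denominator by one):

```latex
\frac{30}{25} = 1.20, \qquad \frac{30}{24} = 1.25, \qquad \frac{30}{23} \approx 1.30
```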

> All this said, I am a fan of just agreeing that all decisions on whether to follow easy or hard conventions be explicit, as I argued in #99. We currently have a list of special hard variants that are hard despite the efficiency being less than 1.25, namely special mix 5 or 6 suits, and dual-color mix 6 suits. Why do we even have that list at all if we're not going to update it? We could just add "any variant with 6 players and a one-of-each suit" to the list of hard variants, particularly if everyone is going to agree to play it as hard consistently.

Like I said, I'm fine with this.

Zamiell commented 4 years ago

> We could just add "any variant with 6 players and a one-of-each suit" to the list of hard variants, particularly if everyone is going to agree to play it as hard consistently.

rather than add "any variant with 6 players and a one-of-each suit" as alercah suggests, it probably makes more sense to either:

1. lower the hard variant threshold for 6-player games specifically down from 1.25 to 1.2?? (which would affect all variants), or
2. add a new positive constant to the minEff calculation that has a coefficient based on the number of players, such that 6-player has a bigger penalty than 5-player, and so forth

Zamiell commented 4 years ago

as an aside, i think it is more valuable to display η_req to the player rather than η_min. do you want to write a program that would approximate it based on, like, trials from a million deals or something?

aliblong commented 4 years ago

I'll start thinking of possible analytical approaches. Otherwise, hand labelling endgame stall clues in a sample of games is an option, but a tedious one.

alercah commented 4 years ago

Ahh, I see what you're getting at: specifically, the possibility of the players being required to burn clues at the end in order to avoid the game ending early. In that case, I can see the benefit of this approach, and it seems like a fun toy problem. I'm not quite clear if @Zamiell means displaying the average η_req for the variant, or the η_req for the specific deal, though.

But that brings me to another observation, which is that the average η_req isn't well-defined for any variant with one-of cards, because of the existence of impossible deals. A deal with a one-of non-5 on the bottom is unwinnable, unless playing with last-card deck plays, so η_req is undefined. We could just omit those games, of course, but if we're going to start seriously thinking about that sort of thing, then it makes you consider whether we should look into other changes, including rigging the shuffler to prevent them, or more drastic rules changes that mitigate endgame stalling more aggressively.

Zamiell commented 4 years ago

we've decided in the past to not rig the deck / the shuffler

aliblong commented 4 years ago

> I'm not quite clear if @Zamiell means displaying the average η_req for the variant, or the η_req for the specific deal though.

Showing η_req for the deal would either be leaking information about the deck order, or it would come too late to be of any use to the team, so I think it has to be average η_req for the variant.

> the average η_req isn't well-defined for any variant with one-of cards, because of the existence of impossible deals

I don't see where the conflict is. It just becomes another thing to model. You can treat it either by i) subtracting 5 - n from the max score at the start of the game, where n is the rank of the bottom-deck black card; or ii) modelling players such that they stall as if trying to obtain a 25 until it is impossible to obtain by deduction. The latter is more complicated, but more realistic.
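
As a minimal sketch of option (i), assuming the Black (6 Suits) deck (55 cards, max score 30); the function names here are invented for illustration and this is not code from the site or server:

```python
import random

# Toy model of option (i): if the bottom card of the deck is a black card of
# rank n < 5, treat the deal's max score as 30 - (5 - n); otherwise 30.
# Deck assumed: 5 normal suits (three 1s, two each of 2/3/4, one 5) plus a
# one-of-each black suit, i.e. the "Black (6 Suits)" variant (55 cards).

def build_black_6_suit_deck():
    deck = []
    for suit in range(5):  # 5 normal suits
        deck += [(suit, 1)] * 3
        for rank in (2, 3, 4):
            deck += [(suit, rank)] * 2
        deck += [(suit, 5)]
    deck += [("black", rank) for rank in range(1, 6)]  # one-of-each black suit
    return deck

def max_score_for_deal(deck):
    suit, rank = deck[-1]  # treat the last element as the bottom-deck card
    if suit == "black" and rank < 5:
        return 30 - (5 - rank)
    return 30

def average_max_score(trials=100_000, seed=0):
    rng = random.Random(seed)
    deck = build_black_6_suit_deck()
    total = 0
    for _ in range(trials):
        rng.shuffle(deck)
        total += max_score_for_deal(deck)
    return total / trials

if __name__ == "__main__":
    # Slightly below 30, since a black non-5 sits on the bottom in 4/55 of deals.
    print(average_max_score())
```

Option (ii) would replace `max_score_for_deal` with an actual model of how the players deduce that a perfect score has become impossible, which is the hard part.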

alercah commented 4 years ago

The efficiency requirement is specifically to get a perfect score. If we wanted to do the maximum score possible in a deal, though, that could work too.

Zamiell commented 4 years ago

btw lib, did you know about / see this? https://github.com/Zamiell/hanabi-bot it's a bot that plays with Hyphen-ated conventions that florrat and i started working on a few months back

i programmed some basic stuff like 2 saves and 5 saves and play clues. right now though, if you run it and copy-paste the game result into the JSON import of hanabi.live, you can see the bot doesn't know what to do when it gets self-prompted, so it bombs. programming self-prompts is pretty difficult and i wasn't able to wrap my head around it, so i temporarily gave up

if the bot ever gets good enough to play at a decent level, then we could use that to determine η_avg by letting it play 1 million games

aliblong commented 4 years ago

As a further tangent, isn't it time to stop calling these the "Hyphen-ated conventions"? He hasn't been active for literally years, and the current state of things, both in terms of conventions and playerbase, is quite far from the state at that point. I get that he contributed a lot in the early days, but to use him as the namesake of a group he's decidedly not a part of feels weirdly like canonization. Just call them the hanabi-live conventions; I don't think it's contentious to say that we're the primary group that plays there.

Zamiell commented 4 years ago

while hyphen-ated has not been active in years, i also don't think it is a great idea to use "Hanabi Live" conventions as a name. Oftentimes I join a random table on Hanabi Live and ask the question "do you guys play with the Hyphen-ated conventions?" In such a situation, it would be slightly more confusing if i asked "do you guys play with Hanabi Live conventions?", since it has the potential to be misconstrued as:

Zamiell commented 4 years ago

furthermore, such a name would softly imply that if you are playing on the Hanabi Live website, you should also probably be playing with Hanabi Live conventions, which is something i would rather avoid

aliblong commented 4 years ago

It doesn't imply that someone ought to play with our conventions; rather, it implies that if they don't, they're part of the outgroup, which is true (there are no other groups of comparable size to ours), and is already implied by the structure of our discord server. If you want to distinguish the site from our play group, the separation and rebranding should extend much further (like, to Discord).

Zamiell commented 4 years ago

> It doesn't imply that someone ought to play with our conventions; rather, it implies that if they don't, they're part of the outgroup, which is true (there are no other groups of comparable size to ours)

i would prefer to avoid altogether the perception of an in-group and an out-group, even if those are the facts on the ground. in particular, a subset of BGA players may be more likely to play on the Hanabi Live website if they feel it is possible to play with BGA conventions on Hanabi Live, which of course it is.

> If you want to distinguish the site from our play group, the separation and rebranding should extend much further (like, to Discord).

most hyphen-ated convention discussion happens on GitHub already, so it is already separated out. the notable exception would be the "quick questions & quick answers" channel. from a logistical standpoint, it is probably not worth it to spin off an entirely separate discord server primarily just for a "quick questions & quick answers" chat channel.

but more to your point, i've added a new category to the discord server called "Hyphen-ated Conventions" so that it is better delineated

alercah commented 4 years ago

Renaming should be a separate issue thread.

alercah commented 4 years ago

After thinking about whether there is an analytic approach to estimating η_req, I realized that it doesn't mean what you think it means either. The problem with your definition is that not all clues that are used to modulate the last draw are stall clues, and so I don't think it's going to accurately capture what you're thinking.

The simplest scenario where modulation clues are absolutely required is when the k5 and k4 (the black 5 and black 4) are the third-last and second-last cards in the deck, respectively, with the last card being any duplicate. In this scenario, at least n-2 clues must be available in an n-player game at the moment that the k5 is drawn. Why?

The last turn of the player with the k5 must not come before the last turn of the player with the k4. Consequently, once the k4 is drawn, the team must not draw any more cards until the player with the k5 gets another turn. That player (even if they also have the k4) can then discard or play (if they have the k4, they must play it) and trigger the endgame.

The first play or discard after the k5 is drawn will draw the k4, so this means that once the k5 is drawn, at most one card can be played or discarded before that player's next turn. There are n-1 other players, so n-2 clues are required.

I think, in a scenario such as this, that what you want to do is say "so n-2 stall clues are required, and we subtract that from the required efficiency". But what's to say that all of those n-2 clues are useless? At least one of them is likely going to go to the player with the k5, to let them know that they have it and that they should be the one to trigger the game end. That's not a stall clue. There could also be other cards, untouched up to this point, that need cluing, like other 5s. They may or may not have been gettable earlier. They can be played in the last round once there is no longer a need to modulate the deck.

So the apparent number of required stall clues in a given deal is based on how the hand plays out, and specifically we can expect to see more stall clues in a deal where the players play with higher clue efficiency. In those games, they will likely have fewer cards left unclued at the end, meaning more stalls are necessary. But in a game with lower efficiency, it's possible that all the mandatory clues are going to be productive. Taking this to the extreme, we see that we would get a paradoxical result if we simply tried to measure the average number of endgame stalls in won games: if we ignore the differences in conventions that we play with between "easy" and "hard" variants, then easier variants, where higher clue efficiency is possible, would actually appear harder because stalls would be more common, and conversely the hardest variants where victory is only eked out would appear easier. And in some cases minor preferences would muddy the data, for instance, with two remaining required clues and two cards that need to be clued, whether they're done in two 1-card clues or one 2-card clue and one stall.

But we can look deeper!

So we know that we can't draw the k5 unless we have at least n-2 clues in hand. And, since it's the third-last card in the deck, there can be at most 3+n cards remaining to play. If this is in fact a 6-player black game, then there are 55 cards in the deck, 30 cards to play, and 18 cards in hands. In this deal, by the time 3 cards are left, there must be at least 21 cards played, which leaves 13 cards to have been discarded. There can be at most four 5s played already, for a total of 25 clues available so far. When we subtract the ones that must be available, we conclude that by the time we reach 3 cards left in the deck, the absolute lower bound for efficiency is 23/21 in order to score 30.
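
Spelling out the counting in that paragraph (6 players with 3-card hands, so 18 cards dealt; every figure below is one already stated above):

```latex
30 - (3 + 6) = 21 \text{ cards played, at minimum} \\
55 - 3 - 18 = 34 \text{ cards played or discarded} \;\Rightarrow\; 34 - 21 = 13 \text{ discards, at most} \\
8 + 13 + 4 = 25 \text{ clues generated} \;\Rightarrow\; 25 - (n - 2) = 25 - 4 = 21 \text{ clues spendable}
```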

Using similar methods, we can define a point-in-time minimum efficiency for every turn of the game. In a 5-player black game, by contrast, we only need two clues in hand, we've only had a maximum of 11 discards up to this point, and there are at most 8 plays left, not 9. Consequently we actually need a very slightly higher efficiency of 22/20. These are about 0.005 apart. Interestingly, however, this particular approach shows that in terms of strictly minimum clue efficiency, at least with a deal like this one, the 6-player variant is slightly easier than 5-player.

So this is making me think that maybe efficiency is completely barking up the wrong tree, and looking at card distribution patterns may be a much better way to do it, and in particular to try to find some general average metric, rather than an average of a deal metric.

Let's look at what happens when the deck is emptied, assuming everyone knows the location of every relevant card, because it's relatively rare that people can't work out the relevant contents of their hand by this point. In particular, let's assume that the remaining cards' positions in hands are distributed randomly. If there are 0 or 1 cards left to play, the game is won with certainty. If there are 2 cards left to play, then we win with good probability; we only lose if they're the same suit and out of order, or both in the same hand. As we have more and more cards to play, the probability of being able to win the game goes down rapidly, until we bottom out at 0 when we have more cards left than players.

But if we take modulation into account, then there's additional complexity. If the players can choose where the last draw is made, then they can improve their chances. When the remaining cards are a 4 and 5 of the same suit, modulation lets the players win every game where the cards aren't in the same hand. But in a bigger game, this will require more clues on average. We could try to quantify it, but there are additional subtleties, such as the fact that in variants with fewer players or more total cards in hand, it's often somewhat easier to modulate 2-ofs and 3-ofs to ensure good distribution.
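
Here is a toy Monte Carlo of the empty-deck situation described two paragraphs up, offered only as a sketch: it assumes everyone knows where every remaining card is, each player gets exactly one final turn with at most one play, the remaining cards land in uniformly random hands, and players greedily play their lowest playable card (a heuristic, so this is roughly a lower bound on the true win probability; it also ignores the modulation discussed in the previous paragraph). The card encoding and function names are invented for illustration.

```python
import random
from collections import defaultdict

# Toy model of the empty-deck endgame: the remaining cards-to-play are dealt
# to uniformly random hands, each player gets exactly one more turn and can
# play at most one card, and everyone knows where everything is. Players
# greedily play the lowest playable card they hold.

def simulate_once(remaining, num_players, rng):
    # remaining: list of (suit, rank) still needing to be played, e.g.
    # [("red", 4), ("red", 5)] means red 4 and then red 5 must be played.
    holders = {card: rng.randrange(num_players) for card in remaining}
    needed = defaultdict(list)
    for suit, rank in remaining:
        needed[suit].append(rank)
    for ranks in needed.values():
        ranks.sort()
    for player in range(num_players):  # one final turn each, in seat order
        playable = [
            (suit, ranks[0])
            for suit, ranks in needed.items()
            if ranks and holders[(suit, ranks[0])] == player
        ]
        if playable:
            suit, rank = min(playable, key=lambda card: card[1])  # lowest rank
            needed[suit].pop(0)
    return all(not ranks for ranks in needed.values())

def win_probability(remaining, num_players, trials=100_000, seed=0):
    rng = random.Random(seed)
    wins = sum(simulate_once(remaining, num_players, rng) for _ in range(trials))
    return wins / trials

if __name__ == "__main__":
    # Two suited cards left (a 4 and its 5) among 6 players: we lose when both
    # land in the same hand or when the 5's holder acts before the 4's holder.
    print(win_probability([("red", 4), ("red", 5)], num_players=6))
```

Under those assumptions the example should land near 5/12: a 1/6 chance of both cards being in the same hand, plus half of the remaining 5/6 of cases where the 5's holder acts first.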

waweiwoowu commented 4 years ago

Call them "Zamiel Conventions" 😂

Zamiell commented 4 years ago

no thx

Zamiell commented 4 years ago

@aliblong 1 week later, do we have any progress on how we want to modify the min eff formula?

aliblong commented 4 years ago

I think @alercah has convinced me that there's a better approach, but we haven't found it yet. I do still think η_req as I define it would be a better general metric of game difficulty than η_min, but I'm unmotivated to pursue computing it since it's a hard problem. Maybe there's also a simple heuristic that outperforms η_min; in particular, one that depends on the number of players.

Zamiell commented 4 years ago

back to my idea then

(5 × number of suits) / (8 + floor(starting pace + number of suits - unusable clues))

-->

(5 × number of suits) / (8 + floor(starting pace + number of suits - unusable clues - average End-Game stall clues))

e.g.

| Number of Players | Average End-Game Stall Clues |
|---|---|
| 2-player | 1 |
| 3-player | 1 |
| 4-player | 2 |
| 5-player | 2 |
| 6-player | 3 |
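
As a sketch of what the adjusted calculation could look like (the parameter names are invented, and `starting_pace` / `unusable_clues` are taken as inputs rather than derived here, since their exact definitions live in the doc):

```python
import math

# Hypothetical illustration of the proposed change: subtract an average number
# of end-game stall clues (keyed on player count) inside the floor, per the
# table above. Passing num_players=None reproduces the current formula.

AVERAGE_ENDGAME_STALL_CLUES = {2: 1, 3: 1, 4: 2, 5: 2, 6: 3}

def min_efficiency(num_suits, starting_pace, unusable_clues, num_players=None):
    stalls = AVERAGE_ENDGAME_STALL_CLUES.get(num_players, 0)
    usable_clues = 8 + math.floor(
        starting_pace + num_suits - unusable_clues - stalls
    )
    return (5 * num_suits) / usable_clues
```
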
Zamiell commented 4 years ago

as an aside, note that with the above change, the formula goes from calculating a pure "minimum efficiency needed" to more of an "average efficiency needed, skewed towards the lower bound", but idgaf and would prefer to continue calling it the "minimum efficiency needed" for simplicity

Zamiell commented 4 years ago

> With all that said, if we want to maintain a list of "exceptionally hard games" (do we already have one? I haven't seen it)

yes we do, it is already in the doc: https://github.com/Zamiell/hanabi-conventions/blob/master/Reference.md#hard-variants--easy-variants

Zamiell commented 4 years ago

@aliblong

aliblong commented 4 years ago

Seems fine

Zamiell commented 4 years ago

this is a lot of work so I'll likely implement this in the future at some point

Razvogor commented 4 years ago

We could call ourselves the Live-Hanabi Group to reduce the possibility of misconstruction

Zamiell commented 4 years ago

actually i'm going to close this for now, as this is a lot of work and kind of arbitrary