iancoleman / bip39

A web tool for converting BIP39 mnemonic codes
https://iancoleman.io/bip39/
MIT License

Bias in dice based entropy #435

Closed crwatkins closed 3 years ago

crwatkins commented 3 years ago

I believe there is a bias in the dice-based entropy entry method (perhaps in some of the other non-base-2 methods also). I believe that the code is using the dice to create a base 6 number and then simply discarding some bits (base 2) to create the entropy to be used for the seed. This results in an uneven distribution in the result. There are various methods to deal with this bias.

NIST addresses a similar problem (NIST is going from modulo 2 entropy to modulo N while this is going from modulo N to modulo 2) in their Digital Signature Standard (section B.1.1) by requiring an additional 64 random bits to minimize the bias. In our case, that would mean about an extra 25 rolls of the die.

One method to remove the bias would be to simply ignore throws of 4 and 5, and use two bits of entropy from throws 1, 2, 3, and 6. This would provide an average of 1.33 bits per roll.

A slight variation on that method is that instead of discarding 4 and 5, use one bit of entropy from 4 and 5 in addition to the two bits from 1, 2, 3, and 6. That results in an average of 1.66 bits per roll. This is obviously more efficient than the previous method, but I mentioned both because the previous method is easier to think about to start with.

The above method allows deriving discrete entropy from a single die roll. This can also be applied in conjunction with aggregation (creating a larger base 6 number as is done now) for differing efficiencies (the efficiency varies based on the difference between the power of 2 and the power of 6 being used).
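A minimal sketch of that single-roll method (the particular bit assignments below are only illustrative, not anything the tool does today):

```js
// Sketch of the single-roll method described above: throws 1, 2, 3 and 6 each
// yield two bits, throws 4 and 5 each yield one bit, so every outcome is
// equally likely for the bits it produces and no roll is wasted.
const ROLL_TO_BITS = { 1: '00', 2: '01', 3: '10', 6: '11', 4: '0', 5: '1' };

function rollsToBits(rolls) {
  return rolls.map(r => ROLL_TO_BITS[r]).join('');
}

// Average yield: (4/6)*2 + (2/6)*1 ≈ 1.67 unbiased bits per roll.
console.log(rollsToBits([3, 5, 2, 1, 6, 4])); // "1010100110"
```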

One other potentially related issue: There may be some other errors in the entry or display process. Entering the single digit 5 in base 6 mode claims to provide 3 total bits, which is not possible.

iancoleman commented 3 years ago

This results in an uneven distribution in the result.

Is this unevenness exploitable when generating a useful mnemonic (i.e. 15+ words)? I understand the unevenness effect, but I don't understand the magnitude of the effect, as in, how dangerous is it and when can we accept-and-ignore the effect? Just my shortcomings in mathematics showing here; I don't know how to quantify the effect.

I can see how discrete entropy from a single roll is nice to have, but if it means doing 93 rolls instead of 62 to generate a 15 word mnemonic then I'm not sure I can see the benefit.

Can you please explain in more detail when you think it's appropriate to switch between the proposed discrete entropy method and the existing aggregation method?

Entering the single digit 5 in base 6 mode claims to provide 3 total bits, which is not possible.

5 in base6 is 101 in base2 so I'm not sure why you expect only 2 bits. I guess floor(bitsPerEvent * events) can be used, but I don't feel it's misleading to have 3 bits shown when it's 2.58 bits per event. It seems very natural to me that 58% of the time it will be n+1 bits and 42% of the time be n bits.
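For reference, the raw numbers behind that (just the arithmetic, not the tool's exact rounding code):

```js
// One D6 event carries log2(6) ≈ 2.585 bits of entropy, even though its
// binary representation ("5" -> 101) can be three binary digits long.
const bitsPerEvent = Math.log2(6);
const events = 1;
console.log(bitsPerEvent * events);             // 2.584962500721156
console.log(Math.floor(bitsPerEvent * events)); // 2
```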

crwatkins commented 3 years ago

Perhaps I should explain why I opened this issue. A wallet developer asked me whether they should switch from their current method of user generated entropy input to the method used on this project. I quickly reviewed both of them and advised them to stick with their current method since 1) their current method had no bias and 2) the method in this project does have a bias and 3) NIST discusses concerns about such biases when generating DSS keys. I also promised to report my observation here; thus this issue.

I could probably quantify the modulo bias on a single or double die roll. However, I'm no mathematician and I would have a hard time quantifying the entropy bias on a few rolls, let alone enough rolls to generate ECDSA keys. As for the exploitability question: I'm so dull that I constantly lose debates with myself on whether 128 bits of (good, unbiased) entropy is sufficient for the foreseeable future. I'm also basically lazy, so sidestepping the hard math and eliminating the bias tends to appeal to me.

As I mentioned above, I believe that a single die roll can provide an average of 1.66 unbiased bits of entropy. A two-dice roll might be able to provide an average of 2.33 bits per die per roll (pretty close to the theoretical maximum of log(6)/log(2) ≈ 2.58). I believe three dice is less efficient than two, and beyond that I have no idea.
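A quick back-of-the-envelope check of those averages (assuming the two-dice method maps 32 of the 36 outcomes to 5 bits and the remaining 4 outcomes to 2 bits):

```js
// Average unbiased bits extracted per die under the rejection-free mapping.
// One die: 4 of 6 faces give 2 bits, 2 of 6 faces give 1 bit.
const oneDie = (4 / 6) * 2 + (2 / 6) * 1;
// Two dice: 36 outcomes; 32 of them give 5 bits, the remaining 4 give 2 bits.
const twoDicePerDie = ((32 / 36) * 5 + (4 / 36) * 2) / 2;
console.log(oneDie.toFixed(2), twoDicePerDie.toFixed(2), Math.log2(6).toFixed(2));
// -> 1.67 2.33 2.58
```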

iancoleman commented 3 years ago

It seems very natural to me that 58% of the time it will be n+1 bits and 42% of the time be n bits.

Turns out I was wrong about this. I made a bit of an exploratory tool at https://iancoleman.io/entropy_bias_calculator/ which shows the proportion of times to expect floor(n) bits vs ceil(n) bits.

There does appear to be some bias in the first few bits of entropy when rolling a D6. The amount of bias seems to depend on a few factors like the fraction of leftover entropy and the choice to align to the most-significant or least-significant bit. But all other bits seem to be fair. For example, when generating 160.268 bits of entropy with 62 rolls of a D6, bits 0-2 will not be evenly distributed, but bits 3-160 will be evenly distributed.

I'll keep going further into this but so far the experimental results don't make me feel there's a significant issue here for any practical use of the tool when generating 160+ bits of entropy.

Thanks for raising the issue, it's an interesting one.

crwatkins commented 3 years ago

For example, when generating 160.268 bits of entropy with 62 rolls of a D6, bits 0-2 will not be evenly distributed, but bits 3-160 will be evenly distributed.

I believe that is incorrect. 62 rolls of the D6 can yield a number from 0 to 6^62 - 1. 6^62 = 2^62 * 3^62 so that means that you can get 62 evenly distributed bits. b0 (the LSB) through b61 are evenly distributed. Bits b62 - b159 are biased (some of them may have an extremely small bias). Note that b62 must be biased since the number 3^62 (the number of values left after dividing by the 62 evenly distributed bits) is odd because any power of an odd number is always going to be odd. Further, since there is an odd number of values, it is impossible to have the same number of zeros and ones in any power of 2 bit position, therefore we know for certain that b62 to b159 are biased to some degree (I believe the higher order bits will in general have more bias than the lower order bits).
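For anyone who wants to check this exactly, here is a quick BigInt sketch (mine, not the tool's code) of the per-bit distribution over all 6^62 equally likely outcomes:

```js
// Exact per-bit distribution over all 6^62 equally likely 62-roll outcomes.
// onesAtBit(N, k) counts how many values in [0, N) have bit k set.
function onesAtBit(N, k) {
  const period = 1n << (k + 1n);              // bit k repeats with this period
  const half = 1n << k;
  const full = (N / period) * half;           // ones contributed by full periods
  const rem = N % period;
  const extra = rem > half ? rem - half : 0n; // ones in the partial period
  return full + extra;
}

const N = 6n ** 62n;
for (const k of [0n, 61n, 62n, 159n]) {
  const p = Number((onesAtBit(N, k) * 1000000n) / N) / 1000000;
  console.log(`bit ${k}: P(1) ≈ ${p}`);
}
// b0 and b61 come out at exactly 0.5, b62 is a hair under 0.5,
// and b159 is only about 0.415.
```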

As I said above

I could probably quantify the modulo bias on a single or double die roll. However, I'm no mathematician and I would have a hard time quantifying the entropy bias on a few rolls, let alone enough rolls to generate ECDSA keys.

So here is an example with two dice rolls. It yields a number from 0 to 6^2 - 1, or 0 to 35. If it helps, write out all the binary values, or maybe just the values above 31. b0 and b1 are unbiased because, as above, 6^2 = 2^2 * 3^2. However, you'll notice b2 - b4 are biased. All three of those bits are about 10% more likely to be a zero (because of the modulo bias) than a one. Since 3 > 2, I believe you will always have more biased bits than unbiased bits with a D6 die, no matter how many rolls, using the current method.
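Written out in code, that two-dice example looks like this (a throwaway check, not the project's implementation):

```js
// All 36 two-die outcomes map to the values 0..35, each equally likely.
// Count, for each of the five extracted bits b0..b4, how often it is 1;
// any imbalance is the modulo bias.
const ones = [0, 0, 0, 0, 0];
for (let v = 0; v < 36; v++) {
  for (let b = 0; b < 5; b++) {
    if ((v >> b) & 1) ones[b]++;
  }
}
ones.forEach((n, b) => console.log(`b${b}: ${n} ones, ${36 - n} zeros`));
// b0 and b1 come out 18/18; b2, b3 and b4 come out 16 ones to 20 zeros.
```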

The engineer in me tends to avoid any bias at all (see above for recommended methods).

iancoleman commented 3 years ago

you can get 62 evenly distributed bits. ... Bits b62 - b159 are biased (some of them may have an extremely small bias). Note that b62 must be biased since the number 3^62 (the number of values left after dividing by the 62 evenly distributed bits) is odd because any power of an odd number is always going to be odd.

Yes this is a handy insight. I'll try to add the theoretical bias on each bit to the calculator.

I believe the higher order bits will in general have more bias than the lower order bits

Yes, the higher order bits are especially biased. This is an example of rolling 62 D6, collecting data over 100K samples, and it clearly shows the higher order bits are significantly more biased than the others.

[chart: entropy_bias_62_d6_100K_samples]

Looking at it experimentally, 62 rolls of a D8 (exactly 3 bits of entropy on each roll) over 100K samples gives a worst-case biased bit at 50.355% ones. Comparing the same situation with a D6 gives 8 bits outside that range, 7 of which are the most-significant bits, so 8/160 = 5% of the bits are more biased than with the D8. Not sure if this is the right way of looking at how good or bad the bias is overall, or how it affects end users rolling their D6. I can agree with your sentiment "the engineer in me tends to avoid any bias at all", but I want users to be able to do the simplest thing that works, and rolling 1.5x more D6 seems like a big change for a small problem.

I'll keep pursuing this further, both to quantify the risks of bias and consider the best way to account for it. For example, in the case above there were 8 of 160 bits with 'too much' bias, so there are 152 bits that fall 'within the range of unbiased variation' and I might be comfortable saying they have generated 152 fair bits of entropy (which exact bits though...?!). It's not really as simple as that, though: every bit has some bias, and when accumulated it may be overall more than 8 bits' worth of bias, so I'll keep working toward quantifying this accurately. I think in the end it will be reported as something like "x% of bits would have no more than y% of bias". Whatever it is, it has to be easy and unmistakable for users to understand.

Thanks again for the discussion, it's really helpful.

crwatkins commented 3 years ago

Recommended reading

How to best obtain bit sequences from throwing normal dice?

Why do people say there is modulo bias when using a random number generator?

Creating a small number from a random octet string

I'll keep pursuing this further, both to quantify the risks of bias and consider the best way to account for it.

That's fairly straightforward. The measurement of this unpredictability is entropy.

What is entropy?

I can agree with your sentiment "the engineer in me tends to avoid any bias at all", but I want users to be able to do the simplest thing that works, and rolling 1.5x more D6 seems like a big change for a small problem.

You know what they say? "You can't get entropy from a stone." Actually, nobody says that, but seriously, just because bits were generated, it doesn't mean that entropy exists. Throwing a die 62 times gives you (at least) 160 bits, but it does not give you 160 bits of entropy, just like converting the first 20 ASCII characters of this sentence into binary creates 160 bits, but not 160 bits of entropy (this comment assumes you actually want 160 bits of entropy).

kristovatlas commented 3 years ago

FWIW I agree with @crwatkins -- you want to generate 160 bits of entropy, not just 160 bits of anything. I would prefer to roll more dice.

For an average GitHub project it's kinda meh whatever, but you have the unlucky position of having created a reference implementation.

iancoleman commented 3 years ago

I can see two options for improving the tool:

Option 1

Remove D6, add D4 and D8, add a note about bias in the More Info section and a link to it in the entropy section (Base 10 and Card entropy are still biased so a note is needed).

Option 2

Remove the option for Use Raw Entropy when using a biased entropy source, and select the appropriate X Words option which will use the hash of the entropy instead. The hashing should normalize any bias. Use Raw Entropy will only be available for Binary and Hex entropy.
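Roughly, Option 2 amounts to something like the following sketch (the function name and input format here are made up for illustration; the tool's actual code path may differ):

```js
// Sketch of Option 2: never expose biased raw entropy directly; hash it and
// take only as many bits as the chosen mnemonic strength requires.
const { createHash } = require('crypto');

function normalizedEntropy(rawRollsString, bits) {
  const digest = createHash('sha256').update(rawRollsString).digest(); // 32 bytes
  return Array.from(digest)
    .map(byte => byte.toString(2).padStart(8, '0'))
    .join('')
    .slice(0, bits);
}

// e.g. a string of recorded D6 rolls in, 160 normalized bits of entropy out
console.log(normalizedEntropy('352213415526341223156241532612', 160));
```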

I prefer option 2 since it resolves bias in any base, not just D6. @crwatkins what are your thoughts? Any other options I haven't considered?

limpbrains commented 3 years ago

Option 3

Use this method to convert any dice entropy to bytes.

D1 -> 00
D2 -> 01
D3 -> 10
D4 -> 11
D5 -> 0
D6 -> 1

Take a look at the getEntropy function which I built for BlueWallet (tests).

The same principle can be applied to a deck of cards or even a nonexistent D15.
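A rough sketch of that generalization (my own illustration of the principle, not BlueWallet's getEntropy code):

```js
// Generalizing the mapping above to any number of equally likely outcomes:
// split the outcomes into groups whose sizes are powers of two
// (6 = 4 + 2, 52 = 32 + 16 + 4, 15 = 8 + 4 + 2 + 1, ...). An outcome landing
// in a group of size 2^k contributes its k-bit index within that group;
// a leftover group of size 1 contributes nothing.
function eventToBits(index, totalOutcomes) {
  let start = 0;
  let remaining = totalOutcomes;
  let size = 1 << Math.floor(Math.log2(totalOutcomes));
  while (size >= 1) {
    if (remaining >= size) {
      if (index < start + size) {
        if (size === 1) return '';          // the odd outcome out, no bits
        return (index - start).toString(2).padStart(Math.log2(size), '0');
      }
      start += size;
      remaining -= size;
    }
    size >>= 1;
  }
  return '';
}

// D6 faces 1..6 become indices 0..5 and reproduce the table above.
console.log([1, 2, 3, 4, 5, 6].map(f => eventToBits(f - 1, 6)));
// -> [ '00', '01', '10', '11', '0', '1' ]
```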

crwatkins commented 3 years ago

I agree with @kristovatlas and I suspect that developers may look to this project for best practices.

Option 1: I would rule that out because I can't find where I put my D4 and my D8.
Option 2: I think that would be fine. I would recommend PBKDF2 with 2048 iterations and HMAC-SHA512 as in BIP39. This could also work for cards, etc. on the text generated (e.g. 3S7D...).
Option 3: I think this would be fine also (as I described in my original comment).

Option 2 is more generic and should work for most methods (but is based on some assumptions about SHA that we have all long ago accepted, including in BIP39). Option 3 is more precise.

jsarenik commented 3 years ago

Thank you for the great discussion on this topic! I thought about exactly this early this year and you can see how I fought with it in POSIX shell. In my tests I "threw" the dice many thousand times and checked how many times each number was thrown. Just my 5 msat.

iancoleman commented 3 years ago

I implemented option 3 in https://github.com/iancoleman/bip39/commit/bf96267f89d18f278e78cf02c97ab1e7513fb871

The binary for each event is directly mapped in entropy.js L19-148

Overall this simplified the entropy logic a lot.

wigy-opensource-developer commented 3 years ago

I do not want to ruin the party, but when you release this backward-incompatible change, could you also provide a link around the entropy field to an old release, so if someone has a stack of cards they could get back to their old seed? I doubt people store a set of dice though.

crwatkins commented 3 years ago

I know this issue relates to the dice-based entropy, but I'll point out that bf96267 seems to make the assumption of card replacement (each new card is drawn from a full (52 card) randomized deck). I have not checked how this has worked in the past, but I assume at least some users might try to apply this to a deck dealt without replacement.

iancoleman commented 3 years ago

could you also provide a link around the entropy field to an old release

Yes, will do.

bf96267 seems to make the assumption of card replacement (each new card is drawn from a full (52 card) randomized deck)

Good point. Without replacement gives a full deck of log2(52!) ~ 225.6 bits and the new way is 32*5 + 16*4 + 4*2 = 232 bits.

Open to any suggestions on this.
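For anyone following along, those two figures come out of the following arithmetic (assuming the 32/16/4 grouping of the new per-card mapping):

```js
// Without replacement: a shuffled 52-card deck is one of 52! equally likely
// orderings, so it carries log2(52!) bits.
let withoutReplacement = 0;
for (let i = 2; i <= 52; i++) withoutReplacement += Math.log2(i);

// With replacement (the new per-card mapping): 32 cards give 5 bits each,
// 16 cards give 4 bits, the remaining 4 cards give 2 bits.
const withReplacement = 32 * 5 + 16 * 4 + 4 * 2;

console.log(withoutReplacement.toFixed(1)); // 225.6
console.log(withReplacement);               // 232
```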

crwatkins commented 3 years ago

Good point. Without replacement gives a full deck of log2(52!) ~ 225.6 bits and the new way is 32*5 + 16*4 + 4*2 = 232 bits.

Open to any suggestions on this.

As I mentioned above, I think option 2 (PBKDF2) would work for this. A downside would be the inconsistency with the other input methods. Option 3 only works for replacement. I'm not sure what the original card deck problem statement was, but I'm assuming it didn't involve replacement after each draw.

limpbrains commented 3 years ago

I think Option 3 can be used when you're pulling cards from the deck one by one. You just need to reduce the base by 1 each time and recalculate the entropy value for each card. At the end, when only two cards are left, each of them will represent either 0 or 1. @crwatkins please correct me if I'm wrong.

It should be clear from the documentation which method is used. If there is no replacement, input can be verified for doubles.
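One way to picture the "reduce the base by 1 each draw" idea is to encode the sequence of draws in a mixed-radix (factorial-style) number, a sketch of the principle rather than a proposal for the tool's exact code:

```js
// Encode a sequence of draws *without replacement* as a single integer.
// Each draw is recorded as its position among the cards still in the deck,
// so the i-th draw is a digit in base (52 - i). A full deck encodes a value
// in [0, 52!), i.e. log2(52!) ≈ 225.6 bits of entropy.
function drawsToInteger(drawnCards, fullDeck) {
  let remaining = [...fullDeck];
  let value = 0n;
  for (const card of drawnCards) {
    const index = remaining.indexOf(card);
    if (index === -1) throw new Error(`duplicate or unknown card: ${card}`);
    value = value * BigInt(remaining.length) + BigInt(index);
    remaining.splice(index, 1); // this card cannot be drawn again
  }
  return value;
}

// Tiny illustration with a four-card "deck": 4 * 3 * 2 * 1 = 24 possible orders.
console.log(drawsToInteger(['AS', 'KH', '2D', '7C'], ['2D', '7C', 'AS', 'KH'])); // 16n
```

A uniformly shuffled deck then maps to a uniformly distributed integer, which is what keeps the extracted bits unbiased.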

crwatkins commented 3 years ago

@limpbrains (As I've mentioned before) I'm no expert in the mathematics here, but I believe that you are correct in that you need to adjust each time a card is removed. When I said

Option 3 only works for replacement.

I was referring to the implementation of Option 3 for a card deck in bf96267.

iancoleman commented 3 years ago

There's some extra explanation around card entropy, see https://github.com/iancoleman/bip39/commit/8c3a56ec4f2af06ce2e0977d458c50fd7b1b7687

Also a link to the previous version 0.4.3 for anyone storing a deck of cards to recreate their mnemonic (not recommended but people will do it so can't hurt to add the link).

The difference in a full deck with / without replacement is made clear. Both instances give 21 words.

I'll close this issue but am open to further comments.

eugenesan commented 2 years ago

@iancoleman @crwatkins

Recently, I was tweaking Coldcard's implementation of the Dice6 to Mnemonic tool (https://github.com/Coldcard/firmware/blob/master/docs/rolls.py which, by the way, mimics its firmware implementation) and I think I've discovered a potential issue which also affects the bip39 tool.

The issue is related to using "upscaled/stretched" entropy via SHA256. There are no issues with using RAW entropy (3 words per 32 bits).

  1. The bip39 tool, Coldcard's rolls.py and my variant of rolls.py somehow generate identical seeds/mnemonics, even though those tools pass different input to sha256. If you look at https://github.com/Coldcard/firmware/pull/110/commits/aae0e7a322c27f6c3902cb178382edff5aad1ae0 (line 33) you can see that I am replacing digits 6 with 0 (similarly to what the bip39 tool does), but it somehow doesn't affect the result.

  2. While researching the first issue, I stumbled upon https://crypto.stackexchange.com/questions/10402/how-much-entropy-is-lost-via-hashing-when-you-add-known-or-low-entropy-data/10405#10405 and other similar statements. It seems like passing input entropy through sha256 significantly reduces output entropy (up to 40%). As a result we can never reach the required entropy level to generate a strong seed/mnemonic. So while we try to compensate for input bias using sha256, we might be harming the entropy even more.

  3. (This one might be related to the first issue) When using "upscaled/stretched" entropy, we pass the input (after initial filtering such as replacing "6" with "0") as ASCII to SHA256. In that case, in addition to the previous issues, we utilize only 16/256th (at most, for hex input) of each input byte.

Now, I am no mathematician and I really hope I am missing something trivial and someone will correct me. But if I am right, we might have a serious problem.

My current thinking is to disable usage of "upscaled/stretched" entropy until someone convinces us that it is a legitimate method.

crwatkins commented 2 years ago

@eugenesan I hope I don't come across here as curt, but I am having a hard time providing concise feedback to what might be general misunderstandings.

  1. For practical purposes, that is not possible. If a tool is indeed passing different short strings (strings which can be typed by a human) into sha256 and getting a collision, simply document those strings and become famous. That's not what is happening and to be blunt, this is very likely a bug in your code.
  2. That's not what the stackexchange answer says. It explains there could be less than one bit of entropy loss due to collisions. I agree with the stackexchange answer. One bit is not a practical issue.
  3. The point of such a tool is to do exactly that: Accumulate multiple small sources of entropy.

I see no cause for concern.

eugenesan commented 2 years ago

@crwatkins Thanks for responding.

  1. After reading your response I did another pass of testing and indeed, you are correct. It seems Coldcard's implementation is not 100% compatible (it is if the input lacks the digit 6) and I mistakenly prepared a bad dataset for testing based on that incorrect assumption. No more late night coding for me ;-)

  2. While 1 bit is maybe not a reason for concern, wouldn't it be better (even if marginally) to skip SHA256 if the input is 256 bits or greater?

  3. While SHA256 is more or less a "black box" to me, I still don't like the idea of feeding an algorithm that expects 256 possible values for each input byte with input where each byte provides only ~5% of that. I'd be more comfortable with converting the input to an integer before passing it to SHA256.

crwatkins commented 2 years ago

@eugenesan

First and foremost, this (closed) issue relates to a bias in dice entropy and the topic which is now being discussed is only tangentially related. In addition, this project (by default) uses raw entropy (which I recommended above), so I'm not sure that the continuation of this thread relating to how other projects perform this operation is as useful here as it would be elsewhere with a more appropriate audience. If you have a suggestion for a change to this project, I suggest you open a new issue or submit a PR. If you want to ask cryptographic questions, I would recommend StackExchange as you linked above.

However, I don't want to "leave you hanging", but I'll try to resist pontificating at length on all the tradeoffs of such designs. Decades ago, when IBM dominated the corporate computing market, a well known adage was "Nobody ever gets fired for buying IBM." Well today, "Nobody ever gets fired for throwing in another hash." To relate that to this original issue with dice, users do not have perfectly unbiased physical dice (it's not "if" they are biased, it's "how much"). Hashes today are very fast and have the potential to mitigate some types of potentially exploitable issues, such as bias in entropy.

gpatkinson commented 1 year ago

@crwatkins @iancoleman @eugenesan One other question on the topic of dice roll math ... how important is it to use base6 vs base10 in the generation of binary entropy? When comparing this repo's implementation to others I noticed that some others are using base10. This seems technically incorrect but I'm not math-inclined enough to determine if it is actually a problem from a seed generation standpoint.

crwatkins commented 1 year ago

I'm not certain what you mean by "base6 vs base10". If you mean taking digits 0-5 or 1-6 from a D6 roll, concatenating them together (such as "3522134") and treating that as a base10 number, there will be significantly fewer bits of entropy than the number of bits in the number. This issue discussed how treating that number as a base6 number introduces some bias. Treating it as a base10 number would introduce even more bias.
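To put rough numbers on that (my own arithmetic, not taken from either implementation):

```js
// Bits of information actually carried by 62 D6 rolls vs. the size of the
// number you get by reading those rolls as decimal digits.
const rolls = 62;
console.log((rolls * Math.log2(6)).toFixed(1));  // 160.3 bits of entropy
console.log((rolls * Math.log2(10)).toFixed(1)); // 206.0 bits of representation
// Treating the concatenated digits as a base 10 number spreads 6^62 possible
// outcomes over a 10^62-sized range, so many bit patterns can never occur and
// the extracted bits are far from uniformly distributed.
```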