sipa / bech32

Code snippets and analysis of the Bech32 format

Possibly incorrect bech32m test case #62

Closed bbrtj closed 2 years ago

bbrtj commented 2 years ago

This test case is listed in BIP-350 as a valid bech32m: 11llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllludsr8

However, it does not appear to obey the bech32 rules for 5-to-8 bit conversion. Besides its HRP and checksum, it contains 82 l characters, each of which decodes to the value 31 (binary 11111). Since each character is interpreted as five bits, the conversion yields 82 × 5 = 410 bits, all set to 1.

410 = 51 × 8 + 2, so after 51 full bytes there are 2 leftover padding bits, both set to 1. That does not meet the requirement in this sentence from BIP-173: Any incomplete group at the end MUST be 4 bits or less, MUST be all zeroes, and is discarded.

Please let me know whether the test case is wrong or whether my reasoning is wrong. Thank you.
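The arithmetic above can be checked with a small sketch. The `convertbits` helper below is written in the style of the regrouping function used by segwit address decoders; its name, signature, and `pad` flag are assumptions made for this illustration, not something quoted in this thread.

```python
def convertbits(data, frombits, tobits, pad=True):
    """Regroup a list of `frombits`-wide integers into `tobits`-wide integers.

    With pad=False, return None if the leftover bits are a whole input
    group or more, or if any leftover bit is non-zero (the BIP-173 rule).
    """
    acc = 0    # bit accumulator
    bits = 0   # number of bits currently held in acc
    ret = []
    maxv = (1 << tobits) - 1
    max_acc = (1 << (frombits + tobits - 1)) - 1
    for value in data:
        if value < 0 or (value >> frombits):
            return None  # value does not fit in frombits
        acc = ((acc << frombits) | value) & max_acc
        bits += frombits
        while bits >= tobits:
            bits -= tobits
            ret.append((acc >> bits) & maxv)
    if pad:
        if bits:
            ret.append((acc << (tobits - bits)) & maxv)
    elif bits >= frombits or ((acc << (tobits - bits)) & maxv):
        return None  # too much padding, or padding is not all zeroes
    return ret


payload = [31] * 82          # 82 'l' characters, each worth 0b11111
total_bits = len(payload) * 5
leftover = total_bits % 8    # bits that do not fill a whole byte

print(total_bits, leftover)
print(convertbits(payload, 5, 8, pad=False))
```

Under these assumptions the strict conversion rejects the payload, because the two leftover bits are ones rather than zeroes.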

sipa commented 2 years ago

The 8-to-5 rule is part of the segwit address rules, not the bech32 or bech32m rules.

So you would be right if this string were claimed to be a valid BIP-350 address. It is not; it is just a valid bech32m string (only a subset of which are valid segwit addresses). This is similar to how not every valid base58check string is a valid address.

bbrtj commented 2 years ago

Ah okay, I got it backwards then. Does that mean there's no defined way to encode arbitrary byte data in bech32? So basically a decoder should treat each character as a separate 5-bit number?

sipa commented 2 years ago

No, not really. bech32 and bech32m are mechanisms for encoding an HRP and a data part, where the data part is a list of 5-bit values. How you choose what those 5-bit values are is up to the application. You can use the 8-to-5 mechanism from the segwit address spec, or not. In particular, you could have a 5-bit version number like segwit addresses have, but you could also use something else.
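To illustrate the layering described here, the following is a minimal sketch of a bech32m encoder and decoder that works purely on 5-bit values and never regroups them into bytes. The function names are hypothetical and the decoder omits some input checks for brevity; the character set, generator constants, and the bech32m checksum constant are taken from BIP-173 and BIP-350.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
BECH32M_CONST = 0x2bc830a3


def polymod(values):
    """Compute the BCH checksum over a list of 5-bit values."""
    gen = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
    chk = 1
    for v in values:
        top = chk >> 25
        chk = (chk & 0x1ffffff) << 5 ^ v
        for i in range(5):
            chk ^= gen[i] if ((top >> i) & 1) else 0
    return chk


def hrp_expand(hrp):
    """Expand the HRP into the values that feed the checksum."""
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def bech32m_encode(hrp, data):
    """Encode an HRP and a list of 5-bit values as a bech32m string."""
    pm = polymod(hrp_expand(hrp) + data + [0] * 6) ^ BECH32M_CONST
    checksum = [(pm >> 5 * (5 - i)) & 31 for i in range(6)]
    return hrp + "1" + "".join(CHARSET[d] for d in data + checksum)


def bech32m_decode(s):
    """Return (hrp, five_bit_values), or None if the string is invalid."""
    if s.lower() != s and s.upper() != s:
        return None  # mixed case is not allowed
    s = s.lower()
    pos = s.rfind("1")  # the last '1' is the separator
    if pos < 1 or pos + 7 > len(s) or len(s) > 90:
        return None
    hrp, rest = s[:pos], s[pos + 1:]
    if any(c not in CHARSET for c in rest):
        return None
    data = [CHARSET.find(c) for c in rest]
    if polymod(hrp_expand(hrp) + data) != BECH32M_CONST:
        return None
    return hrp, data[:-6]  # strip the 6 checksum values


# 82 values of 31 do not form a whole number of bytes, yet this layer accepts them.
encoded = bech32m_encode("1", [31] * 82)
decoded = bech32m_decode(encoded)
print(encoded)
print(decoded is not None and len(decoded[1]))
```

Any zero-padding rule on regrouped bytes would be applied by the application on top of the list returned by `bech32m_decode`, which is where the segwit address rules sit.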

bbrtj commented 2 years ago

Okay great, thank you very much for the explanation!