sparrowwallet / sparrow

Desktop Bitcoin Wallet focused on security and privacy. Free and open source.
https://sparrowwallet.com/
Apache License 2.0

Consider percent-encoding bitcoin URI '=' and '?' characters for more compact QR codes #1358

Closed DanGould closed 5 months ago

DanGould commented 5 months ago

QR codes may be encoded in Bytes mode (every character takes 8 bits) or in Alphanumeric (base45) mode (every 2 characters take 11 bits).

Some of the encodings in Sparrow, like UR, take advantage of the denser encoding by limiting the character set to the base45 alphanumeric characters.

BIP 21 Bitcoin URIs could take advantage of this if they percent-encoded characters which are not in the QR alphanumeric base45 set. In particular, '?' and '=' need to be percent-encoded, and because the set contains only uppercase letters, the whole URI needs to be uppercased. The address and the URI parameters therefore need to be case-insensitive, so case-sensitive base58 addresses would be ineligible for this compression. Take this example of a BIP 21 URI with some case-insensitive payjoin v2 parameters, with '=' and '?' percent-encoded.
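As a rough illustration of the transformation described above, the following Python sketch uppercases a URI, percent-encodes '?' and '=', and reports any characters that still fall outside the QR alphanumeric set. The helper names and the short placeholder address are hypothetical. One caveat: '&' is also outside the 45-character set, so a URI with more than one parameter would need it handled too before the whole string could go into a single alphanumeric segment.

```python
# The 45 characters allowed in QR alphanumeric mode.
QR_ALPHANUMERIC = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")


def to_alphanumeric_candidate(uri: str) -> str:
    """Uppercase the URI and percent-encode '?' and '=' (the idea in this issue).

    Assumes every component of the URI is case-insensitive (e.g. bech32 addresses).
    """
    return uri.upper().replace("?", "%3F").replace("=", "%3D")


def chars_outside_alphanumeric(text: str) -> set:
    """Return the characters that would force a QR encoder out of alphanumeric mode."""
    return set(text) - QR_ALPHANUMERIC


# 'tb1qexample' is a placeholder, not a valid address.
plain = "bitcoin:tb1qexample?amount=1"
encoded = to_alphanumeric_candidate(plain)
print(encoded)
print(chars_outside_alphanumeric(encoded))
```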

Plain URI to Bytes mode QR code

bitcoin:tb1pjqhq9wut7d63xe0khnteylwyc93evlkkgzw8q5y3nwtepvtfj09supnwrt?amount=1&pj=https://payjo.in/pk1qw24nx0wyntm7m0r4s7cspnkdpdteufl5yvw3yzseue0wx6gxum9kemxllf&ohttp=oh1qyqzpqxu3dz27jcadlvmnk8dty00f783vtd2peqjtd0ddn89qrkkf0p4qqzqqqgqqvjm2j8g

244 chars * 16 bits per 2 chars = 1952 encoded bits

[image: QR code, 57x57]

Percent-encoded uppercase URI to Alphanumeric mode QR Code

Uppercased, with '?' and '=' percent-encoded:

BITCOIN:TB1PJQHQ9WUT7D63XE0KHNTEYLWYC93EVLKKGZW8Q5Y3NWTEPVTFJ09SUPNWRT%3FAMOUNT%3D1&PJ%3DHTTPS://PAYJO.IN/PK1QW24NX0WYNTM7M0R4S7CSPNKDPDTEUFL5YVW3YZSEUE0WX6GXUM9KEMXLLF&OHTTP%3DOH1QYQZPQXU3DZ27JCADLVMNK8DTY00F783VTD2PEQJTD0DDN89QRKKF0P4QQZQQQGQQVJM2J8G

266 chars * 11 bits per 2 chars = 1463 encoded bits before error correction

[image: QR code, 49x49]


Note: I'm not sure which error correction level these codes use, but I assume it is the same for both, since I generated both using the same "Plain Text" input mode of the same program.
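The bit counts quoted above follow from the per-mode costs: 8 bits per character in Bytes mode, and 11 bits per pair of characters in Alphanumeric mode (a leftover single character takes 6 bits). A minimal Python sketch of that arithmetic, with hypothetical function names, counting data bits only (no mode indicator, character count field, or error correction):

```python
def byte_mode_bits(n_chars: int) -> int:
    """Data bits for n single-byte characters in Bytes mode (8 bits each)."""
    return 8 * n_chars


def alphanumeric_mode_bits(n_chars: int) -> int:
    """Data bits in Alphanumeric mode: 11 bits per pair, 6 bits for an odd last char."""
    pairs, leftover = divmod(n_chars, 2)
    return 11 * pairs + 6 * leftover


# Character counts as quoted in this comment.
print(byte_mode_bits(244))          # 1952
print(alphanumeric_mode_bits(266))  # 1463
```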

Hurdles

QR-scanning URI parsers would have to expect percent-encoded input. If BIP 21 URIs are percent-encoded but a parser does not anticipate percent-encoding, URI decoding will fail. However, the BIP 21 spec already requires percent-encoding of parameters in its "General Format" section, so parsers should have this capability available. @craigraw what do you think of encoding and decoding the URI '=' and '?' characters this way? I figure senders with incompatible scanners could still scan a plain address (not a BIP 21 URI) as a fallback.

Edit: I realize that percent-encoding the '=' and '?' delimiters exactly as they would be encoded inside parameter values may cause issues, but the same idea still applies: encoding those 2 characters some other way within the base45 set could achieve the same QR compression.

craigraw commented 5 months ago

Thanks for the detailed proposal.

I think it's important to note that such URIs would be in contradiction to RFC 3986, on which BIP21 is based. Specifically section 2.2:

reserved = gen-delims / sub-delims

  gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

  sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

The purpose of reserved characters is to provide a set of delimiting characters that are distinguishable from other data within a URI. URIs that differ in the replacement of a reserved character with its corresponding percent-encoded octet are not equivalent. Percent-encoding a reserved character, or decoding a percent-encoded octet that corresponds to a reserved character, will change how the URI is interpreted by most applications. Thus, characters in the reserved set are protected from normalization and are therefore safe to be used by scheme-specific and producer-specific algorithms for delimiting data subcomponents within a URI.

We might consider that the savings are worth the spec violation. For ? (%3F) this seems to work - Bitcoin addresses do not contain this character, so as long as BIP21 is followed and a Bitcoin address is always present we should be ok. But what about = (%3D)? If a value in the BIP21 URI contains this character (for example, the label field) then there would be no clear way to parse it.
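To make the concern concrete, here is a small Python sketch using the standard library's generic URI parser (not Sparrow's own parser). It shows that a percent-encoded '?' is no longer seen as a query delimiter, and that one naive workaround, decoding the whole string before splitting, turns escaped characters inside a label into delimiters. The address `addr` and the label value are placeholders.

```python
from urllib.parse import parse_qsl, unquote, urlsplit

# 'addr' is a placeholder for a Bitcoin address; the label value is "a&b=c".
plain = urlsplit("bitcoin:addr?label=a%26b%3Dc")
encoded = urlsplit("bitcoin:addr%3Flabel%3Da%26b%3Dc")

# A generic parser only finds a query after a literal '?'.
print(plain.query)    # label=a%26b%3Dc
print(encoded.query)  # (empty string)

# Standard order: split on delimiters first, then percent-decode each value.
print(parse_qsl(plain.query))  # [('label', 'a&b=c')]

# Decoding first makes the escaped '&' and '=' inside the label act as delimiters.
naive = unquote(encoded.path).split("?", 1)[1]
print(parse_qsl(naive))  # [('label', 'a'), ('b', 'c')]
```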

DanGould commented 5 months ago

On second thought, percent-encoding would not work because that would conflict with parameters' percent-encoding. It's not impossible for one of the parameters to include a '?' or '=' so long as they're percent-encoded.

Therefore, a separate encoding would be required for the URI '=' and '?' delimiters. Something rarely used, like '%7F' for '=' (otherwise the DEL control character) and '%8C' for '?' (otherwise typically Œ), could work.
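A minimal Python sketch of what such a stand-in scheme might look like, purely as an illustration: the function names and placeholder address are hypothetical, and no implementation is known to do this. It assumes parameter values are already percent-encoded per BIP 21 and are case-insensitive, since the whole string is uppercased. Two limitations are worth keeping in mind: a value whose UTF-8 percent-encoding contains the byte 8C would collide with the '?' stand-in, and the '&' separator is itself outside the QR alphanumeric set.

```python
from urllib.parse import unquote

Q_STANDIN = "%8C"   # stands in for the '?' delimiter, as suggested above
EQ_STANDIN = "%7F"  # stands in for the '=' delimiter, as suggested above


def encode(address, params):
    """params: (key, value) pairs whose values are already percent-encoded."""
    query = "&".join(f"{k}{EQ_STANDIN}{v}" for k, v in params)
    return f"bitcoin:{address}{Q_STANDIN}{query}".upper()


def decode(uri):
    """Split on the stand-ins first, then percent-decode each value."""
    head, _, query = uri.partition(Q_STANDIN)
    address = head.split(":", 1)[1]
    params = []
    for pair in filter(None, query.split("&")):
        key, _, value = pair.partition(EQ_STANDIN)
        params.append((key.lower(), unquote(value)))
    return address, params


# 'tb1qexample' is a placeholder; the label value is "a=b", escaped as a%3Db.
uri = encode("tb1qexample", [("amount", "1"), ("label", "a%3Db")])
print(uri)
print(decode(uri))
```

Because the delimiters and the escaped '=' inside the label use different escapes, the decoder can tell them apart, at the cost of a non-standard URI.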

But honestly this seems like a hack and it seems like it would be hard to get implementations to switch for slightly more compressed QR codes.

The reason I brought this up in the first place is that BIP 77 payjoin v2 reviewers suggested I not encode public keys with base64url, since that scheme is case-sensitive and therefore forces QR encoders to use bytes mode. However, it seems BIP 21 QR codes are always in bytes mode anyway because of the '=' and '?' delimiters. I may still choose bech32m for these payjoin v2 BIP 21 parameter values (since they're case-insensitive and checksummed), but I wanted to exhaust the possibility of alphanumeric mode QR encoding for bitcoin URIs. If alphanumeric QR encoding of BIP 21 URIs were widespread, it would be a hard requirement for BIP 77 params not to break it, but I don't think alphanumeric QR encoding of BIP 21 URIs is happening anywhere.

craigraw commented 5 months ago

I haven't seen QR scanning issues with BIP21 URIs - maybe the situation is not actually problematic?

Closing this off as an active issue.

DanGould commented 4 months ago

I found out that QRs can use a mixed mode where certain parameters are encoded separately from the main data string and doing so can save significant space. I'm not sure if sparrow is doing this, but that strategy, tested by the Bitcoin Design Community, seems to achieve the same goal as this issue intended.
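For a rough sense of why mixed mode helps, the sketch below estimates the bit cost of splitting a QR payload into segments, each with its own mode. Every segment pays a 4-bit mode indicator plus a character count field whose width depends on the QR version. The function names are hypothetical, and the particular split shown (an uppercased `BITCOIN:<address>` prefix in Alphanumeric mode, the query string in Bytes mode) is an assumption for illustration, not necessarily the exact layout that was tested.

```python
# Bits in the character count indicator, per mode, for QR versions 1-9, 10-26, 27-40.
COUNT_BITS = {
    "alphanumeric": (9, 11, 13),
    "byte": (8, 16, 16),
}


def segment_bits(mode: str, n_chars: int, version: int) -> int:
    """Bits for one segment: 4-bit mode indicator + count indicator + data."""
    tier = 0 if version <= 9 else 1 if version <= 26 else 2
    header = 4 + COUNT_BITS[mode][tier]
    if mode == "alphanumeric":
        pairs, leftover = divmod(n_chars, 2)
        return header + 11 * pairs + 6 * leftover
    return header + 8 * n_chars


def total_bits(segments, version: int) -> int:
    """Sum the cost of a list of (mode, n_chars) segments."""
    return sum(segment_bits(mode, n, version) for mode, n in segments)


# Hypothetical 79-character URI: 70-character prefix plus "?amount=1" (9 characters).
single = total_bits([("byte", 79)], version=5)
mixed = total_bits([("alphanumeric", 70), ("byte", 9)], version=5)
print(single, mixed)  # 644 482
```

The delimiters stay as literal '?' and '=' in the Bytes segment, so the URI itself remains a standard BIP 21 URI apart from the uppercased prefix.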