payjoin / rust-payjoin

Supercharged payment batching to save you fees and preserve your privacy
https://payjoindevkit.org
85 stars 37 forks source link

Improve QR code encoding efficiency #389

Open nothingmuch opened 5 hours ago

nothingmuch commented 5 hours ago

In order to encode smaller QR codes, the following changes could be made to the BIP 21 URI and the nested pj URI. All of these ideas are breaking changes and increase spec & implementation complexity.

  1. Encode bit vomit in a format amenable to alphanumeric encoding mode of QR codes. Base45 (RFC 9285) is information theoretically optimal considered in isolation, but would preclude the next optimization, due to lack of reserved delimiter characters. and the space character potentially requiring %-escaping (note that % is included). The bech32 alphabet could be used, but I would suggest against bech32's error correction as that would negate some of the benefit and would serve no purpose for something like the ohttp parameters. Base32 is standardized, and base36 semi-standardized. Afaik no base42 standardization exists, but in principle to avoid space and reserve two delimiter characters, this would provide the most efficient encoding of the fragment. At 5.5 bits per character, with each character encoding 5 bits in e.g. base32, the overhead is 10%, whereas with 8 bits per byte, and each byte encoding 6 bits in base64, the overhead is 25%. The mode overhead of 13+16=29 bits (alphanumeric and back to bytes) is ~8% overhead for 45 bytes of base64 encoding 256 bits of entropy, so the potential savings is at least 5% per 256 bit chunk of bit vomit (pardon the mixed metaphor ;-), but could be closer to 15% if mode switching overhead is paid only once for all of the fragment encoded data covering both the receiver key and the ohttp config.

  2. Instead of encoding the fragment parameters in the same format as query parameters, with & and = as delimiters, use delimiters from the alphanumeric encoding mode character set. : seems like an intuitive substitude for =, and + could be an intuitive one for &. These can be encoded without requiring URI escaping at the BIP 21 level, saving 2 characters per byte (% encoding is itself encodable as QR alphanumeric so this is not strictly needed for eliminating the need for switching back and forth between byte and alphanumeric modes for each fragment parameter, so in either case the entire fragment can be encoded in a single alphanumeric run, but % encoding is almost as expensive as 2 mode switches, requiring 16.5 bits in alphanumeric mode as opposed to 29 bits for bytes+alnum mode switch). This would also require uppercasing the parameter names encoded in the fragment as well. A minor benefit could be obtained by also encoding the exp field as alphanumeric in this case.

  3. Matt Corrallo brought up the possibility of encoding a 128 bit symmetric encryption key in the fragment, omitting the OHTTP config and receiver public key from the BIP 21 URI, instead relying on the shared secret to derive a subdirectory ID from which these data can be retrieved. This complicates key consistency and adds a round trip but potentially reduces the bit vomit entropy by ~75%. With additional interactivity the BIP 21 entropy could be reduced even further using a PAKE.

nothingmuch commented 4 hours ago

For those following by email, note that I edited the original issue text for correctness and clarity, the most important correction was in regards to mode switching and its interaction with % escaping (% is in the alphanumeric set so as long as the rest of the data is alphanumeric that does not require switching modes, which item 2 previously implied).