sveljko / base41

Base41 encoding
MIT License

Information about QR Code use for documentation #3

Open PhMajerus opened 3 months ago

PhMajerus commented 3 months ago

Base41 could be the optimal format to store binary data in URIs optimized for QR Codes.

The characters allowed in URIs (RFC 3986) are ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;= and a QR Code in byte mode can encode all of them. However, the denser alphanumeric mode only encodes 45 characters: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ␣$%*+-./:. Anything outside that set, including every lowercase letter, forces the encoder into byte mode, which spends 8 bits per character instead of 11 bits per pair of characters. This is why Base45, designed for HCERT (Electronic Health Certificate), uses that exact alphabet of 45 characters. But it appears to have been designed in a hurry, without much familiarity with binary-to-text encoding optimizations or QR Code software, because the resulting codes require a specific scanning library. They cannot be used to reliably link directly to web sites or apps (URIs).

They could have achieved the same density with a Base41 encoding, since both encodings map 2 bytes to 3 characters. That would have made it possible to avoid dangerous characters, primarily the % that many generic QR Code readers interpret as the start of a percent-encoded sequence, and the + that can be read as an escaped space. Both may be unreliably decoded and processed by QR scanning utilities designed to expect URLs.
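The density claim can be checked with a little arithmetic. The sketch below is only an illustration (the helper name `chars_per_two_bytes` is made up for this example): it finds how many symbols each base needs to cover every 16-bit value.

```python
def chars_per_two_bytes(base: int) -> int:
    """Smallest n such that base**n can represent every 16-bit value."""
    n = 1
    while base ** n < 2 ** 16:
        n += 1
    return n


for base in (40, 41, 45):
    print(base, chars_per_two_bytes(base))
# prints:
# 40 4
# 41 3
# 45 3
```

41 is the smallest base that still fits 2 bytes into 3 characters (41^3 = 68921, while 40^3 = 64000 falls short of 65536), so the four extra symbols of Base45 buy no additional density.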

With a 41-character alphabet, it is possible to stay within the URI character set (unreserved + reserved, not the more limited unreserved set used for URI components): 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ$*-.:. This keeps the / available as a field separator or as padding between several base41-encoded values, and avoids the + completely. While not completely safe for a query string component (depending on how you handle those), it is safe for embedding in a URI or URL. That makes it possible to have a URI that specifies a web site, or an app URI with binary data appended to its registered pseudo-protocol. I believe this makes it perfect both for trackable web site links (with some binary data attached) and for app shortcut URIs with state information appended in binary (for example, a music player could have a playlist UUID encoded in binary to launch that specific playlist). This seems like a good way to generate shorter web and app links optimized for QR Codes, which in turn results in smaller QR Codes.
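To make the playlist example concrete, here is a minimal sketch of an encoder and decoder using the alphabet above. Everything in it is an assumption for illustration: the names `QR_ALPHABET`, `encode` and `decode`, the `MYPLAYER:` scheme, and in particular the byte and digit order (big-endian pairs, least significant digit first), which may not match the base41 spec. Odd-length input is deliberately left out.

```python
import uuid

# The 41-character alphabet proposed above (all within QR alphanumeric mode).
QR_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ$*-.:"
assert len(QR_ALPHABET) == 41


def encode(data: bytes) -> str:
    """Encode each 2-byte pair as 3 symbols (assumed digit order, see lead-in)."""
    if len(data) % 2:
        raise ValueError("this sketch only handles even-length input")
    out = []
    for i in range(0, len(data), 2):
        value = int.from_bytes(data[i:i + 2], "big")
        for _ in range(3):
            value, digit = divmod(value, 41)
            out.append(QR_ALPHABET[digit])  # least significant digit first
    return "".join(out)


def decode(text: str) -> bytes:
    """Inverse of encode(); rejects unknown symbols and out-of-range triplets."""
    if len(text) % 3:
        raise ValueError("length must be a multiple of 3")
    out = bytearray()
    for i in range(0, len(text), 3):
        value = 0
        for ch in reversed(text[i:i + 3]):
            value = value * 41 + QR_ALPHABET.index(ch)
        if value > 0xFFFF:
            raise ValueError("triplet does not fit in 2 bytes")
        out += value.to_bytes(2, "big")
    return bytes(out)


# Hypothetical app URI: an uppercase scheme keeps the whole string in the
# QR alphanumeric set, and "/" separates the field from the payload.
playlist = uuid.UUID("12345678-1234-5678-1234-567812345678")
uri = "MYPLAYER:PLAYLIST/" + encode(playlist.bytes)
print(uri)
```

A 16-byte UUID becomes 24 characters here, compared with 32 hexadecimal digits, and the whole URI stays inside the 45-character alphanumeric set.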

In short, I believe Base41 may be the optimal format to generate short URLs and URIs for use in QR Code links that can be read directly by a phone's built-in code scanning utility.

sveljko commented 1 month ago

Botta and Cavagnino, in their paper on a Base41 variant, also explore the idea of having an explicit alphabet and embedding such Base41-encoded data in a URI/URL, but their design goals are different (thus they came up with a different alphabet).

I'm not fond of variants 😄, but this use case seems very interesting. Since this is the second "bring your own alphabet" variant of note, it seems that practicality trumps fondness in this case, and adding a "BYOA" variant to the spec and code is warranted.

If you're OK with it, I'll cite your "QR code alphabet".

sveljko commented 1 month ago

Please take a look at the latest commit https://github.com/sveljko/base41/commit/1734a82b5b21b7c805269db0a65777f0c0a2192d