multiformats / multibase

Self identifying base encodings
Adding Base45 (aka ISO/IEC 18004:2006 Alphanumeric Mode QR code) #64

Closed ChristopherA closed 3 months ago

ChristopherA commented 4 years ago

There is a special compression mode in the ISO QR code standard (ISO/IEC 18004:2006) called "Alphanumeric Mode". When you stay within this limited character set, it will do its own compression and error correction (~44% or more).

From QR code standard ISO/IEC 18004:2000 §8.3.3

Alphanumeric Mode

Alphanumeric Mode encodes data from a set of 45 characters, i.e. 10 numeric digits (0 - 9) (ASCII values 30HEX to 39HEX), 26 alphabetic characters (A - Z) (ASCII values 41HEX to 5AHEX), and 9 symbols (SP, $, %, *, +, -, ., /, :) (ASCII values 20HEX, 24HEX, 25HEX, 2AHEX, 2BHEX, 2D to 2FHEX, 3AHEX respectively). Normally, two input characters are represented by 11 bits.
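The "two input characters are represented by 11 bits" rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `qr_alnum_bits` is made up here, each character's value is taken to be its position in the set as listed in the quote, and an odd trailing character is packed into 6 bits (a detail of the standard that the quoted passage does not spell out). The mode indicator, character count field, padding, and error correction are all left out.

```python
# The 45-character set, in the order given in the quoted passage.
# Assumption: a character's value is its index in this string.
QR_ALNUM = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def qr_alnum_bits(text: str) -> str:
    """Return the Alphanumeric Mode data bits for text as a '0'/'1' string.

    Only the character data is produced: no mode indicator, no length
    field, no padding, no error correction.
    """
    # str.index raises ValueError for any character outside the set,
    # e.g. a lower-case letter.
    values = [QR_ALNUM.index(ch) for ch in text]

    chunks = []
    # Each pair (a, b) becomes the 11-bit number a * 45 + b.
    for i in range(0, len(values) - 1, 2):
        chunks.append(format(values[i] * 45 + values[i + 1], "011b"))
    # An odd trailing character is stored on its own in 6 bits.
    if len(values) % 2:
        chunks.append(format(values[-1], "06b"))
    return "".join(chunks)


if __name__ == "__main__":
    print(qr_alnum_bits("AC-42"))
```

Since 45 × 45 = 2025 fits in 11 bits (2^11 = 2048), each pair costs 5.5 bits per character instead of the 8 bits per character of byte mode.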

The advantage of encoding binary data in Base45 in QR code related scenarios is that the encoding does not require additional compression or checksums. However, you must avoid lower-case letters and any other characters outside the set, since a single one can force you into the much less efficient QR binary mode.

I would like to see this mode added to multibase. I'm not sure if it is allowed for multibase, but ASCII 45 (-) is both allowed in a URL and in the base45 character set. Second choice would be 4 or 5. Otherwise, some other available unreserved multibase character prefix is fine.

As this base45 is already an international standard, used not only by QR codes but also some fairly obscure things like satellite radio, I think it would be a good addition.

My particular use case is encoding cryptographic keys and signatures for air-gapped, offline, QR code based scenarios, such as #LetheKit.

Let me know if this is something I should create an initial PR for, or if the maintainers of this repo want to add it.

-- Christopher Allen

Stebalien commented 4 years ago

This seems like a well motivated addition.

In terms of prefix, - is fine. It conflicts with shell flags but that shouldn't be an issue in practice. However, I'm tempted to just use SP given that it's legal and using SP will quickly catch any calls to strip/trim.
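The point about strip/trim can be illustrated with a minimal Python sketch. It assumes, purely hypothetically, that SP were chosen as the prefix; the names `SP_PREFIX`, `payload`, `tagged`, and `trimmed` are made up for the example, and the payload is just a sample string drawn from the 45-character set.

```python
# Hypothetical: suppose SP (0x20) were the multibase prefix for base45.
SP_PREFIX = " "

payload = "BB8"               # sample payload from the 45-character set
tagged = SP_PREFIX + payload  # prefixed string, as it would be stored or sent

# Any code path that trims whitespace silently removes the prefix...
trimmed = tagged.strip()

# ...so a decoder that insists on the prefix fails on the very first
# round trip instead of only on rare inputs.
print(repr(tagged), repr(trimmed), trimmed.startswith(SP_PREFIX))
```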

ChristopherA commented 4 years ago

SP? space?

Stebalien commented 4 years ago

Yes, I assume that's what the RFC means by SP.

wolfmcnally commented 4 years ago

I'd prefer a non-whitespace character. How about '4' given that so far the registry contains no base 4x formats?

sg495 commented 2 years ago

There is also a standards track internet draft called "The Base45 Data Encoding", which is similar but supports encoding of arbitrary binary data. I would favour using that for base45 over ISO/IEC 18004:2015. (Also, the ISO/IEC standard is paywalled, afaik.)

davidlehn commented 3 months ago

Note that "The Base45 Data Encoding" was published as RFC 9285 in Aug 2022: https://datatracker.ietf.org/doc/html/rfc9285

davidlehn commented 3 months ago

There's now a PR to add a mapping to the RFC 9285 encoding: https://github.com/multiformats/multibase/pull/123 It's using R, perhaps for "QR code" I hear, since Q is taken and it's not a big deal what char is used. Bike shed it over there if you need to. I think any letter or 4 is the most practical for this use case, and R is fine.
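For reference, here is a minimal Python sketch of the RFC 9285 scheme with the R prefix mentioned above. The function names are invented for this example, the prefix is taken from the comment rather than from the registry, and this is not the code from the PR. RFC 9285 turns each pair of bytes into a 16-bit number n and writes it as three characters c, d, e with n = c + 45·d + 45²·e; a trailing single byte becomes two characters.

```python
# RFC 9285 alphabet: same 45 characters as QR Alphanumeric Mode.
B45 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"

# Assumption: prefix character as named in the thread.
MULTIBASE_PREFIX = "R"


def b45encode(data: bytes) -> str:
    out = []
    # Two bytes -> one 16-bit number -> three characters, least
    # significant base-45 digit first.
    for i in range(0, len(data) - 1, 2):
        n = (data[i] << 8) | data[i + 1]
        n, c = divmod(n, 45)
        e, d = divmod(n, 45)
        out += [B45[c], B45[d], B45[e]]
    # A trailing single byte -> two characters.
    if len(data) % 2:
        d, c = divmod(data[-1], 45)
        out += [B45[c], B45[d]]
    return "".join(out)


def b45decode(text: str) -> bytes:
    vals = [B45.index(ch) for ch in text]  # ValueError on a bad character
    if len(vals) % 3 == 1:
        raise ValueError("invalid Base45 length")
    out = bytearray()
    for i in range(0, len(vals) - 2, 3):
        n = vals[i] + vals[i + 1] * 45 + vals[i + 2] * 2025
        if n > 0xFFFF:
            raise ValueError("triplet does not fit in two bytes")
        out += n.to_bytes(2, "big")
    if len(vals) % 3 == 2:
        n = vals[-2] + vals[-1] * 45
        if n > 0xFF:
            raise ValueError("pair does not fit in one byte")
        out.append(n)
    return bytes(out)


def to_multibase(data: bytes) -> str:
    return MULTIBASE_PREFIX + b45encode(data)


def from_multibase(text: str) -> bytes:
    if not text.startswith(MULTIBASE_PREFIX):
        raise ValueError("not a base45 multibase string")
    return b45decode(text[1:])


if __name__ == "__main__":
    print(to_multibase(b"AB"))
```

Unlike the QR mode itself, which only packs characters that are already in the set, this encoding accepts arbitrary bytes and always emits characters from the set, so the result can be handed to a QR encoder in Alphanumeric Mode.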