multiformats / multibase

Self identifying base encodings

Suggestion: split the `code` column into `code_char` and `code_ascii` #77

Closed gjvnq closed 3 years ago

gjvnq commented 3 years ago

Reading the table, it can be unclear whether the code field is a char, an integer, or, even worse, an escape code (like \t and \b).

I think that reformatting the table in the following manner can avoid confusion.

encoding,          code_char, code_ascii, description,                                              status
identity,          <NUL>,     0x00,       8-bit binary (encoder and decoder keeps data unmodified), default
base2,             0,         0x30,       binary (01010101),                                        candidate
base8,             7,         0x37,       octal,                                                    draft
base10,            9,         0x39,       decimal,                                                  draft
base16,            f,         0x66,       hexadecimal,                                              default
base16upper,       F,         0x46,       hexadecimal,                                              default
base32hex,         v,         0x76,       rfc4648 case-insensitive - no padding - highest char,     candidate
base32hexupper,    V,         0x56,       rfc4648 case-insensitive - no padding - highest char,     candidate
base32hexpad,      t,         0x74,       rfc4648 case-insensitive - with padding,                  candidate
base32hexpadupper, T,         0x54,       rfc4648 case-insensitive - with padding,                  candidate
base32,            b,         0x62,       rfc4648 case-insensitive - no padding,                    default
base32upper,       B,         0x42,       rfc4648 case-insensitive - no padding,                    default
base32pad,         c,         0x63,       rfc4648 case-insensitive - with padding,                  candidate
base32padupper,    C,         0x43,       rfc4648 case-insensitive - with padding,                  candidate
base32z,           h,         0x68,       z-base-32 (used by Tahoe-LAFS),                           draft
base36,            k,         0x6b,       base36 [0-9a-z] case-insensitive - no padding,            draft
base36upper,       K,         0x4b,       base36 [0-9A-Z] case-insensitive - no padding,            draft
base58btc,         z,         0x7a,       base58 bitcoin,                                           default
base58flickr,      Z,         0x5a,       base58 flicker,                                           candidate
base64,            m,         0x6d,       rfc4648 no padding,                                       default
base64pad,         M,         0x4d,       rfc4648 with padding - MIME encoding,                     candidate
base64url,         u,         0x75,       rfc4648 no padding,                                       default
base64urlpad,      U,         0x55,       rfc4648 with padding,                                     default
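
The `code_ascii` column could be derived mechanically from `code_char` rather than maintained by hand. A minimal sketch, assuming every prefix is a single ASCII character (the helper name `code_ascii` is hypothetical, not something in the repo):

```python
def code_ascii(code_char: str) -> str:
    """Return the ASCII code of a one-character prefix as a hex string."""
    # Assumption: the prefix is exactly one character in the ASCII range.
    if len(code_char) != 1 or ord(code_char) > 0x7F:
        raise ValueError(f"not a single ASCII character: {code_char!r}")
    return f"0x{ord(code_char):02x}"


# A few rows from the proposed table, as (encoding, code_char) pairs.
rows = [("identity", "\x00"), ("base16upper", "F"), ("base58btc", "z")]
for name, char in rows:
    print(f"{name}, {code_ascii(char)}")
```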

vmx commented 3 years ago

Thanks for the proposal, this part indeed is confusing.

The reason there are no ASCII codes in the table is that Multibase prefixes are really about characters, so their byte representation depends on your string encoding. In practice a prefix often matches an ASCII code, because ASCII-compatible encodings are widely used. But look at the note in the README:

NOTE: Multibase-prefixes are encoding agnostic. "z" is "z", not 0x7a ("z" encoded as ASCII/UTF-8). For example, in UTF-32, "z" would be [0x7a, 0x00, 0x00, 0x00].

That note explains the problem: the prefix is not always a single ASCII character.
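
To make this concrete, here is a small Python sketch that encodes the same prefix character under a few string encodings. The function name `prefix_bytes` is only illustrative, and the explicit little-endian codecs are chosen so that no byte order mark is added:

```python
def prefix_bytes(prefix: str, encoding: str) -> list[int]:
    """Return the byte values of a prefix character under a given encoding."""
    return list(prefix.encode(encoding))


prefix = "z"  # the base58btc prefix

# The character is the same every time; only its byte representation changes.
for encoding in ("ascii", "utf-8", "utf-16-le", "utf-32-le"):
    hex_bytes = [f"0x{b:02x}" for b in prefix_bytes(prefix, encoding)]
    print(f"{encoding:10} {hex_bytes}")
```

Under ASCII and UTF-8 the prefix is the single byte 0x7a, while UTF-32 (little-endian) gives the four bytes from the README note, so a `code_ascii` column would only be correct for ASCII-compatible encodings.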