JRFC 17 - <base>b<digits> -- ascii string encoding base prefixes

jbenet / random-ideas


Open jbenet opened 10 years ago

jbenet commented 10 years ago

When ASCII-encoding numbers in different bases, we either (a) signal the base somewhere else (not self-describing, and to be avoided), or (b) prefix the digits with a base identifier:

0b010101 binary
0xabcdef hex

But what about other bases? base32, base64, base58 are all popular. What about this encoding:

<base in decimal>b<digits in base>
2b010101 binary
16babcdef hex
32b123wer base32
58b01AaBb base58
64bAaBb-_ base64

flexible to any base
completely self-describing
no need to maintain a well-known table of identifiers
(still need to maintain a table of alphabets)
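
A minimal Python sketch of how such a string could be parsed. Because the base is written in decimal, the first `b` always marks the end of the prefix. The names `ALPHABETS`, `split_prefixed` and `decode_prefixed` are hypothetical, the alphabets in the table are assumptions (the proposal does not fix them), and treating the digits as one positional integer is also an assumption:

```python
# Hypothetical alphabet table: the proposal leaves the choice of alphabets open.
ALPHABETS = {
    2: "01",
    16: "0123456789abcdef",
    # Assumed: Bitcoin-style base58 (no 0, O, I, l).
    58: "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz",
    # Assumed: URL-safe base64 alphabet.
    64: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_",
}


def split_prefixed(text):
    """Split '<base>b<digits>' into (base, digits)."""
    # The base is decimal, so it cannot contain 'b': the first 'b' is the separator.
    base_str, sep, digits = text.partition("b")
    if not sep or not digits or not (base_str.isascii() and base_str.isdigit()):
        raise ValueError(f"not a <base>b<digits> string: {text!r}")
    return int(base_str), digits


def decode_prefixed(text):
    """Decode '<base>b<digits>' to an integer using the assumed alphabet table."""
    base, digits = split_prefixed(text)
    alphabet = ALPHABETS.get(base)
    if alphabet is None:
        raise ValueError(f"no alphabet registered for base {base}")
    value = 0
    for ch in digits:
        index = alphabet.find(ch)
        if index < 0:
            raise ValueError(f"{ch!r} is not a base-{base} digit in this alphabet")
        value = value * base + index
    return value
```

With this table, `decode_prefixed("2b010101")` gives 21 and `decode_prefixed("16babcdef")` gives 11259375. `decode_prefixed("58b01AaBb")` raises `ValueError`, because the assumed base58 alphabet has no `0`: the base number alone does not say which alphabet is meant.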

jbenet commented 10 years ago

The need to maintain a table of alphabets may be a reason to use a different prefixing scheme entirely. In a sense, 0b and 0x aren't base encodings, but alphabet encodings. The ambiguity when dealing with multiple alphabets (e.g. as in base64 and base58) suggests that, since we are standardizing alphabets anyway, we might as well give them prefix identifiers.
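
To illustrate the ambiguity, here is a small Python sketch in which the same digit string decodes to two different values under two alphabets of the same base. It uses base32 rather than base64 or base58, only because the two RFC 4648 base32 alphabets make a compact example. The helper `decode_with` is hypothetical, and treating the digits as a positional integer is an assumption:

```python
# Two alphabets for the same base, both defined in RFC 4648.
BASE32_RFC4648 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
BASE32_HEX = "0123456789ABCDEFGHIJKLMNOPQRSTUV"


def decode_with(alphabet, digits):
    """Read `digits` as a positional integer whose digit values come from `alphabet`."""
    base = len(alphabet)
    value = 0
    for ch in digits:
        # str.index raises ValueError if ch is not in the alphabet.
        value = value * base + alphabet.index(ch)
    return value


# Same base, same digits, different alphabets, different numbers.
print(decode_with(BASE32_RFC4648, "A2"))  # 26  (A=0, 2=26)
print(decode_with(BASE32_HEX, "A2"))      # 322 (A=10, 2=2)
```

A prefix that names only the base (32) cannot tell these two readings apart; a prefix that names the alphabet can.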

kmag commented 9 years ago

Erlang does this already, up to base 36:

2#100 == 4
10#100 == 100
36#100 == 1296

Erlang Manual section 3.2

3.2 Number

There are two types of numeric literals, integers and floats. Besides the conventional notation, there are two Erlang-specific notations:

$char ASCII value or unicode code-point of the character char.

base#value Integer with the base base, which must be an integer in the range 2..36. In Erlang 5.2/OTP R9B and earlier versions, the allowed range is 2..16.
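
For comparison, a rough Python sketch that reads the same `base#value` notation. It relies on the built-in `int(string, base)`, which happens to accept the same 2..36 range. The function name `parse_base_hash` is hypothetical, and this only mimics the notation; it is not a faithful reimplementation of Erlang's literal syntax:

```python
def parse_base_hash(literal):
    """Parse an Erlang-style 'base#value' string into an int (bases 2..36)."""
    base_str, sep, digits = literal.partition("#")
    if not sep:
        raise ValueError(f"missing '#' in {literal!r}")
    base = int(base_str)  # raises ValueError if the base is not a decimal number
    if not 2 <= base <= 36:
        raise ValueError(f"base must be in 2..36, got {base}")
    # Python's int() uses 0-9 then a-z (case-insensitive) as digits.
    return int(digits, base)


print(parse_base_hash("2#100"))   # 4
print(parse_base_hash("10#100"))  # 100
print(parse_base_hash("36#100"))  # 1296
```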

jbenet commented 9 years ago

Oh cool! :+1:

Hmm, the problem with # is that it won't work in alphanumeric-only fields (which are more common than fields that allow both numbers and symbols).