vorakl / base94

A reversible binary-to-text encoding using a variable alphabet in the range [2..94]
https://vorakl.com/articles/base94/
MIT License

Please explain the rationale behind the current alphabet #1

Open mimi89999 opened 1 month ago

mimi89999 commented 1 month ago

Hello,

I noticed that the alphabet you defined uses all printable characters except space. Why is that? Some of them are very problematic, such as the backslash and the double quote, which need to be escaped in JSON and in many programming languages. Why were they included in the alphabet?

vorakl commented 1 month ago

I explain the rationale in the "The key problem" section of the https://vorakl.com/articles/base94/ article.

Basically, this encoding solves a different problem than "convenient embedding in JSON". Base94 uses all printable ASCII characters except space (codes 33 to 126), which are encoded identically in all ASCII-compatible character sets. That is also why no whitespace is included: you won't see any visible difference between ASCII 32 (space) and 10 (newline), for example. So the alphabet is limited to 7-bit codes only, excluding every code that doesn't have a printable symbol.
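To make the alphabet under discussion concrete, here is a small Python sketch that builds it from ASCII codes 33 through 126 and checks which of those characters JSON would have to escape. It assumes the Base94 alphabet is exactly this range in code-point order, which may not match the ordering used by the actual implementation; the helper `json_escaped` is hypothetical and only for illustration.

```python
import json

# Assumed Base94 alphabet: printable, non-whitespace ASCII, codes 33 ('!')
# through 126 ('~'). Space (32) and control codes such as newline (10) are
# excluded because they have no visible glyph.
ALPHABET = "".join(chr(code) for code in range(33, 127))


def json_escaped(alphabet: str) -> list[str]:
    """Hypothetical helper: return the characters JSON must escape."""
    # json.dumps wraps a one-character string in two quotes, so any result
    # longer than 3 characters means the character itself was escaped.
    return [ch for ch in alphabet if len(json.dumps(ch)) > 3]


if __name__ == "__main__":
    print(len(ALPHABET))           # 94
    print(json_escaped(ALPHABET))  # ['"', '\\']
```

Only two of the 94 characters need escaping in JSON, but those are exactly the ones raised in the question, so embedding Base94 output in JSON strings does require an extra escaping step.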

The main goal is to extend the alphabet as much as possible while limiting it to characters that are supported everywhere, all the time. For the case you mentioned, Base64 is widely used: its alphabet is limited to 6 bits and made of carefully chosen characters, with small adjustments to the original set (https://datatracker.ietf.org/doc/html/rfc4648#page-7), for example the URL- and filename-safe variant that replaces `+` and `/` with `-` and `_`.
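The benefit of extending the alphabet can be quantified: one symbol from an alphabet of size n carries log2(n) bits. The Python sketch below compares Base64 and Base94 on that measure. It ignores padding and block rounding, so real encoders will land slightly above these ideal figures; the function names are hypothetical.

```python
import math


def bits_per_char(alphabet_size: int) -> float:
    """Information carried by one symbol from an alphabet of the given size."""
    return math.log2(alphabet_size)


def expansion_ratio(alphabet_size: int) -> float:
    """Ideal output characters per input byte (8 bits), ignoring padding."""
    return 8 / bits_per_char(alphabet_size)


if __name__ == "__main__":
    for n in (64, 94):
        print(f"base{n}: {bits_per_char(n):.3f} bits/char, "
              f"{expansion_ratio(n):.3f} chars/byte")
```

By this measure Base94 needs roughly 1.22 output characters per input byte, versus about 1.33 for Base64, which is the gain the larger alphabet buys at the cost of including characters like `"` and `\`.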

vorakl commented 1 month ago

For more context, see another article: https://vorakl.com/articles/stream-encoding/