keybase / saltpack

a modern crypto messaging format
https://saltpack.org/
BSD 3-Clause "New" or "Revised" License

Why base62 has been used for encoding? #85

Closed marcofranssen closed 4 years ago

marcofranssen commented 4 years ago

I found the following post on Stack Overflow: https://stackoverflow.com/questions/23913737/base62-hash-of-a-string

It made me wonder why saltpack uses base62 rather than base64.

I'd like to understand the difference and why base62 was the better choice for saltpack.

oconnor663 commented 4 years ago

The goal is to guarantee that an encoding won't be corrupted, no matter where on the internet you paste it. The number 62 is 26 + 26 + 10: all the lowercase letters, uppercase letters, and digits. Adding two more characters to reach a power of 2 (64) is convenient from an encoding perspective, but it requires using some whitespace or punctuation. Unfortunately, for any punctuation character you might choose, some common sites interpret it as formatting and strip it out of the text. For example, you can see what GitHub does right here to my tildes, asterisks, underscores, equals signs, and dashes.
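To make the 26 + 26 + 10 arithmetic concrete, here is a minimal Python sketch. It is not saltpack's actual armoring code: the `ALPHABET` ordering and the `base62_encode` / `base62_decode` helpers are assumptions made for illustration, and this naive big-integer approach drops leading zero bytes, which a real format has to handle.

```python
import base64
import string

# 10 digits + 26 uppercase + 26 lowercase = 62 symbols.
# The ordering here is an assumption made for this sketch.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62_encode(data: bytes) -> str:
    """Encode bytes as a big-endian integer written in base 62 (illustrative only)."""
    n = int.from_bytes(data, "big")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def base62_decode(text: str) -> int:
    """Recover the integer value from a base62 string."""
    n = 0
    for ch in text:
        n = n * 62 + ALPHABET.index(ch)
    return n


payload = bytes([0xFB, 0xEF, 0xFF])
b64 = base64.b64encode(payload).decode("ascii")  # "++//": punctuation only
b62 = base62_encode(payload)                     # letters and digits only
print(len(ALPHABET), b64, b62)
```

Standard base64 needs two symbols beyond the 62 alphanumerics (`+` and `/`) plus `=` for padding, and those are exactly the characters that can be reinterpreted or stripped.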

marcofranssen commented 4 years ago

So you are saying the choice was made to prevent side effects from characters that fall outside the [A-Za-z0-9] range?

chindraba-work commented 4 years ago

> So you are saying the choice was made to prevent side effects from characters that fall outside the [A-Za-z0-9] range?

Anything outside that range is at risk of being mangled by something somewhere on the Internet. A mangled message is the same as no message.
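As a rough illustration of that mangling, the sketch below models a site that swallows markup punctuation. `strip_formatting` is a hypothetical stand-in, not how GitHub or any real renderer works, and `alnum_only` is just an arbitrary sample string drawn from [A-Za-z0-9].

```python
import base64
import re


def strip_formatting(text: str) -> str:
    """Hypothetical stand-in for a site that consumes punctuation as markup."""
    return re.sub(r"[~*_=\-]", "", text)


payload = bytes([0xFB, 0xEF, 0xFF, 0x01])

# URL-safe base64 swaps "+" and "/" for "-" and "_", but still uses punctuation.
urlsafe = base64.urlsafe_b64encode(payload).decode("ascii")
alnum_only = "2Xk9qLm0Zz"  # arbitrary sample containing only letters and digits

print(urlsafe, "->", strip_formatting(urlsafe))        # characters are lost
print(alnum_only, "->", strip_formatting(alnum_only))  # unchanged
```

Once characters are lost, the base64 text no longer decodes to the original bytes, while the alphanumeric-only string passes through untouched.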