multiformats / multibase

Self identifying base encodings

Consider encoding: Safebase #51

Open DonaldTsang opened 5 years ago

DonaldTsang commented 5 years ago

https://github.com/kstenerud/safe-encoding provides safe16, safe32, safe64, safe80, and safe85 encodings designed for use in HTML/XML, JSON, URLs, and POSIX file names.

DonaldTsang commented 5 years ago

https://github.com/tbaumgard/hybrid64 also has a better version of "zbase64"

DonaldTsang commented 5 years ago

@Stebalien @lidel so, what alternative encodings can we bring to the table to make multibase more diverse?

DonaldTsang commented 5 years ago

I have also kept a list of other bases in https://github.com/kstenerud/safe-encoding/issues/3, https://github.com/kstenerud/safe-encoding/issues/5, and https://github.com/kstenerud/safe-encoding/issues/6

Stebalien commented 5 years ago

We have a very limited namespace here (there are only so many prefix characters we can use) so making multibase "more diverse" isn't really something I'd like to do (we already have several useless encodings I'd like to drop).
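The namespace pressure comes from how multibase identifies an encoding: the first character of the string alone selects how the rest is decoded, so every supported encoding permanently consumes one character. A minimal sketch in Node.js, covering only three real prefixes ('f' base16 lowercase, 'm' unpadded base64, 'u' unpadded base64url); the function names are hypothetical and this is not the js-multibase API. It assumes Node 15.7 or later for Buffer's "base64url" support.

```javascript
// Minimal sketch of multibase's single-character prefix dispatch.
// Only a small subset of the real prefix table is included here.
const CODECS = {
  f: "hex",       // base16, lowercase
  m: "base64",    // RFC 4648 base64, no padding
  u: "base64url", // RFC 4648 base64url, no padding
};

function multibaseEncode(prefix, bytes) {
  const enc = CODECS[prefix];
  if (!enc) throw new Error(`unsupported prefix: ${prefix}`);
  // Node adds '=' padding for "base64"; multibase 'm' is unpadded, so strip it.
  const body = Buffer.from(bytes).toString(enc).replace(/=+$/, "");
  return prefix + body;
}

function multibaseDecode(text) {
  // The first character alone decides how the remainder is decoded.
  const enc = CODECS[text[0]];
  if (!enc) throw new Error(`unsupported prefix: ${text[0]}`);
  // Node's base64 decoder accepts input without '=' padding.
  return Buffer.from(text.slice(1), enc);
}
```

For example, multibaseEncode("m", Buffer.from("hello")) yields "maGVsbG8". Adding a new encoding means claiming another entry in CODECS, which is exactly the scarce resource being discussed.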

However, I do like those encodings. Are there any projects currently using them?

DonaldTsang commented 5 years ago

@Stebalien sadly, no projects have been reported as using those encodings yet.

But for diversity's sake, is it possible to "hack" the namespace by using two-byte prefixes? I hope such safe encodings could gradually replace RFC-style encodings for some use cases.

Stebalien commented 5 years ago

> But for diversity's sake, is it possible to "hack" the namespaces by using two-byte blocks?

We've thought about this a bit. We could, e.g., allocate the s prefix for "safe" encodings. I'd be fine with this from a spec standpoint, but implementing it will be a bit of work.
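To make the "s prefix" idea concrete, here is a sketch of how a decoder might parse a two-character prefix in which 's' opens a sub-namespace and the second character picks the safe variant. This scheme is hypothetical: neither the 's' allocation nor the variant characters below exist in multibase, and parsePrefix is an illustrative name.

```javascript
// HYPOTHETICAL two-character prefix scheme: 's' reserves a sub-namespace for
// "safe" encodings, and the next character selects the variant.
// These variant characters are invented for illustration only.
const SAFE_VARIANTS = {
  "1": "safe16",
  "3": "safe32",
  "6": "safe64",
  "8": "safe80",
  "5": "safe85",
};

function parsePrefix(text) {
  if (text.length === 0) throw new Error("empty input");
  if (text[0] !== "s") {
    // Ordinary multibase: one prefix character, the rest is payload.
    return { prefix: text[0], variant: null, payload: text.slice(1) };
  }
  // 's' consumes a second character to choose the safe variant.
  const variant = SAFE_VARIANTS[text[1]];
  if (!variant) throw new Error(`unknown safe variant: ${text[1]}`);
  return { prefix: "s", variant, payload: text.slice(2) };
}
```

The spec change is small, but every existing decoder would need this second lookup step, which is where the implementation work lies.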

Regardless, I'm not sure how to break the adoption/implementation chicken-and-egg problem. For us, the order-preserving property alone isn't sufficient motivation to implement this across the board and switch to it unless there's some kind of critical mass.
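The order-preserving property mentioned here means that sorting encoded strings gives the same order as sorting the underlying bytes. A base64-style encoder has this property (for equal-length inputs) when its alphabet is in ascending ASCII order; the RFC 4648 alphabet is not, because '+', '/', and the digits sort before the letters. The sorted alphabet below is an illustrative assumption and is not claimed to be safe64's exact alphabet.

```javascript
// Base64-style encoder parameterised by alphabet, unpadded.
// ASSUMPTION: SORTED_ALPHABET is illustrative; it is in ascending ASCII order.
const SORTED_ALPHABET =
  "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz";
const RFC4648_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function encode64(bytes, alphabet) {
  let out = "";
  let acc = 0;  // bit accumulator
  let bits = 0; // number of not-yet-emitted bits in acc
  for (const b of bytes) {
    acc = (acc << 8) | b;
    bits += 8;
    while (bits >= 6) {
      bits -= 6;
      out += alphabet[(acc >> bits) & 63]; // emit the top 6 pending bits
    }
    acc &= (1 << bits) - 1; // keep only the unconsumed bits
  }
  if (bits > 0) {
    // Left-align leftover bits in one final character (no '=' padding).
    out += alphabet[(acc << (6 - bits)) & 63];
  }
  return out;
}
```

With inputs already in byte order, such as [0x3f, 0xff] and [0xfb, 0x00], the sorted alphabet yields strings in the same order, while the RFC alphabet puts the second one first because its encoding starts with '+'.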

DonaldTsang commented 5 years ago

@Stebalien the other option would be to instead create a human-oriented translation layer between safebase hash encodings used externally and actual RFC encoded hashes used internally.

Stebalien commented 5 years ago

Not sure I follow.

DonaldTsang commented 5 years ago

@Stebalien external I/O and human input would use safebase. A translation layer between the input and multibase itself would then convert it to zbase32 or RFC-compliant base64, so that the user never has to handle zbase32, RFC-compliant base64, or the other encodings within multibase.
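One way to picture the proposed translation layer: when the external and internal encodings use the same radix, each character stands for the same 6-bit value, so converting unpadded text is a plain character-for-character substitution and the bytes never need decoding. This sketch assumes an illustrative sorted 64-character "safe" alphabet (not a published safebase spec) on the outside and RFC 4648 base64 on the inside; the function names are hypothetical. The same idea would apply to 32-character alphabets such as zbase32.

```javascript
// HYPOTHETICAL translation layer: users see a "safe" alphabet, internals keep
// RFC 4648 base64. Both map 6-bit values to single characters, so unpadded
// text can be translated by substitution alone.
// ASSUMPTION: SAFE is an illustrative sorted alphabet, not a published spec.
const SAFE = "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz";
const RFC = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function translate(text, from, to) {
  let out = "";
  for (const ch of text) {
    const value = from.indexOf(ch); // the 6-bit value this character encodes
    if (value < 0) throw new Error(`invalid character: ${ch}`);
    out += to[value];
  }
  return out;
}

// External input (safe alphabet) -> internal RFC base64, and back.
const toInternal = (safeText) => translate(safeText, SAFE, RFC);
// Strip RFC '=' padding first; the safe form carries no padding.
const toExternal = (rfcText) => translate(rfcText.replace(/=+$/, ""), RFC, SAFE);
```

The external form never contains '+', '/', or '=', while the internal form stays decodable by any standard base64 decoder.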

DonaldTsang commented 4 years ago

@lidel sorry, this may sound weird, but it is a Christmas wish of mine to see this through.

lidel commented 4 years ago

Internally, in IPFS/libp2p/IPLD, we use unwrapped, binary multihash, so the base does not matter outside userland. Creating a translation layer for text representations sounds like a reinvention of CIDv1, which already allows a single multihash to be represented in multiple bases when in text form.
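A sketch of the CIDv1 point: the same binary bytes can be written under different multibase prefixes, and every text form decodes back to identical bytes. It covers 'b' (RFC 4648 base32, lowercase, unpadded, the usual CIDv1 string form) and 'f' (base16). The sample bytes are arbitrary stand-ins, not a real CID, and toText/fromText are hypothetical names.

```javascript
// One binary value, several text forms; the multibase prefix says how to read it.
const B32 = "abcdefghijklmnopqrstuvwxyz234567"; // RFC 4648 base32, lowercase

function base32Encode(bytes) {
  let out = "", acc = 0, bits = 0;
  for (const b of bytes) {
    acc = (acc << 8) | b;
    bits += 8;
    while (bits >= 5) {
      bits -= 5;
      out += B32[(acc >> bits) & 31]; // emit 5 bits per character
    }
    acc &= (1 << bits) - 1;
  }
  if (bits > 0) out += B32[(acc << (5 - bits)) & 31]; // unpadded tail
  return out;
}

function base32Decode(text) {
  const out = [];
  let acc = 0, bits = 0;
  for (const ch of text) {
    const v = B32.indexOf(ch);
    if (v < 0) throw new Error(`invalid base32 character: ${ch}`);
    acc = (acc << 5) | v;
    bits += 5;
    if (bits >= 8) {
      bits -= 8;
      out.push((acc >> bits) & 0xff); // a full byte is available
    }
    acc &= (1 << bits) - 1; // leftover tail bits are discarded at the end
  }
  return Uint8Array.from(out);
}

function toText(prefix, bytes) {
  if (prefix === "b") return "b" + base32Encode(bytes);
  if (prefix === "f") return "f" + Buffer.from(bytes).toString("hex");
  throw new Error(`unsupported prefix: ${prefix}`);
}

function fromText(text) {
  if (text[0] === "b") return base32Decode(text.slice(1));
  if (text[0] === "f") return Uint8Array.from(Buffer.from(text.slice(1), "hex"));
  throw new Error(`unsupported prefix: ${text[0]}`);
}
```

Since the binary form is canonical and the text form is chosen at the edges, a separate translation layer would duplicate what the prefix already provides.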

Looking at it from a pragmatic perspective, adding safebase support to CID means adding it not only to the list in this repo, but to every library implementing it, such as go-cid, js-cid, etc. This is a significant effort, and given that safebases are not being used in the real world, it won't happen until there are libraries, a community need, or a desired characteristic that is hard to ignore.

For now, in contexts where safehash provides real value, one can use it "in userland": extract binary multihash from CID (binary_multihash = new CID('QmHash').multihash) and do safehash(binary_multihash).
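The userland approach can also be sketched without js-cid for the CIDv0 case: a "Qm..." CIDv0 string is just the base58btc encoding of a binary multihash, so base58-decoding it yields the multihash bytes, which can then be handed to any encoder. Here safeEncode is a hypothetical stand-in for the "safehash" step, using an illustrative sorted 64-character alphabet; cidV0ToSafe is likewise a made-up name, and only sha2-256 multihashes are accepted.

```javascript
// Userland sketch: CIDv0 ("Qm...") = base58btc(multihash), no multibase prefix.
const B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

function base58btcDecode(text) {
  let n = 0n;
  for (const ch of text) {
    const v = B58.indexOf(ch);
    if (v < 0) throw new Error(`invalid base58 character: ${ch}`);
    n = n * 58n + BigInt(v);
  }
  const bytes = [];
  while (n > 0n) {
    bytes.unshift(Number(n & 0xffn)); // big-endian byte order
    n >>= 8n;
  }
  // Each leading '1' encodes one leading zero byte.
  for (const ch of text) {
    if (ch !== "1") break;
    bytes.unshift(0);
  }
  return Uint8Array.from(bytes);
}

// HYPOTHETICAL "safe" encoder: unpadded base64-style with a sorted alphabet.
const SAFE = "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz";

function safeEncode(bytes) {
  let out = "", acc = 0, bits = 0;
  for (const b of bytes) {
    acc = (acc << 8) | b;
    bits += 8;
    while (bits >= 6) {
      bits -= 6;
      out += SAFE[(acc >> bits) & 63];
    }
    acc &= (1 << bits) - 1;
  }
  if (bits > 0) out += SAFE[(acc << (6 - bits)) & 63];
  return out;
}

function cidV0ToSafe(cid) {
  const multihash = base58btcDecode(cid);
  // sha2-256 multihash: code 0x12, digest length 0x20, then 32 digest bytes.
  if (multihash.length !== 34 || multihash[0] !== 0x12 || multihash[1] !== 0x20) {
    throw new Error("not a sha2-256 CIDv0");
  }
  return safeEncode(multihash);
}
```

CIDv1 strings would additionally need the multibase prefix, version, and codec stripped before the multihash is reached, which is what a library such as js-cid handles.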

mikeal commented 4 years ago

> Internally, in IPFS/libp2p/IPLD, we use unwrapped, binary multihash, so base does not matter outside userland.

Mostly, yes. dag-json uses base-encoded CIDs internally in its format. It uses a consistent base encoding, though, and should probably be updated to reject alternative base encodings in order to maintain that consistency.
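A sketch of what enforcing that consistency could look like: dag-json represents a link as an object of the form {"/": "<CID string>"}, so a checker can walk the parsed document and flag link strings that are not in the expected base. The rule used here (CIDv1 links must carry the 'b' base32 prefix; 46-character "Qm" CIDv0 links are accepted as-is) is an assumption for illustration, and findLinks/nonCanonicalLinks are hypothetical names rather than part of any dag-json library.

```javascript
// Collect every dag-json-style link string: objects whose only key is "/".
function findLinks(node, found = []) {
  if (Array.isArray(node)) {
    for (const item of node) findLinks(item, found);
  } else if (node !== null && typeof node === "object") {
    const keys = Object.keys(node);
    if (keys.length === 1 && keys[0] === "/" && typeof node["/"] === "string") {
      found.push(node["/"]);
    } else {
      for (const key of keys) findLinks(node[key], found);
    }
  }
  return found;
}

// ASSUMED rule: 'b' (base32) for CIDv1, bare base58btc "Qm..." for CIDv0.
function nonCanonicalLinks(jsonText) {
  return findLinks(JSON.parse(jsonText)).filter(
    (cid) => !(cid.startsWith("b") || (cid.length === 46 && cid.startsWith("Qm")))
  );
}
```

A link written in, say, base58btc CIDv1 form ('z' prefix) would be reported, which is the kind of variation that rejecting alternative bases would eliminate.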