Added Base8 Implementation

multiformats / go-multibase

Implementation of multibase parser in go

MIT License

Added Base8 Implementation #26

Open gowthamgts opened 6 years ago

gowthamgts commented 6 years ago

Would creating a base8_test file be the best way to increase coverage?

gowthamgts commented 5 years ago

ping @Stebalien

gowthamgts commented 5 years ago

I've made the recommended changes in the code.

(I'd also kind of like to know why you need octal before we go through the process of defining and implementing it)

I was reading the code and found out base8 was missing and I had some time so. 😕

Stebalien commented 5 years ago

I was reading the code and found out base8 was missing and I had some time so.

Fair enough. That's also why we have base 2... But we'll still need some spec first.

gowthamgts commented 5 years ago

Totally understandable. Thanks for your time.

gowthamgts commented 5 years ago

@Stebalien: Can I implement base2 encoding with reference to the RFCs?

Stebalien commented 5 years ago

Go ahead.

creationix commented 5 years ago

Regarding spec for base 8, I gave this some thought.

For bases whose characters divide a byte evenly, pad to full bytes. These are base 2 (8 chars per byte), base 4 (4 chars per byte), and base 16 (2 chars per byte). It makes sense to treat these as bitstreams and pad out to whole bytes.

But for other power-of-two bases whose characters don't fit evenly into a byte, use optional padding at the end. Examples are base64 (3 bytes = 4 chars) and base32 (5 bytes = 8 chars).

Base 8 (3 bytes = 8 chars) could fit this style of encoding.
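The byte/char ratios quoted above follow from the least common multiple of 8 bits per byte and the number of bits each character carries. A minimal Go sketch of that arithmetic (the `groupSize` and `gcd` helpers are hypothetical names for illustration, not part of go-multibase):

```go
package main

import "fmt"

// gcd returns the greatest common divisor of a and b.
func gcd(a, b int) int {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

// groupSize (hypothetical helper) returns how many bytes and how many
// characters make up one full group for a power-of-two base that carries
// bitsPerChar bits per character. A group spans lcm(8, bitsPerChar) bits.
func groupSize(bitsPerChar int) (nBytes, nChars int) {
	lcm := 8 * bitsPerChar / gcd(8, bitsPerChar)
	return lcm / 8, lcm / bitsPerChar
}

func main() {
	bases := []struct {
		name string
		bits int
	}{
		{"base2", 1}, {"base4", 2}, {"base8", 3},
		{"base16", 4}, {"base32", 5}, {"base64", 6},
	}
	for _, b := range bases {
		nBytes, nChars := groupSize(b.bits)
		fmt.Printf("%-6s %d byte(s) = %d char(s)\n", b.name, nBytes, nChars)
	}
}
```

Bases where one group is a single byte (base 2, 4, 16) never need padding characters; the others (base 8, 32, 64) can end mid-group, which is where the optional padding comes in.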

Another option, and the one JS currently uses, is to convert the input to a large number and make leading zeroes represent null bytes, similar to base 10 and base 58.

creationix commented 5 years ago

I wrote a base-8 codec that works similarly to base-32 and base-64, where you give it an alphabet and an optional padding character. https://github.com/filecoin-project/lua-filecoin/blob/master/base-8.lua

For example, base8 with '01234567=' as the alphabet, using the same style as base-32 and base-64:

Decentralize everything!! -> 72106254331267164344605543227514510062566312711713506415133463441102=====
hello world -> 7320625543306744035667562330620==
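For readers who want to reproduce the padded vectors above, here is a minimal Go sketch of the same idea. It is not the linked Lua codec and not go-multibase code: the function name `encodeBase8Padded` is hypothetical, and it assumes the input is read as a big-endian bitstream three bits at a time, with the last digit zero-filled on the right and `=` padding out to a multiple of 8 characters. The leading `7` in the examples is the multibase prefix for base8 and is added separately.

```go
package main

import (
	"fmt"
	"strings"
)

const alphabet = "01234567"

// encodeBase8Padded (hypothetical name) encodes data as octal digits in the
// style of base32/base64: 3 input bytes become 8 output characters, and '='
// pads the final group to a full 8 characters.
func encodeBase8Padded(data []byte) string {
	var sb strings.Builder
	var buf uint32 // bit accumulator; holds at most 10 bits at a time
	bits := 0      // number of unread bits currently in buf

	for _, b := range data {
		buf = buf<<8 | uint32(b)
		bits += 8
		// Emit one digit for every complete 3-bit chunk.
		for bits >= 3 {
			bits -= 3
			sb.WriteByte(alphabet[(buf>>uint(bits))&7])
		}
		// Keep only the bits that have not been emitted yet.
		buf &= (uint32(1) << uint(bits)) - 1
	}

	// Zero-fill a trailing partial chunk on the right.
	if bits > 0 {
		sb.WriteByte(alphabet[(buf<<uint(3-bits))&7])
	}

	// Pad to a multiple of 8 characters.
	for sb.Len()%8 != 0 {
		sb.WriteByte('=')
	}
	return sb.String()
}

func main() {
	// '7' is the multibase prefix for base8.
	fmt.Println("7" + encodeBase8Padded([]byte("hello world")))
	// prints 7320625543306744035667562330620==
}
```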

But if I instead use base-x (which is what the JS implementation currently does), the output looks closer to the current test vectors, but with different leading zeroes:

Decentralize everything!! -> 71043126154533472162302661513646244031273145344745643206455631620441
hello world -> 764145330661571007355734466144
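The big-number style can be sketched in Go with `math/big`. This is only an approximation of the approach described above, not the base-x source: the name `encodeBase8BigInt` is hypothetical, and it assumes each leading zero byte maps to one leading `0` character, as with base 58.

```go
package main

import (
	"fmt"
	"math/big"
	"strings"
)

// encodeBase8BigInt (hypothetical name) treats data as one big-endian
// unsigned integer and writes it in octal. Leading zero bytes would vanish
// in the integer conversion, so each one is emitted as a single '0'.
func encodeBase8BigInt(data []byte) string {
	zeros := 0
	for zeros < len(data) && data[zeros] == 0 {
		zeros++
	}
	prefix := strings.Repeat("0", zeros)
	if zeros == len(data) {
		return prefix // empty or all-zero input
	}
	n := new(big.Int).SetBytes(data[zeros:])
	return prefix + n.Text(8)
}

func main() {
	// '7' is the multibase prefix for base8.
	fmt.Println("7" + encodeBase8BigInt([]byte("hello world")))
	// prints 764145330661571007355734466144
}
```

Unlike the padded variant, the digits here are aligned to the end of the input rather than the start, which is why the two outputs share no common prefix even for the same bytes.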

Stebalien commented 5 years ago

@creationix I'm fine with either but I'd like to go with what's commonly used in the community. Have you found any other users of base8?

creationix commented 5 years ago

I've not seen any others. I don't know if there is a common encoding for this. Logically, the same style as base-64 and base-32 makes the most sense.

Stebalien commented 5 years ago

So, the real question is, should we even bother? A viable option is to just drop base8.

creationix commented 5 years ago

Personally, I see base-2 and base-8 both as unneeded. Is there any use case where they are the correct solution? Base-16 works everywhere, produces much shorter output, and is easier to work with than either.

My recommendation is to either drop them to reduce the maintenance overhead for implementations or to go with the logical encoding as I've suggested if we must keep base-8.

Stebalien commented 5 years ago

https://github.com/multiformats/multibase/issues/59