gowthamgts opened 6 years ago
ping @Stebalien
I've made the recommended changes in the code.
(I'd also kind of like to know why you need octal before we go through the process of defining and implementing it)
I was reading the code, noticed that base8 was missing, and had some time, so... 😕
Fair enough, that's also why we have base 2. But we'll still need a spec first.
Totally understandable. Thanks for your time.
@Stebalien: Can I implement base2 encoding with reference to the RFCs?
Go ahead.
Regarding a spec for base 8, I've given this some thought.
For bases whose characters evenly divide a byte, namely base 2 (8 chars per byte), base 4 (4 chars per byte), and base 16 (2 chars per byte), it makes sense to treat the input as a bitstream and pad out to whole bytes.
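A minimal sketch of the bitstream approach for bases whose digit width divides 8, assuming most-significant-bits-first order. `encode_bitstream` is a hypothetical helper for illustration, not code from any multibase implementation, and the alphabets in the example calls are assumptions.

```python
def encode_bitstream(data: bytes, alphabet: str) -> str:
    """Encode bytes with a base whose digit width divides 8 (base2, base4, base16)."""
    bits = len(alphabet).bit_length() - 1  # 2 -> 1 bit, 4 -> 2 bits, 16 -> 4 bits
    if bits == 0 or len(alphabet) != 1 << bits or 8 % bits:
        raise ValueError("alphabet size must be 2, 4, 16 or 256")
    mask = (1 << bits) - 1
    # Every byte maps to exactly 8 // bits characters, high bits first,
    # so the output always ends on a whole byte and needs no padding.
    return "".join(
        alphabet[(byte >> shift) & mask]
        for byte in data
        for shift in range(8 - bits, -1, -bits)
    )


print(encode_bitstream(b"\x0f", "01"))                # base2
print(encode_bitstream(b"hi", "0123456789abcdef"))    # base16
```

Because each byte maps to a fixed number of characters, there is never a partial final group, which is why these bases don't need a padding rule.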
For other power-of-two bases that don't evenly divide a byte, use optional padding at the end. Examples are base64 (3 bytes = 4 chars) and base32 (5 bytes = 8 chars).
Base 8 (3 bytes = 8 chars) could fit this style of encoding.
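The group sizes quoted above follow from the least common multiple of 8 and the number of bits per character. A small sketch of that arithmetic (the helper names are hypothetical) also computes how many padding characters a short final group needs, assuming RFC 4648-style padding to a full group.

```python
import math


def group_shape(bits_per_char: int) -> tuple[int, int]:
    """Smallest (bytes, chars) block where byte and character boundaries line up."""
    block_bits = math.lcm(8, bits_per_char)  # math.lcm requires Python 3.9+
    return block_bits // 8, block_bits // bits_per_char


def pad_lengths(bits_per_char: int) -> dict[int, int]:
    """Padding characters needed when the final group holds only r bytes."""
    group_bytes, group_chars = group_shape(bits_per_char)
    pads = {}
    for r in range(1, group_bytes):
        used = -(-r * 8 // bits_per_char)  # ceil(8r / bits_per_char)
        pads[r] = group_chars - used
    return pads


print(group_shape(3), pad_lengths(3))  # base8
print(group_shape(6), pad_lengths(6))  # base64
```

For base8 this gives 3-byte / 8-character groups, with 5 padding characters after one trailing byte and 2 after two, the same pattern that a base32/base64-style octal encoding would produce.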
Another option, and the one the JS implementation currently uses, is to convert the input to one large number and let leading zeroes represent null bytes, similar to base 10 and base 58.
I wrote a base-8 codec that works similarly to base-32 and base-64: you give it an alphabet and an optional padding character. https://github.com/filecoin-project/lua-filecoin/blob/master/base-8.lua
For example, base8 with '01234567=' as the alphabet (the trailing '=' being the padding character), using the same style as base-32 and base-64:
Decentralize everything!!
-> 72106254331267164344605543227514510062566312711713506415133463441102=====
hello world
-> 7320625543306744035667562330620==
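A Python sketch of this grouped style, assuming most-significant-bits-first packing, the '01234567' alphabet and '=' padding. It is not a port of the linked Lua file. The leading `7` in the outputs appears to be the multibase prefix for base8, so the sketch prepends it separately. Under those assumptions it reproduces both vectors.

```python
ALPHABET = "01234567"
PAD = "="


def encode_base8(data: bytes, pad: bool = True) -> str:
    """base32/base64-style octal: every 3 input bytes become 8 characters."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Left-align the chunk in a 24-bit value; missing bytes are zero-filled.
        value = int.from_bytes(chunk.ljust(3, b"\x00"), "big")
        digits = [ALPHABET[(value >> shift) & 0b111] for shift in range(21, -1, -3)]
        used = (len(chunk) * 8 + 2) // 3  # ceil(8n / 3): 1 byte -> 3, 2 -> 6, 3 -> 8
        out.append("".join(digits[:used]))
        if pad:
            out.append(PAD * (8 - used))
    return "".join(out)


def decode_base8(text: str) -> bytes:
    """Inverse of encode_base8; trailing padding is optional."""
    text = text.rstrip(PAD)
    out = bytearray()
    for i in range(0, len(text), 8):
        group = text[i:i + 8]
        if len(group) not in (3, 6, 8):
            raise ValueError("invalid base8 group length")
        value = 0
        for ch in group:
            value = (value << 3) | ALPHABET.index(ch)  # ValueError on a bad character
        nbits = len(group) * 3
        nbytes = nbits // 8
        # Drop the zero-fill bits that follow the last whole byte.
        out += (value >> (nbits - nbytes * 8)).to_bytes(nbytes, "big")
    return bytes(out)


print("7" + encode_base8(b"hello world"))
```

Each 3-byte group is encoded independently of the input length, so an encoder can emit output as it reads, just like base32 and base64.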
But if I instead use base-x (which is what the JS implementation currently does), the output looks closer to the current test vectors, but with different leading zeroes:
Decentralize everything!!
-> 71043126154533472162302661513646244031273145344745643206455631620441
hello world
-> 764145330661571007355734466144
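For comparison, a sketch of the base-x style: the whole input is treated as one big-endian integer, and each leading zero byte is written as one leading '0'. This is a hypothetical Python rewrite of the behaviour described above, not the JS base-x code. With the `7` prefix it gives the "hello world" vector above.

```python
ALPHABET = "01234567"


def encode_bigint_base8(data: bytes) -> str:
    """base-x style: input as one big integer; each leading 0x00 byte -> '0'."""
    zeros = len(data) - len(data.lstrip(b"\x00"))
    n = int.from_bytes(data, "big")
    digits = []
    while n:
        n, r = divmod(n, 8)
        digits.append(ALPHABET[r])
    return ALPHABET[0] * zeros + "".join(reversed(digits))


def decode_bigint_base8(text: str) -> bytes:
    """Inverse: count leading '0's as zero bytes, parse the rest as a number."""
    zeros = len(text) - len(text.lstrip(ALPHABET[0]))
    n = 0
    for ch in text[zeros:]:
        n = n * 8 + ALPHABET.index(ch)
    return b"\x00" * zeros + n.to_bytes((n.bit_length() + 7) // 8, "big")


print("7" + encode_bigint_base8(b"hello world"))
```

Because 3 doesn't divide 8, digit boundaries here are aligned to the end of the input rather than the start, so the encoder needs the total input length before it can emit the first digit. The output also carries no padding.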
@creationix I'm fine with either, but I'd like to go with what's commonly used in the community. Have you found any other users of base8?
I've not seen any others. I don't know if there is a common encoding for this. Logically, the same style as base-64 and base-32 makes the most sense.
So, the real question is, should we even bother? A viable option is to just drop base8.
Personally, I see both base-2 and base-8 as unneeded. Is there any use case where they are the right solution? Base-16 works everywhere, produces shorter output, and is simpler to encode than either of them.
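The length argument is easy to quantify. For a power-of-two base with b bits per character, unpadded output is ceil(8n / b) characters for n input bytes. A quick check, using a hypothetical 32-byte input (the size of a SHA-256 digest):

```python
def encoded_length(nbytes: int, bits_per_char: int) -> int:
    """Unpadded character count for a power-of-two base: ceil(8 * nbytes / bits)."""
    return (nbytes * 8 + bits_per_char - 1) // bits_per_char


for name, bits in [("base2", 1), ("base8", 3), ("base16", 4)]:
    print(name, encoded_length(32, bits))
```

That gives 256 characters for base-2, 86 for base-8 and 64 for base-16, so base-16 is a quarter of the length of base-2 and still shorter than base-8.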
My recommendation is to either drop them to reduce the maintenance overhead for implementations or to go with the logical encoding as I've suggested if we must keep base-8.
Would creating a base8_test file be the best way to increase coverage?