zamicol / BaseConverter

Zamicol's Base Converter - Convert arbitrary bases with arbitrary alphabets.
https://convert.zamicol.com
BSD 3-Clause "New" or "Revised" License

[BUG][CRITICAL] Incorrect calculation results #25

Closed GregTonoski closed 1 month ago

GregTonoski commented 3 months ago

Steps: Input hex: C85AFBACCF3E1EE40BDCD721A9AD134134477FD51840EFC0511E0182AE92F78E

Actual result: Output base32: BSC27OWM6PQ64QF5ZVZBVGWRGQJUI575KGCA57AFCHQBQKXJF54O (Unfortunately, the same bug affects conversions to other bases too.)

Expected result: Output base32 (RFC 4648): ZBNPXLGPHYPOIC6424Q2TLITIE2EO76VDBAO7QCRDYAYFLUS66HA====

Testing environment: Windows 11, Brave web browser.

zamicol commented 1 month ago

It is working as intended. The tool does not currently implement an RFC 4648 base32 mode option. For comparison, see RFC 4648 base64 under "Extras", which is mostly implemented.

RFC 4648 is a "bucket" conversion RFC. There is no formalism describing RFC 4648-like methods, so internally I've dubbed the approach "bucket conversion".
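
As a rough illustration of what "bucket conversion" could look like for base32, here is a minimal Python sketch. It assumes "bucket" means cutting the input bit string into fixed 5-bit groups from the left, zero-filling the last group, and optionally adding `=` padding. The function name `bucket_encode_base32` is hypothetical and this is not the tool's code; the result is compared against Python's standard `base64.b32encode`.

```python
import base64

RFC4648_BASE32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"

# Hex input taken from the report in this thread.
HEX_INPUT = "C85AFBACCF3E1EE40BDCD721A9AD134134477FD51840EFC0511E0182AE92F78E"


def bucket_encode_base32(data: bytes, pad: bool = True) -> str:
    """Hypothetical helper: encode bytes by filling 5-bit 'buckets' left to right."""
    # Concatenate all bytes into one bit string, most significant bit first.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Zero-fill on the right so the last bucket is complete.
    bits += "0" * (-len(bits) % 5)
    out = "".join(
        RFC4648_BASE32[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5)
    )
    if pad:
        # RFC 4648 base32 pads the output to a multiple of 8 characters.
        out += "=" * (-len(out) % 8)
    return out


encoded = bucket_encode_base32(bytes.fromhex(HEX_INPUT))
print(encoded)
# Matches the "Expected result" string from the report:
# ZBNPXLGPHYPOIC6424Q2TLITIE2EO76VDBAO7QCRDYAYFLUS66HA====
print(encoded == base64.b32encode(bytes.fromhex(HEX_INPUT)).decode())
```

Calling the helper with `pad=False` gives the same characters without the trailing `=` signs.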

"Bucket conversion" is not the same as "natural conversion", which I also call "arbitrary conversion" and which is typically implemented using the "iterative divide by radix" method.
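
For readers who want to reproduce the reported output, here is a minimal Python sketch of the "iterative divide by radix" method. It assumes the input is treated as a single big integer and that leading zero digits are not preserved. The function name `natural_convert` is hypothetical and this is not the tool's implementation.

```python
RFC4648_BASE32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"

# Hex input taken from the report in this thread.
HEX_INPUT = "C85AFBACCF3E1EE40BDCD721A9AD134134477FD51840EFC0511E0182AE92F78E"


def natural_convert(hex_string: str, alphabet: str) -> str:
    """Hypothetical helper: convert a hex string to another radix as one big number."""
    value = int(hex_string, 16)
    radix = len(alphabet)
    if value == 0:
        return alphabet[0]
    digits = []
    while value > 0:
        # Each division peels off the least significant digit in the new radix.
        value, remainder = divmod(value, radix)
        digits.append(alphabet[remainder])
    # Digits were collected least significant first, so reverse them.
    return "".join(reversed(digits))


print(natural_convert(HEX_INPUT, RFC4648_BASE32))
# Matches the "Actual result" string from the report:
# BSC27OWM6PQ64QF5ZVZBVGWRGQJUI575KGCA57AFCHQBQKXJF54O
```

The difference from the RFC 4648 output comes from alignment: 256 bits do not divide evenly into 5-bit digits, so this method effectively groups bits from the least significant end, while RFC 4648 groups from the most significant end and zero-fills the tail.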

Under "Extras", the RFC 4648 modes are named base64, b64ut, and ub64p. Although this does not cover all the permutations of the (unnamed) RFC 4648 bases, it does cover the most common: RFC 4648 base64 URL-safe truncated and RFC 4648 unsafe base64 padded. There's also the question of whether to ignore errors in "non-strict"/non-canonical encodings; for example, the b64ut strings hOk and hOl may decode to the same byte string (hex 84E9). (See https://github.com/Cyphrme/CozeX/blob/master/implemented/base64.md and https://github.com/Cyphrme/Coze/issues/18)
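
The hOk/hOl case can be checked with Python's standard `base64` module, which is lenient about non-zero trailing bits. This is only a sketch: the helper names `decode_b64ut` and `is_canonical_b64ut` are hypothetical, and the canonical check shown here (decode, re-encode, compare) is one possible approach, not necessarily what the tool or Coze does.

```python
import base64


def decode_b64ut(text: str) -> bytes:
    """Hypothetical helper: decode URL-safe, unpadded base64 leniently."""
    # The standard-library decoder expects padding, so restore it first.
    padded = text + "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(padded)


def is_canonical_b64ut(text: str) -> bool:
    """Hypothetical helper: True if re-encoding the decoded bytes gives `text` back."""
    reencoded = base64.urlsafe_b64encode(decode_b64ut(text)).decode().rstrip("=")
    return reencoded == text


for candidate in ("hOk", "hOl"):
    decoded = decode_b64ut(candidate)
    print(candidate, decoded.hex().upper(), is_canonical_b64ut(candidate))
```

Both strings decode to 84E9 because the last character carries two bits that do not belong to any output byte; only hOk leaves those bits at zero.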

There is a difference between a base's conversion method and its alphabet. RFC 4648 base32's alphabet is not the same thing as its conversion method, and a given conversion algorithm is free to use many alphabets. We consider a "base" to be a specific alphabet paired with a specific conversion method (with "natural conversion"/"arbitrary conversion" being the universal default).
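
To make the alphabet/method distinction concrete, here is a small Python sketch that models a "base" as an alphabet paired with a conversion function. The `Base` class, the `natural` and `bucket32` functions, and the labels are all hypothetical names chosen for illustration; the two alphabets are the base32 and base32hex alphabets from RFC 4648.

```python
from dataclasses import dataclass
from typing import Callable

RFC4648_BASE32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
RFC4648_BASE32HEX = "0123456789ABCDEFGHIJKLMNOPQRSTUV"


def natural(data: bytes, alphabet: str) -> str:
    """Treat the bytes as one big-endian integer and divide by the radix."""
    value = int.from_bytes(data, "big")
    digits = []
    while value > 0:
        value, remainder = divmod(value, len(alphabet))
        digits.append(alphabet[remainder])
    return "".join(reversed(digits)) or alphabet[0]


def bucket32(data: bytes, alphabet: str) -> str:
    """Cut the bit string into 5-bit groups from the left (no '=' padding)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 5)
    return "".join(alphabet[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5))


@dataclass(frozen=True)
class Base:
    name: str       # hypothetical label, for display only
    alphabet: str   # the digit characters
    method: Callable[[bytes, str], str]  # the conversion algorithm

    def encode(self, data: bytes) -> str:
        return self.method(data, self.alphabet)


bases = [
    Base("natural + base32 alphabet", RFC4648_BASE32, natural),
    Base("natural + base32hex alphabet", RFC4648_BASE32HEX, natural),
    Base("bucket + base32 alphabet", RFC4648_BASE32, bucket32),
]

for base in bases:
    print(f"{base.name}: {base.encode(bytes([0xFF]))}")
```

For the single byte 0xFF, the three pairings print H7, 7V, and 74 respectively: changing either the alphabet or the method changes the output.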

RFC 4648 appears to require at least 8 bases to implement fully: padding (2 options), alphabet (2 options), and encoding (2 options), although the encoding option should apply only to input and not to output, since output should always be correctly encoded.
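
The count of 8 is just the product of the three two-way choices. A short Python sketch that enumerates them follows; the option labels are borrowed from wording used earlier in this thread and are assumptions, not names the tool defines.

```python
from itertools import product

# Assumed labels for each two-way choice; not official names.
PADDING = ("padded", "truncated")
ALPHABET = ("url-safe", "unsafe")
ENCODING = ("strict", "non-strict")  # applies to decoding input only

variants = list(product(PADDING, ALPHABET, ENCODING))

for padding, alphabet, encoding in variants:
    print(f"{alphabet}, {padding}, {encoding} input")

print(len(variants))  # 2 * 2 * 2 combinations
```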

If anyone is interested in implementing this, a pull request would be appreciated.