oculus42 / short-uuid

Translate standard UUIDs into shorter formats and back.
MIT License
450 stars, 13 forks

Add support for UTF-8 symbols #65

Open ClassTerr opened 2 years ago

ClassTerr commented 2 years ago

Hello, and thank you for such a great library. I am using it to shorten UUIDs in URLs.

But even with this library and the flickr58 alphabet, my UUID is still 22 characters long. That is better, but still not very short, so I experimented with extending the alphabet with additional characters. Since modern browsers support emoji in URLs, I tried setting the alphabet to a list of emoji, but got the following error: The provided Alphabet has duplicate characters resulting in unreliable results. The reason for this error is that a single Unicode character may occupy more than one UTF-16 code unit in a JS string:

'💚'.length === 2
new Set('💚').size === 1

So, at the very least, the duplicate-character check is wrong for such alphabets.
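
To illustrate the point above, here is a minimal sketch of a duplicate check that compares code points instead of UTF-16 code units. The helper name hasDuplicateSymbols is hypothetical and is not part of short-uuid; how the library actually performs its check is not shown in this thread, so the code-unit comparison below is only an assumption about how such a false positive could arise.

```javascript
// Hypothetical helper, not part of short-uuid: detects duplicates by
// code point rather than by UTF-16 code unit.
function hasDuplicateSymbols(alphabet) {
  const symbols = Array.from(alphabet); // string iteration yields code points
  return new Set(symbols).size !== symbols.length;
}

const emojiAlphabet = '💚💙💜';

console.log(emojiAlphabet.length);             // 6 UTF-16 code units
console.log(Array.from(emojiAlphabet).length); // 3 code points

// Assumed failure mode: splitting by code unit exposes surrogate halves.
// These emoji share the same high surrogate, so it looks like a duplicate.
const units = emojiAlphabet.split('');
console.log(new Set(units).size !== units.length); // true (false positive)

console.log(hasDuplicateSymbols(emojiAlphabet)); // false
console.log(hasDuplicateSymbols('💚💙💚'));       // true
```

This only covers symbols that are a single code point each; emoji built from several code points need more than code point iteration.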

I am wondering whether support for such Unicode characters will be implemented in the future.

oculus42 commented 2 years ago

This is definitely something that would be great to have, especially for making human-readable values. Because of the complexity of how emoji split, we would need something like grapheme-splitter to resolve this consistently. That is a large library because it relies on a large set of static values.
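
A small sketch of why splitting emoji is harder than iterating code points: some emoji that render as one symbol are sequences of several code points joined by zero-width joiners. The example uses only built-in string behavior and makes no assumption about grapheme-splitter's API.

```javascript
// A family emoji: man + ZWJ + woman + ZWJ + girl, rendered as one symbol.
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}';

console.log(family.length);             // 8 UTF-16 code units
console.log(Array.from(family).length); // 5 code points

// Splitting by code point breaks the symbol into its parts, so an
// alphabet containing it cannot be recovered from an encoded string
// without grapheme-aware segmentation.
const parts = Array.from(family);
console.log(parts[1] === '\u200D'); // true: the joiner is its own code point
```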

We could potentially accept alphabets as arrays, which could solve the initial creation, but we would fail to translate back without grapheme-splitter.
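
As a rough sketch of the array idea, the code below converts a UUID to and from an alphabet given as an array of symbols. The functions encodeUuid and decodeSymbols are hypothetical and do not reflect short-uuid's API or internals; the approach (plain base conversion with BigInt) is an assumption chosen for illustration. Decoding works directly on the array of symbols, but a joined string has to be split back into symbols first, which is where grapheme splitting comes in.

```javascript
// Hypothetical sketch, not short-uuid's API.
// Encodes a UUID as an array of symbols taken from an array alphabet.
function encodeUuid(uuid, symbols) {
  const base = BigInt(symbols.length);
  let value = BigInt('0x' + uuid.replace(/-/g, ''));
  const out = [];
  do {
    out.unshift(symbols[Number(value % base)]);
    value /= base; // BigInt division truncates
  } while (value > 0n);
  return out; // out.join('') would produce the short string
}

// Decodes an array (or any iterable) of symbols back into a UUID.
function decodeSymbols(encoded, symbols) {
  const base = BigInt(symbols.length);
  let value = 0n;
  for (const symbol of encoded) {
    const index = symbols.indexOf(symbol);
    if (index === -1) throw new Error(`Unknown symbol: ${symbol}`);
    value = value * base + BigInt(index);
  }
  const hex = value.toString(16).padStart(32, '0');
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
    hex.slice(16, 20), hex.slice(20)].join('-');
}

const symbols = ['💚', '💙', '💜', 'a', 'b'];
const uuid = '123e4567-e89b-12d3-a456-426614174000';
const encoded = encodeUuid(uuid, symbols);

// Round trip while the result is still an array of symbols.
console.log(decodeSymbols(encoded, symbols) === uuid); // true

// From a joined string, the symbols must be recovered first. Array.from
// splits by code point, which is enough only while every symbol in the
// alphabet is a single code point.
console.log(decodeSymbols(Array.from(encoded.join('')), symbols) === uuid); // true
```

With multi-code-point symbols in the alphabet, the second round trip would need a grapheme-aware split in place of Array.from.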

Because of the size and potential performance impact, I think we could implement this as a separate export to the library. This would allow developers to choose that support if desired.

balzdur commented 3 months ago

IIUC, this proposal may be what you need. It is not yet fully available, though. Once it is, it may solve the problem using a standard API (= no impact on library size).