ulid / javascript

Universally Unique Lexicographically Sortable Identifier
MIT License

Binary implementation in readme #11

Closed · alizain closed this issue 7 years ago

alizain commented 7 years ago

Hey everyone,

I've added a column in the readme for binary implementations in your libraries. If you have implemented it, please let me know here or submit a PR so I can add the ✓!

@SuperPaintman @savonarola @merongivian @imdario @Lewiscowles1986 @ararslan @RobThree @fvilers @mdipierro @rafaelsales

RobThree commented 7 years ago

What do you define as a "binary implementation"? I assume you mean not only the "string representation" but also (access to) the "(raw) 128 bits"?

In that case: you can add a ✓ for me 😉 (amongst others I have implemented a ToByteArray(), and you can construct a ULID from a byte array).

ararslan commented 7 years ago

What do you define as a "binary implementation"?

Presumably an implementation of what's described in https://github.com/alizain/ulid#binary-layout-and-byte-order
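
For reference, the layout that link describes is 16 bytes in network byte order: a 48-bit timestamp followed by 80 bits of randomness. A minimal sketch of splitting the two fields (the function and field names here are illustrative, not part of any library's API):

```js
// Per the spec's binary layout: 16 bytes, most significant byte first,
// a 48-bit timestamp followed by 80 bits of randomness.
function splitUlidBytes(bytes) {
  if (bytes.length !== 16) throw new Error("a ULID is exactly 16 bytes");
  let timestamp = 0; // 48 bits fits safely in a JS number (2^48 < 2^53)
  for (const b of bytes.slice(0, 6)) timestamp = timestamp * 256 + b;
  return { timestamp, randomness: bytes.slice(6) }; // ms since epoch + 10 bytes
}
```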

Lewiscowles1986 commented 7 years ago

I actually will try over xmas to set this up. It's very clear, but I'm in a bit of an over-subscribed period atm. I also need to ensure that my library is compatible with yours, as I'm a tad unsure about the sources of randomness...

alizain commented 7 years ago

I actually will try over xmas to set this up. It's very clear, but I'm in a bit of an over-subscribed period atm

Don't worry, I only asked because I wanted to make sure I mention it on the readme. I've been pretty busy as well, and have been unable to give much time to open source work.

I'm a tad unsure about the sources of randomness

What are you using today? What do you want to use in the future?

Lewiscowles1986 commented 7 years ago

Pretty sure if the string is the same, then the binary representation is the same in most major languages... Also why not just make it string compatible? Then nearly everything gets a tick

alizain commented 7 years ago

Hmm, that's not how I understand it. Python, for example, stores strings as Unicode text, so each character takes at least one byte; for the 26 characters, that adds up to at least 208 bits, more than the 128 specified. The same math applies to any language that stores its strings as ASCII, since we're not using any special characters.
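
Spelled out: 26 characters at one byte each is 26 × 8 = 208 bits, versus 128 bits of actual information. A quick check in JS (the sample ULID is the one from the README):

```js
// The canonical text form is 26 Crockford base32 characters.
const example = "01ARZ3NDEKTSV4RRFFQ69G5FAV"; // sample ULID from the README
const stringBits = new TextEncoder().encode(example).length * 8;
console.log(stringBits); // 208 -- one byte per character as ASCII/UTF-8 text
// The identifier itself carries only 128 bits: 48 of timestamp + 80 of randomness.
```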

Also why not just make it string compatible?

I don't understand what you mean

Lewiscowles1986 commented 7 years ago

Also why not just make it string compatible?

I don't understand what you mean

My bad, I read https://github.com/alizain/ulid/blob/master/README.md#specification and assumed by "char" you were referring to a data type which is (on all machines I've ever worked on) at least 8 bits wide... I see what you've done there; what probably led to the confusion is that I assumed you were talking about actual char strings (in C terms). It makes things a little confusing though...

Could you explain to me how JS is outputting the chars in a < 8-bit format (giving 208 bits for JS too)? It's definitely using string concatenation, which should lead to 208-bit output even from your reference implementation.
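
For context, the JS reference implementation does build the text form by repeated string concatenation, along these lines (a sketch of the approach, not the verbatim source):

```js
const ENCODING = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"; // Crockford's base32

// Encode a millisecond timestamp as `len` base32 characters; each pass
// peels off 5 bits (one base32 digit) and prepends one character, so the
// output is built purely by string concatenation.
function encodeTime(now, len) {
  let str = "";
  for (; len > 0; len--) {
    const mod = now % 32;
    str = ENCODING.charAt(mod) + str;
    now = (now - mod) / 32;
  }
  return str;
}

console.log(encodeTime(Date.now(), 10)); // e.g. "01HZX3M9KT"
```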

Right now the .NET, PHP, and Java ports seem to be literally outputting the Base32-mapped representation as strings, using chars native to their language. I'm pretty sure any conversion to an alternative encoding should be done outside the library, but I'm more worried that the math has gone awry somewhere.

At the level we are talking about, interoperability would need to cover more than just ULID strings. If it's stored in a DB as a string, the specific encoding conversion would be handled by the client, so again I ask: can we not just use strings instead of "binary compatibility"?

alizain commented 7 years ago

Sorry for the overdue reply.

@RobThree ✓ has been added!

@Lewiscowles1986

Could you explain to me how JS is outputting the chars in a < 8-bit format (giving 208 bits for JS too)?

It's using 208 bits for the string representation, because the JS implementation isn't working directly with bytes:

```
# String representation
128 bits of information -> encoded as 26 characters in base32 -> stored as UTF-8/ASCII text on disk, which takes up 208 bits

# Binary representation
128 bits of information -> stored as 128 bits on disk
```
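
A minimal sketch of going the other way, from the 26-character text form back to the raw 128 bits (`ulidToBytes` is an illustrative name, and a canonical ULID string is assumed):

```js
const ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"; // Crockford base32

// Decode a canonical 26-character ULID string into its raw 16 bytes.
// Illustrative helper, not part of the JS library's API.
function ulidToBytes(str) {
  if (str.length !== 26) throw new Error("expected 26 characters");
  let buffer = 0n;
  for (const ch of str.toUpperCase()) {
    const value = ALPHABET.indexOf(ch);
    if (value < 0) throw new Error("invalid base32 character: " + ch);
    buffer = (buffer << 5n) | BigInt(value); // 5 bits per character
  }
  // 26 * 5 = 130 bits; the top 2 must be zero for a valid ULID,
  // which is why the first character can be at most "7".
  const bytes = new Uint8Array(16);
  for (let i = 15; i >= 0; i--) {
    bytes[i] = Number(buffer & 0xffn);
    buffer >>= 8n;
  }
  return bytes;
}
```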

Lewiscowles1986 commented 7 years ago

@alizain so what is the solution for systems that don't support byte-level manipulation? And what is the solution to the fact that many systems will use different character encodings for their strings? This seems like a nice idea, but it's something that really doesn't belong in most HLLs (high-level languages), IMO

alizain commented 7 years ago

What systems are you working on that don't support byte-level manipulation?

Character encoding for strings is beyond the scope of this project. As you yourself noted, we are not concerned with how strings are stored. The previous example was just an illustration of how 99.9% of existing systems behave.

alizain commented 7 years ago

Also, I don't know of any character encoding scheme in use today that is not ASCII-compatible. Do you know of any?