pid / puid

Generate a unique ID based on time, machine, and process, for use in a distributed environment.
http://pid.github.io/puid/
MIT License
Can you explain the difference between PUID and UUID? #4

Closed migurski closed 11 years ago

migurski commented 11 years ago

I’m curious why you would create a new kind of identifier that seemingly solves the same problem as UUID. Is it because of JavaScript library availability?

jdarling commented 11 years ago

Look at how a UUID (versions 1-5) is defined: "A UUID is a 16-octet (128-bit) number. In its canonical form, a UUID is represented by 32 hexadecimal digits, displayed in five groups separated by hyphens, in the form 8-4-4-4-12 for a total of 36 characters (32 alphanumeric characters and four hyphens)."

Thus a UUID is a 32-digit hex number.

The PUID put forth here does something more akin to what MongoDB does: it uses all 26 letters and 10 digits to produce a shorter ID (in length) that still has enough entropy to be considered usable without collisions, even under heavy use.

Oh, and I don't know the real reason, but that is my guess.

pid commented 11 years ago

Thanks @jdarling, nice explanation :-) puid grew historically ;-) First I wanted a shorter uid, which is why I didn't choose UUID (my use case is Redis keys, where shorter is better, especially if you have to add a prefix to the key and so on). The long puid is a little bit spooky about the machineId, but it works for my use case. Once the job was done, I wondered "can I make it shorter?" and made puid-short12/14. It's faster than generating a UUID (particularly for things like bulk imports to Redis). puid-short12 is safe if you use it on a single host in various instances. I use the process.hrtime()[1] value, which is a counter in nanoseconds; according to my research and tests it is safe on a single host. In the next step I added the nodeId to puid-short12, which gives puid-short14. There are 1296 nodes possible (including nodeId '00', otherwise 1295). As with Twitter's Snowflake, you have to manage the nodeIds yourself, or with something like Apache ZooKeeper or something similar.
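As a rough illustration of the pieces mentioned above, here is a minimal sketch that joins a base36 timestamp, the process.hrtime()[1] nanosecond value, and a two-character base36 node id. It is not puid's actual implementation or output format: `encodeNodeId`, `shortId`, and the 8+6+2 character layout are assumptions made for this example. It does show where the figure 1296 comes from (36 × 36 two-character values, '00' through 'zz').

```javascript
// Hypothetical sketch, NOT puid's real code or ID layout.
const NODE_ID_LENGTH = 2;
const MAX_NODES = 36 ** NODE_ID_LENGTH; // 1296 values: "00" .. "zz"

// Encode a numeric node id (0..1295) as two base36 characters.
function encodeNodeId(n) {
  if (!Number.isInteger(n) || n < 0 || n >= MAX_NODES) {
    throw new RangeError(`nodeId must be an integer in 0..${MAX_NODES - 1}`);
  }
  return n.toString(36).padStart(NODE_ID_LENGTH, "0");
}

// Build an id from wall-clock milliseconds, the nanosecond part of
// process.hrtime(), and the node id. This layout is an assumption.
function shortId(nodeId) {
  const millis = Date.now().toString(36).padStart(8, "0");
  // hrtime()[1] is the nanosecond remainder (0..999999999); 36^6 > 10^9,
  // so six base36 characters are always enough.
  const nanos = process.hrtime()[1].toString(36).padStart(6, "0");
  return millis + nanos + encodeNodeId(nodeId);
}

console.log(shortId(7));
```

Note that process.hrtime() returns a [seconds, nanoseconds] pair, so the [1] value on its own starts over every second; the sketch pairs it with a timestamp for that reason.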

To summarize: if the length doesn't matter, use UUID. It has language-independent implementations, whereas puid works only with Node.js.

I will think about whether it would be useful to convert a UUID to a base36 value... but for now the job is done, it works for me, and everything is fine at the moment, at least for me ;-)
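For reference, converting a UUID to base36 could look roughly like the sketch below. It is not part of puid; `uuidToBase36` and `base36ToUuid` are hypothetical helper names. A 128-bit value needs at most 25 base36 characters, compared with 32 hex digits (36 characters with hyphens).

```javascript
const { randomUUID } = require("crypto");

// Hypothetical helper: 8-4-4-4-12 UUID string -> 25-character base36 string.
function uuidToBase36(uuid) {
  const hex = uuid.replace(/-/g, "");
  if (!/^[0-9a-fA-F]{32}$/.test(hex)) {
    throw new TypeError("expected 32 hexadecimal digits");
  }
  // 36^25 > 2^128, so padding to 25 characters gives a fixed length.
  return BigInt("0x" + hex).toString(36).padStart(25, "0");
}

// Hypothetical helper: base36 string -> canonical lowercase UUID string.
function base36ToUuid(text) {
  // BigInt() cannot parse radix 36 directly, so accumulate digit by digit.
  let value = 0n;
  for (const ch of text.toLowerCase()) {
    const digit = parseInt(ch, 36);
    if (Number.isNaN(digit)) throw new TypeError(`invalid base36 digit: ${ch}`);
    value = value * 36n + BigInt(digit);
  }
  if (value >= 1n << 128n) throw new RangeError("value does not fit in 128 bits");
  const hex = value.toString(16).padStart(32, "0");
  return [
    hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
    hex.slice(16, 20), hex.slice(20),
  ].join("-");
}

const uuid = randomUUID();
const short = uuidToBase36(uuid);
console.log(uuid, "->", short, `(${short.length} characters)`);
```

The saving is 11 characters per key against the hyphenated form, at the cost of a conversion step on every read and write.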

I hope that answers your questions.

migurski commented 11 years ago

It does, thank you!

matthiasg commented 11 years ago

Since node is inherently single-threaded, I wonder why you didn't create a real counter instead of a nanosecond-based one. There won't be any competing threads in the same process, so the counter is effectively a singleton and could just overflow (secured by milliseconds or nanoseconds).

I understand that you write it should be safe, but you don't have a safety net for when it isn't, such as remembering the last id you handed out (it comes out to mostly the same, but with no logic for a counter: 1 read / 1 compare / 1 write vs. 1 read / 1 inc / 1 write).

Any thoughts on that? By the way, I agree nanoseconds SHOULD be safe, but you don't explain the likelihood or the parameters needed to make an informed decision.
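The counter-based alternative described here could be sketched as follows. This is one possible reading of the suggestion, not code from puid: `createCounterIdGenerator`, the four-character counter width, and the injectable clock are all assumptions made for the example.

```javascript
// Hypothetical sketch of a per-process counter "secured by milliseconds".
function createCounterIdGenerator(now = Date.now) {
  const COUNTER_LIMIT = 36 ** 4; // four base36 characters per millisecond
  let lastMs = -1; // the remembered timestamp acts as the safety net
  let counter = 0;

  return function nextId() {
    const ms = now();
    if (ms > lastMs) {
      // New millisecond: restart the counter.
      lastMs = ms;
      counter = 0;
    } else {
      // Same millisecond, or the clock moved backwards: keep the
      // remembered timestamp and increment instead.
      counter += 1;
      if (counter >= COUNTER_LIMIT) {
        // Counter overflow: borrow the next millisecond.
        lastMs += 1;
        counter = 0;
      }
    }
    return lastMs.toString(36) + counter.toString(36).padStart(4, "0");
  };
}

const nextId = createCounterIdGenerator();
console.log(nextId(), nextId(), nextId());
```

Within one process the ids cannot repeat, because each call either advances the remembered timestamp or increments the counter. Each generator instance keeps its own state, though, so two instances in the same process, or two processes, can still collide unless something like a node id or process id is added.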

pid commented 11 years ago

Hi Matthias, thanks for the feedback :)

You mentioned the key point, "i.e. singleton". First of all, a require() in node doesn't ensure that you get the same object, so you would have to pass the object around inside the app, which is what I wanted to avoid. I wanted a unique ID each time you require(), create an instance, and generate an id.

Next, the nanoseconds ;-) I use process.hrtime()[1], which is the reason puid only works with node and isn't applicable to other languages. The hrtimer counts continuously (independently of the process on the host and of the current time), and I maintain that it's not possible to generate two puids in the same nanosecond within the same process ("since node is inherently single threaded"): the round trip to execute the generate() method takes more than a nanosecond. I mentioned in the readme that I tested a lot of parallel puid generation (inside one app, and with apps started in parallel that only generate puids in a loop, generating billions of ids).

So there isn't any scientific proof that this is safe, but the nature of the hrtimer gives me the certainty that it is safe ;-)
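One way to gather evidence on a particular machine is to read the timer in a tight loop and count how often two consecutive readings are identical. The sketch below is an assumption about how such a check might look, not the test described in the readme; `countRepeatedReadings` is a hypothetical name. It compares the full [seconds, nanoseconds] pair, because the [1] value alone wraps around every second.

```javascript
// Hypothetical check: how often do two back-to-back process.hrtime()
// readings return exactly the same [seconds, nanoseconds] pair?
function countRepeatedReadings(iterations) {
  let repeats = 0;
  let [prevSec, prevNano] = process.hrtime();
  for (let i = 0; i < iterations; i++) {
    const [sec, nano] = process.hrtime();
    if (sec === prevSec && nano === prevNano) repeats += 1;
    prevSec = sec;
    prevNano = nano;
  }
  return repeats;
}

const iterations = 1000000;
const repeats = countRepeatedReadings(iterations);
console.log(`${repeats} repeated readings in ${iterations} calls`);
```

A result of zero on one host is only an observation about that host's clock resolution and call overhead. It says nothing about other hardware, operating systems, or Node versions.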

I think it would be almost safe to use this in distributed environments, but that is a supposition, so you should use the node-id with puid-short. With puid-long you don't have to think about it, because the node-id is derived from the first public network interface, falling back to the hostname.

Currently I use puid-short (with node-id) for heavy inserts into Redis from multiple hosts to one Redis server, and it has run without any conflicts for a while.

I cannot repeat it often enough: if you don't need a unique id that is as short as possible, use UUID. I built puid with the requirement to use it with Redis. It had to be short because I also have to add prefixes to the keys. All in all, in my case there is heavy network traffic, with queries and results going over the wire and so on, and I wanted to reduce the amount to a minimum. That's the reason I built puid for myself ;-)

Your choice should be UUID, unless you need the id to be as short as possible, you understand the characteristics of process.hrtime, and you believe that it's unique ;-)

thanks