jimlowrey / aqua-sql

:bulb: A website and user system starter
https://jimlowrey.github.io/aqua-sql/
MIT License

Switch to CUIDs over UUIDs #2

Open jimlowrey opened 6 years ago

jimlowrey commented 6 years ago

UUIDs were chosen to facilitate client creation of database primary keys. CUIDs can be used in the same way but are much shorter in length. They also have various other benefits described here.

Aesthetically URLs look a little nicer as well.

Anyone have any thoughts here?

xeoncross commented 6 years ago

The actual CUID implementation is found here with the following breakdown.

c - h72gsb32 - 0000 - udoc - l363eofy

The groups, in order, are: a literal 'c' prefix, a timestamp, a counter, a client fingerprint, and a random block.

Personally, I would like to see more examples of shortcomings or of cases where CUID would not be a good fit. I've tried to create my own IDs by cramming everything into a 64-bit integer (BIGINT) value and it isn't easy. Standard UUIDs are 128 bits. I didn't see any mention of how many bits a CUID is.
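
To make the breakdown above concrete, here is a minimal sketch that slices the example ID into fixed-width groups. It assumes the classic 25-character layout (1 + 8 + 4 + 4 + 8 characters) and that the timestamp group is base 36 milliseconds since the Unix epoch; `splitCuid` and the group names are hypothetical and not part of the cuid library.

```javascript
// Hypothetical helper: slice a classic 25-character CUID into its groups.
// The widths (1 + 8 + 4 + 4 + 8) are an assumption taken from the example above.
function splitCuid(id) {
  if (typeof id !== 'string' || id.length !== 25 || id[0] !== 'c') {
    throw new Error('expected a 25-character CUID starting with "c"');
  }
  return {
    prefix: id.slice(0, 1),
    timestamp: id.slice(1, 9),
    counter: id.slice(9, 13),
    fingerprint: id.slice(13, 17),
    random: id.slice(17, 25),
  };
}

const parts = splitCuid('ch72gsb320000udocl363eofy');
// Assumption: the timestamp group is milliseconds since the Unix epoch in base 36.
const createdAt = new Date(parseInt(parts.timestamp, 36));
console.log(parts, createdAt.toISOString());
```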

jimlowrey commented 6 years ago

Thanks for the reply.

First let me retract 'much shorter'. For some reason I was thinking they were 16 characters long and not 25.

This gets out of my comfort zone pretty fast. So hopefully I will learn a few things here.

I've had a cup of coffee (and then another) and read through the node-cuid vs. node-uuid comparison as well as the CUID docs. Here are a few points that stood out to me.

1) CUIDs are monotonically increasing which makes them suited for primary keys. What we currently use, UUID V1, has the time stamp first so they may share this quality as well. This I am not sure about.

2) CUIDs start with a 'c' and don't use dashes so they can be used as web identifiers. HTML ids for example can't start with a number.

3) CUIDs are more reliable than UUID V1 when generated in the browser.

Nearly all of the debatable issues revolve around problems of scale, very high scale, and randomness. After that there is the small matter of the CUID's form factor, that is to say no dashes and a leading 'c'. It may seem trivial but it's one less thing to have to think about when using the ids from place to place in a web app. In the short term that's going to affect development much more than any scaling issues. In the long term I'm more persuaded by the arguments for CUIDs when it comes to scaling.

OK, so all that said, your question about how many bits a CUID takes is a good one. Which, I think, leads to how it would be stored. I don't know. It seems to me that since a CUID is ultimately a JavaScript string it would have to be stored as something that gets that exact string back. Thus it might be of variable length depending on the characters in the string.

xeoncross commented 6 years ago

UUID v1 can be sorted by time since it contains a timestamp (just like CUIDs). You don't have to represent the UUID as the string with dashes - you could base32/64/58 encode it just like a CUID for display.
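
One caveat worth sketching: in a version 1 UUID the timestamp is split across the first three groups with the low bits first, so sorting by time means extracting it rather than comparing the strings directly. The helper below is a rough illustration under that layout (per RFC 4122); `uuidV1UnixMillis` is a hypothetical name, not an API of any UUID library.

```javascript
// Number of 100 ns intervals between 1582-10-15 (the UUID epoch)
// and 1970-01-01 (the Unix epoch).
const UUID_EPOCH_OFFSET = 122192928000000000n;

// Hypothetical helper: recover Unix milliseconds from a version 1 UUID string.
function uuidV1UnixMillis(uuid) {
  const [timeLow, timeMid, timeHiAndVersion] = uuid.split('-');
  if (!timeHiAndVersion || timeHiAndVersion[0] !== '1') {
    throw new Error('not a version 1 UUID');
  }
  // Reassemble the 60-bit timestamp: high bits, then mid, then low.
  const ticks = BigInt('0x' + timeHiAndVersion.slice(1) + timeMid + timeLow);
  return Number((ticks - UUID_EPOCH_OFFSET) / 10000n);
}

// Sorting by creation time uses the extracted value, not the raw string.
const ids = [
  '13816710-1dd2-11b2-8000-000000000000',
  '13814000-1dd2-11b2-8000-000000000000',
];
const byTime = [...ids].sort((a, b) => uuidV1UnixMillis(a) - uuidV1UnixMillis(b));
console.log(byTime);
```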

Both UUIDs and CUIDs should be stored as binary since they are more than 8 bytes (> native 64 bit integers). Don't store either one as a string unless you have no choice.
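
As a minimal sketch of what binary storage means for a UUID, the dashed string can be packed into 16 bytes and unpacked again. The helper names `uuidToBuffer` and `bufferToUuid` are hypothetical, and the column type you would store the buffer in (for example a 16-byte binary column) depends on the database.

```javascript
// Hypothetical helper: pack a dashed UUID string into 16 raw bytes.
function uuidToBuffer(uuid) {
  const hex = uuid.replace(/-/g, '');
  if (!/^[0-9a-fA-F]{32}$/.test(hex)) {
    throw new Error('invalid UUID string');
  }
  return Buffer.from(hex, 'hex');
}

// Hypothetical helper: turn 16 raw bytes back into the dashed 8-4-4-4-12 form.
function bufferToUuid(buf) {
  const hex = buf.toString('hex');
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join('-');
}

const packed = uuidToBuffer('13814000-1dd2-11b2-8000-000000000000');
console.log(packed.length, bufferToUuid(packed));
```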

I don't see any problems with CUIDs other than that they are used much less than UUIDs, which raises concerns about compatibility and about problems nobody has run into yet.

jimlowrey commented 6 years ago

Thank you. That is very helpful.

Regarding size, I was thinking that since CUIDs are literally defined in JavaScript as JavaScript strings, storage-wise they would have to adhere to that. They would have to be stored as the underlying UTF-16 or UCS-2 byte arrays that they are. In the case of UTF-16 this would be 16 or 32 bits per character. Thus the number of bits required for storage could vary.

But looking closer at the CUID code I see that everything that could allow a 32-bit Unicode character to get in there is derived from calling Number.toString(36). That narrows the possible characters down to ones that can be represented with 8 bits.

Noting that a CUID is always 25 characters, assuming I'm thinking correctly, it would take 200 bits to store one.
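
The arithmetic can be checked with a short sketch. It assumes one byte per character when the ID is stored as ASCII text, and (an assumption on my part) a base 36 alphabet for the 24 characters after the 'c' prefix, which gives an upper bound on how much information the string can actually carry.

```javascript
const CUID_LENGTH = 25;

// Stored as ASCII text: one byte (8 bits) per character.
const asciiBits = CUID_LENGTH * 8;

// Upper bound on information content: the leading 'c' is constant, and each
// remaining character is assumed to be one of 36 symbols.
const infoBits = (CUID_LENGTH - 1) * Math.log2(36);

console.log(asciiBits);              // 200
console.log(infoBits.toFixed(2));    // 124.08
console.log(Math.ceil(infoBits / 8)); // 16 bytes would be enough in binary
```

So the 200-bit figure describes the text representation; a packed binary form would be about the same size as a 128-bit UUID.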

xeoncross commented 6 years ago

If a CUID is 25 characters then it's not much shorter than a UUID which is only 128 bits and (in base 64) is only 28 characters long.

const crypto = require('crypto');
const s = 'the quick brown fox';
console.log(s);

// SHA-1 digest of the string, base64-encoded
const shaBase64 = crypto.createHash('sha1');
shaBase64.update(s);
console.log(shaBase64.digest('base64'));

// The same digest, hex-encoded
const shaHex = crypto.createHash('sha1');
shaHex.update(s);
console.log(shaHex.digest('hex'));

// UUID is 128 bits, same as a SHA1

Output:

the quick brown fox
ztcfpyNSMb7Tg/rP3EHE3cwi7PE=
ced71fa7235231bed383facfdc41c4ddcc22ecf1

I would recommend replacing standard base64 with URL-safe base64, or with base58 like Bitcoin uses.
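
A minimal sketch of the URL-safe variant, applied to the base64 digest printed above: swap '+' for '-', '/' for '_', and drop the '=' padding. `toBase64Url` is a hypothetical helper written with plain string replacement so it does not depend on a particular Node version.

```javascript
// Hypothetical helper: encode bytes as URL-safe base64 without padding.
function toBase64Url(buf) {
  return buf
    .toString('base64')
    .replace(/\+/g, '-')
    .replace(/\//g, '_')
    .replace(/=+$/, '');
}

// Decode the standard base64 digest from the output above, then re-encode it.
const digestBytes = Buffer.from('ztcfpyNSMb7Tg/rP3EHE3cwi7PE=', 'base64');
console.log(toBase64Url(digestBytes)); // ztcfpyNSMb7Tg_rP3EHE3cwi7PE
```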

xeoncross commented 6 years ago

My mistake, a SHA-1 is 160 bits. A UUID in base64 would be 22 characters, e.g. 71jbvv7LfRKYp19gtRLtkn. The CUID is in base36, so it would be shorter in base64 as well.
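
The character counts can be double-checked with a rough sketch: base64 carries 6 bits per character, so 128 bits need 22 characters and 160 bits need 27 once the '=' padding is dropped. The 16 random bytes below are only a stand-in for a UUID, and `unpaddedBase64Length` is a hypothetical helper.

```javascript
const crypto = require('crypto');

// Hypothetical helper: characters needed to base64-encode n bytes, no padding.
function unpaddedBase64Length(byteCount) {
  return Math.ceil((byteCount * 8) / 6);
}

// 16 random bytes stand in for a 128-bit UUID here.
const uuidSizedBytes = crypto.randomBytes(16);
const encoded = uuidSizedBytes.toString('base64').replace(/=+$/, '');

console.log(encoded.length);           // 22
console.log(unpaddedBase64Length(16)); // 22 (128-bit UUID)
console.log(unpaddedBase64Length(20)); // 27 (160-bit SHA-1)
```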