Closed mplanchard closed 1 year ago
Be careful with env vars. How will those be allocated across different environments?
It's generally OK if these values CAN collide across hosts, as long as that is unlikely. In CUID, I often used multiple sources of host entropy to create fingerprints less likely to collide.
Hmm, I guess whether env vars are appropriate would depend on what the purpose of the fingerprint portion of the CUID is and when it's intended to vary.
My assumption is that it should be as unique as possible for any given "instance" of a process/thread producing CUIDs. So if I have 10 machines running 10 docker containers, with each container spinning up 2 processes with 2 threads each, I'd expect we'd want 10 × 10 × 2 × 2 = 400 unique fingerprints going into the CUIDs, to help ensure that no two instances can ever generate duplicate IDs.
My worry with just including `(random number + proc ID + thread ID) + hash_entropy` is that the `(random number + proc ID + thread ID)` seems quite likely to overlap eventually given enough systems. The added entropy from the hash function plus the additional entropy in the CUID inputs may be enough to take care of it, but it seems like it'd be safer to try to include something more system-specific. That said, it turns out env vars aren't available in WASM builds anyway, so that rules them out, unless I use them on non-WASM builds and fall back to something else for WASM.
Experimentally, it seems like the random data plus proc and thread IDs will probably generally be sufficient. Can update later if it isn't.
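For anyone following along, here is a minimal sketch of what a fingerprint built from a random number, the process ID, and the thread ID could look like using only the standard library. This is not the code from the port: `random_u64` and `fingerprint` are hypothetical names, `RandomState` stands in for a real random number generator, and `DefaultHasher` stands in for whatever hash the port actually uses.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::hash::{BuildHasher, Hash, Hasher};

/// Hypothetical helper: a random u64 from std alone.
/// Each `RandomState` is created with randomly seeded keys, so finishing
/// an empty hasher built from it yields an unpredictable value.
fn random_u64() -> u64 {
    RandomState::new().build_hasher().finish()
}

/// Hypothetical fingerprint: hash of (random number, process ID, thread ID).
/// `DefaultHasher` is an assumption standing in for the port's real hash.
fn fingerprint() -> u64 {
    let mut hasher = DefaultHasher::new();
    random_u64().hash(&mut hasher);
    std::process::id().hash(&mut hasher);
    std::thread::current().id().hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // Spawn a few threads and print the fingerprint each one computes.
    let handles: Vec<_> = (0..4)
        .map(|_| std::thread::spawn(fingerprint))
        .collect();
    for handle in handles {
        println!("{:016x}", handle.join().unwrap());
    }
}
```

Each thread hashes a different thread ID and a fresh random value, so the printed fingerprints should differ within one process. Nothing in this sketch is host-specific, which is the gap discussed above.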
Working on adding `cuid2` to the Rust `cuid` port, and trying to figure out how to do the fingerprint.

The JS version is a hash of:

`global` (in node) or `window` (in browser)

In Rust, we don't have anything like the `global` object in node or the `window` object in the browser. So far, I've got:

That gives different fingerprints for different processes & threads generating CUIDs on the same system, but doesn't guarantee anything across systems.
It looks like the Python port uses the system hostname, but that would reduce portability and prevent compiling the Rust to target-independent WASM.
One option that springs to mind is environment variables: the specific env var keys and values available to the process are likely to vary a fair bit across systems. On docker, this will include the `HOSTNAME` env var, which is generally set to the container ID. This is what I'm defaulting to for the moment, but would be curious to hear your thoughts.

We could also just rely on the random number, process ID, thread ID, and the hash entropy.
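To make the env var option concrete, here is a rough sketch of how the environment could be folded into one value and mixed into the fingerprint alongside the other inputs. The `host_entropy` name is hypothetical, `DefaultHasher` is an assumed stand-in for the real hash, and the constant returned on `wasm32` is only a placeholder for whatever fallback ends up being appropriate there.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: fold every env var key and value into one u64.
#[cfg(not(target_arch = "wasm32"))]
fn host_entropy() -> u64 {
    // Sort first so the result does not depend on iteration order.
    let mut vars: Vec<_> = std::env::vars_os().collect();
    vars.sort();
    let mut hasher = DefaultHasher::new();
    vars.hash(&mut hasher);
    hasher.finish()
}

/// Placeholder fallback (an assumption): contribute nothing on WASM and
/// rely on the random number and the other fingerprint inputs instead.
#[cfg(target_arch = "wasm32")]
fn host_entropy() -> u64 {
    0
}

fn main() {
    // On docker, HOSTNAME (usually the container ID) is among the inputs,
    // so this value should differ from container to container.
    println!("{:016x}", host_entropy());
}
```

Within a single process the value stays the same for as long as the environment is unchanged, so it only helps separate hosts or containers. The random number, process ID, and thread ID still have to separate instances on the same host.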