Insufficient entropy for fingerprint

oberrich commented 4 months ago

I've compared both the original and this implementation and while it's true that there's a standard way of getting a random number, the process id and the thread id, this does not provide the same amount of entropy. Operating systems often use bit-hacks and smart encodings of identifiers for efficient lookup. If you observe the pids/tids on your consumer Windows machine for example you will see that all pids are of the form 2*n where 150 < n < 20'000. Servers have even less range for n as there's less Software running and they are less random and more "lab condition". To make the seed more random this implementation calls rand twice.

When we now view this in the context of distributed systems* it becomes clear that pids and tids aren't really random and the only meaningful source of entropy comes from the two rand calls, which suddenly sounds a lot gloomier - as it should.

I think the amount of entropy here is insufficient compared to the original implementation and we should come up with a better way to seed this.

For reference, the original implementation hashes

> Object.keys(window).toString().length
< 4980

characters and uses 32 rand calls to build the seed.

* imagine an infrastructure like Netflix, where you have 1000 identically configured servers, running the exact same microservice and think what happens when they - at the same time, with the same seed - try to generate an ID and 5% of the servers happen to have the same pid/tid.

mplanchard commented 3 months ago

Yeah this is an interesting point. We don't really have anything like window to use as a base, and it's possible that the reference implementation has updated since I last looked at it.

There was some discussion on this here: https://github.com/paralleldrive/cuid2/issues/27

I'm not seeing where the reference implementation uses 32 rand calls to build the seed, unless you're talking about the original cuid 1. Current version is here: https://github.com/paralleldrive/cuid2/blob/main/src/index.js

Do you have suggestions as to a better approach? Perhaps just a larger random number to include in the fingerprint?

oberrich commented 3 months ago

Could be a good idea to just mix in something like an __rdtsc() on x86 or GetTickCount64() on Windows and combine to a hash. Can just layer those on top of the PID/TID stuff. Another thing that can be taken advantage of possibly is ASLR, maybe take ImageBase or something.

mplanchard commented 3 months ago

I don't have any windows machines so it is hard for me to test that kind of thing.

I'd be happy to accept a PR adding some stuff to the fingerprint using conditional compilation on windows targets though. In the meantime, I might just throw a UUID in there, which is of course kind of a funny thing to do in an alternative "universal identifier" crate, but it would at least be sure that each host has essentially a unique fingerprint

mplanchard commented 3 months ago

Looking back at the implementation, are you sure about the implementation having insufficient entropy?

It is the combination of:

Two random 128-bit numbers
Process ID
Thread ID

The combination of two random 128-bit numbers gives us 256 random bits plus process/thread ID. By comparison a UUID v4 has 122 random bits.

To get overlap, you'd need for two processes with the same process and thread ID to also generate the same two random 128-bit numbers.

The likelihood of generating the same two random 128-bit numbers is 1/(2^128)*1/(2^128) = 1/1e77. If you created a fingerprint every microsecond, it would still take 3e63 years before you'd expect to see a duplicate.

I'm not totally convinced that this is insufficient entropy.

oberrich commented 3 months ago

Im not sure at all how the Rust implementation of random number generation works, if it can be made deterministic by calls to srand or something similar that could be problematic, but the rng calls are probably the saving grace here. I think the likelihood of PIDs and TIDs colliding is real and the range of numbers it produces is not really providing a lot of value. I do think adding some architecture (cpuid, rdtsc) or OS specific entropy (Uptime, Drive IDs) could make more sense. I feel like using PID/TID feels a bit hacky although it’s convenient as abstracted by std crate.

mplanchard commented 3 months ago

According to the docs, the Standard distribution (used by default for rand::gen()) generates numbers uniformly distributed across all possible integer values, "provided the Rng is well behaved."

We're using ThreadRng, which says:

ThreadRng uses the same PRNG as StdRng for security and performance and is automatically seeded from OsRng.

Unlike StdRng, ThreadRng uses the ReseedingRng wrapper to reseed the PRNG from fresh entropy every 64 kiB of random data as well as after a fork on Unix (though not quite immediately; see documentation of ReseedingRng). Note that the reseeding is done as an extra precaution against side-channel attacks and mis-use (e.g. if somehow weak entropy were supplied initially). The PRNG algorithms used are assumed to be secure.

OsRng will be as good as the operating system's provided entropy. As noted above, the ThreadRng also resseds. The docs on the Reseeding Rng say:

Reseeding after a fixed number of generated bytes is never strictly necessary. Cryptographic PRNGs don’t have a limited number of bytes they can output, or at least not a limit reachable in any practical way. There is no such thing as ‘running out of entropy’.

Occasionally reseeding can be seen as some form of ‘security in depth’. Even if in the future a cryptographic weakness is found in the CSPRNG being used, or a flaw in the implementation, occasionally reseeding should make exploiting it much more difficult or even impossible.

It reseeds after generating 64 kb of data or after a fork, which means that we should be doubly good in the case of running in multiple threads.

I think the likelihood of PIDs and TIDs colliding is real and the range of numbers it produces is not really providing a lot of value.

I think this is probably true.

I do think adding some architecture (cpuid, rdtsc) or OS specific entropy (Uptime, Drive IDs) could make more sense.

I agree with this! It gets a little tricky when thinking of dockerized hosts, but I'd be glad to add some system-specific components to the fingerprint. It's a little difficult for me to test thoroughly, because I only have linux machines. Open to PRs in that vein with conditional compilation for specific targets, and I'll look into efficient ways to get some useful info on Linux that isn't totally useless in a dockerized context.

oberrich commented 3 months ago

I agree with this! It gets a little tricky when thinking of dockerized hosts, but I'd be glad to add some system-specific components to the fingerprint. It's a little difficult for me to test thoroughly, because I only have linux machines. Open to PRs in that vein with conditional compilation for specific targets, and I'll look into efficient ways to get some useful info on Linux that isn't totally useless in a dockerized context.

I've implemented some stuff (cpuid via raw-cpuid crate, computer name, volume serials of logical hard drives) for Windows/x8664. Source Code_

For now it just feeds everything into a Hasher because this is really convenient for error handling and adding new stuff that contributes to the hash.

oberrich commented 3 months ago

Added encoding of KSHARED_USER_DATA, this includes information such as OS version (major.minor.build), boot id, active processor count, number of physical pages, system timings and some more stuff. For x86/x86_64 it now encodes the TSC (_rdtsc).

I think this should provide enough entropy for Windows, not really sure about Linux.

mplanchard / cuid-rust

Insufficient entropy for fingerprint #15