tc39 / proposal-uuid

UUID proposal for ECMAScript (Stage 1)

ECMAScript proposal: JavaScript standard library UUID

⚠️⚠️ UPDATE 2021: This proposal is now being pursued at: https://github.com/WICG/uuid ⚠️⚠️

Status: Stage 1

Authors

Synopsis

The JavaScript standard library UUID describes an API for generating character-encoded Universally Unique Identifiers (UUIDs) based on IETF RFC 4122, available for import in JavaScript engines.

Motivation

UUID generation is an extremely common software requirement

The uuid module on npm currently receives some 64,000,000 monthly downloads and is relied on by over 2,600,000 repositories (as of June 2019).

The ubiquitous nature of the uuid module demonstrates that UUID generation is a common requirement for JavaScript software applications, making the functionality a good candidate for the standard library.

Developers "re-inventing the wheel" is potentially harmful

Developers who have not been exposed to RFC 4122 might naturally invent their own approaches to UUID generation, potentially using Math.random(). The article "TIFU by using Math.random()" discusses in depth why a cryptographically secure pseudo-random number generator (CSPRNG) should be used when generating UUIDs.

Introducing a UUID standard library, which dictates that a CSPRNG must be used, helps protect developers from security pitfalls.

Overview

UUID API

The UUID standard library provides an API for generating RFC 4122 identifiers.

Initially, the only export of the UUID library is randomUUID(), a method that implements the version 4 "Algorithm for Creating a UUID from Truly Random or Pseudo-Random Numbers" and returns the string representation described in RFC 4122.

// We're not yet certain as to how the API will be accessed (whether it's in the global, or a
// future built-in module), and this will be part of the investigative process as we continue
// working on the proposal.
randomUUID(); // "52e6953d-edbe-4953-be2e-65ed3836b2f0"

Math.getRandomValues()

Math.getRandomValues() exposes an API identical to the W3C crypto.getRandomValues() recommendation, with the same guarantees regarding the quality of randomness:

Implementations should generate cryptographically random values using well-established cryptographic pseudo-random number generators seeded with high-quality entropy, such as from an operating-system entropy source (e.g., "/dev/urandom"). This specification provides no lower-bound on the information theoretic entropy present in cryptographically random values, but implementations should make a best effort to provide as much entropy as practicable.

Math.getRandomValues() will act as the foundation for implementing UUID algorithms, providing a single mockable (see #25) source of randomness.
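A minimal sketch of how a randomUUID() could sit on top of a single, swappable source of randomness. Because Math.getRandomValues() is only proposed, the sketch assumes the existing Web Crypto crypto.getRandomValues() as its default source; makeRandomUUID and fillRandom are hypothetical names for illustration, not part of the proposal's API. The version 4 algorithm itself amounts to 16 random bytes with the version and variant bits overwritten.

```javascript
// Hypothetical factory (not the proposal's API): builds a randomUUID() from an
// injectable function that fills a Uint8Array with random bytes in place.
// Default source: Web Crypto's crypto.getRandomValues(), looked up lazily.
function makeRandomUUID(fillRandom = (arr) => globalThis.crypto.getRandomValues(arr)) {
  return function randomUUID() {
    const b = new Uint8Array(16);
    fillRandom(b); // expected to fill all 16 bytes
    b[6] = (b[6] & 0x0f) | 0x40; // 4-bit version field: 0100 = version 4
    b[8] = (b[8] & 0x3f) | 0x80; // 2-bit variant field: 10 = RFC 4122 variant
    const hex = Array.from(b, (x) => x.toString(16).padStart(2, "0")).join("");
    // Canonical 8-4-4-4-12 string form
    return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
  };
}

// In tests, a deterministic byte source can replace the CSPRNG:
const fakeUUID = makeRandomUUID((arr) => arr.fill(0xff));
fakeUUID(); // "ffffffff-ffff-4fff-bfff-ffffffffffff"
```

Injecting the byte source rather than patching a global is only one way to get the mockability mentioned above; the proposal does not prescribe this design.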

Out of scope

Algorithms described in RFC 4122 other than version 4 are not initially supported.

Statistics we've collected (see analysis/README.md) indicate that the version 4 algorithm is the most widely used:

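| Algorithm version | Repo count | % of repos | Weighted by watch count | % of weighted total |
|---|---|---|---|---|
| v4 | 4315 | 77.0% | 149802 | 89.5% |
| v1 | 1228 | 21.9% | 16219 | 9.7% |
| v5 | 51 | 0.9% | 1290 | 0.8% |
| v3 | 11 | 0.2% | 116 | 0.1% |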
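As a quick consistency check, each percentage column is a row's count divided by its column total. The snippet below simply recomputes those columns from the counts in the table; stats and shares are illustrative names.

```javascript
// Counts copied from the table: repo count and watch-count-weighted count.
const stats = {
  v4: { repos: 4315, weighted: 149802 },
  v1: { repos: 1228, weighted: 16219 },
  v5: { repos: 51, weighted: 1290 },
  v3: { repos: 11, weighted: 116 },
};

// Percentage of the column total for each algorithm version, one decimal place.
function shares(key) {
  const total = Object.values(stats).reduce((sum, s) => sum + s[key], 0);
  return Object.fromEntries(
    Object.entries(stats).map(([version, s]) => [version, ((100 * s[key]) / total).toFixed(1)])
  );
}

shares("repos"); // { v4: "77.0", v1: "21.9", v5: "0.9", v3: "0.2" }
shares("weighted"); // { v4: "89.5", v1: "9.7", v5: "0.8", v3: "0.1" }
```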

Regarding other UUID versions

While there is utility in other UUID versions, we are advocating starting with a minimal API surface that supports a large percentage of users (the string representation of version 4 UUIDs).

If research and/or user feedback later indicates that additional functionality, such as versions 1, 3, and 5 UUIDs, would add value, this proposal does not preclude these additions.

Use cases

How do folks in the community use RFC 4122 UUIDs?

Creating unique keys for database entries

Generating fake testing data

Writing to temporary files

FAQ

What are the advantages to uuid being in the standard library?

How unique are v4 UUIDs?

If you ignore the challenges involved in random number generation, v4 UUIDs are unique enough for all but the most stringent use cases. For example, the probability of a collision among 3.3 quadrillion version 4 UUIDs (equivalent to generating a million UUIDs per second for 104 years) is roughly one in a million (p = 0.000001). Source.
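The one-in-a-million figure is consistent with the birthday bound. A rough sketch, assuming the standard approximation p ≈ 1 - exp(-n² / (2N)) with N = 2^122 possible v4 values (6 of the 128 bits are fixed); v4CollisionProbability is a hypothetical helper, and the result is only as good as that approximation and the assumption of a perfect random source.

```javascript
// Birthday-bound approximation of the chance that at least two of n
// version 4 UUIDs collide, assuming 122 uniformly random bits each.
function v4CollisionProbability(n) {
  const N = 2 ** 122; // number of distinct version 4 UUIDs
  // -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x
  return -Math.expm1(-(n * n) / (2 * N));
}

// A million UUIDs per second for 104 (365-day) years, about 3.3e15 UUIDs
const uuidsIn104Years = 1e6 * 60 * 60 * 24 * 365 * 104;
v4CollisionProbability(uuidsIn104Years); // ≈ 1.0e-6
```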

That said, the quality of the random number generator is vital to uniqueness. Flawed RNG implementations have led to UUID collisions in real-world systems. It is for this reason that this spec mandates that any random numbers used come from a "cryptographically secure" source, thereby (hopefully) avoiding such issues.

Why call the export randomUUID() and not something like uuidV4()?

As pointed out in the discussion, v4 UUIDs have the maximum amount of entropy possible for a valid UUID as defined in IETF RFC 4122.

UUIDs defined in IETF RFC 4122 are 128-bit numbers that follow a specific byte layout. All of them contain a 4-bit "version" field and a 2-bit "variant" field, meaning that 6 of the 128 bits are reserved for meta information.

Since v4 UUIDs are defined to have all remaining 122 bits set to random values, there cannot be another UUID version that would contain more randomness.
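To make the bit layout concrete, here is a small sketch that reads the 4-bit version and 2-bit variant fields back out of a UUID string, using the example output shown earlier. It assumes the canonical 8-4-4-4-12 hexadecimal form; uuidFields and RANDOM_BITS are illustrative names.

```javascript
// Bits left for randomness in a version 4 UUID: 128 minus version and variant.
const RANDOM_BITS = 128 - 4 - 2; // 122

// Extract the version and variant fields from a canonical UUID string.
function uuidFields(uuid) {
  if (!/^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i.test(uuid)) {
    throw new TypeError("not a canonical UUID string");
  }
  const version = parseInt(uuid[14], 16); // high nibble of byte 6
  const variant = parseInt(uuid[19], 16) >> 2; // top two bits of byte 8
  return { version, variant }; // RFC 4122 variant is binary 10, i.e. 2
}

uuidFields("52e6953d-edbe-4953-be2e-65ed3836b2f0"); // { version: 4, variant: 2 }
```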

Any name involving "v4" requires a fairly deep understanding of what the term "version" means in the UUID spec, whereas randomUUID() describes v4 UUIDs much more directly.

Aren't v1 UUIDs better because they are guaranteed to be unique?

As an oversimplification, v1 UUIDs consist of two parts: a high-precision timestamp and a node id. IETF RFC 4122 contains several requirements that are supposed to ensure that the resulting v1 UUIDs are unique.

In practice, however, modern implementations generate a random 48-bit node value each time a process starts, leaving a probability of 1 in 2^48 of a collision in the node part. In the unlikely event of such a collision, it would take only 75 milliseconds for a duplicate v1 UUID to appear when generating UUIDs at a rate of 1M/second. So while duplicates are unlikely, v1 UUIDs, just like v4 UUIDs, carry no practical guarantee of uniqueness.

Are there privacy concerns related to v1 UUIDs?

If implementations follow the primary recommendations of RFC 4122, v1 UUIDs would indeed leak the hardware MAC address of the machine where they are created. As discussed above, this would most likely not be the case in modern JavaScript implementations, where hardware MAC addresses are either unavailable (browsers, serverless functions) or not necessarily unique (containers). However, there are rumors that the presence of the MAC address led to the arrest of the author of the Melissa virus, and according to its manual, even MySQL 8.0 still uses the hardware MAC address on some operating systems.

In any case, the exact creation time of any v1 UUID is contained within the UUID. This alone can be a privacy or data protection concern for many use cases (e.g. leaking the creation timestamp of a user account), which is yet another reason to be very careful when choosing v1 UUIDs.
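To illustrate how directly the creation time can be read back, the sketch below decodes the timestamp of a v1 UUID string. It relies on the RFC 4122 layout: a 60-bit count of 100-nanosecond intervals since 1582-10-15 00:00:00 UTC, split across the time_low, time_mid and time_hi_and_version fields. v1Timestamp is a hypothetical helper, and the sample UUID was constructed by hand to encode the Unix epoch.

```javascript
// 100 ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01 (Unix epoch);
// 0x01b21dd213814000 equals 122192928000000000.
const UUID_EPOCH_OFFSET = 0x01b21dd213814000n;

// Hypothetical helper: recover the creation time embedded in a v1 UUID.
function v1Timestamp(uuid) {
  const hex = uuid.replace(/-/g, "");
  if (!/^[0-9a-f]{32}$/i.test(hex) || hex[12] !== "1") {
    throw new TypeError("not a version 1 UUID");
  }
  const timeLow = hex.slice(0, 8); // low 32 bits of the timestamp
  const timeMid = hex.slice(8, 12); // middle 16 bits
  const timeHi = hex.slice(13, 16); // high 12 bits (skips the version nibble)
  const ticks = BigInt("0x" + timeHi + timeMid + timeLow);
  const ms = (ticks - UUID_EPOCH_OFFSET) / 10000n; // 10,000 ticks per millisecond
  return new Date(Number(ms));
}

v1Timestamp("13814000-1dd2-11b2-8000-000000000000").toISOString(); // "1970-01-01T00:00:00.000Z"
```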

How do other languages/libraries deal with UUIDs?

Some other languages and libraries also use the term "random" for version 4 UUIDs: Go, Java (UUID.randomUUID(), https://docs.oracle.com/javase/10/docs/api/java/util/UUID.html#randomUUID()) and C++ Boost.

Apart from that, UUID adoption across other languages/libraries seems to be rather inconsistent:

TODO

References