
BSD 3-Clause "New" or "Revised" License

Locality UUID (vB) Java

This is a UUID class intended to help control data locality when inserting into a distributed data system, such as MongoDB or HBase. There is also a Ruby implementation. This version does not conform to any external standard or spec. Developed at Groupon in Palo Alto by Peter Bakkum and Michael Craig.

Problems we encountered with other UUID solutions:


This library has both Java and Ruby implementations, and is also accessible from the command line.


This generates UUIDs in the following format:

wwwwwwww-xxxx-byyy-yyyy-zzzzzzzzzzzz


w: counter value
x: process id
b: literal hex 'b' representing the UUID version
y: fragment of machine MAC address
z: UTC timestamp (milliseconds since epoch)


An example id decodes to these field values:

counter     : 3,488,672,514
process id  : 12,618
MAC address : __:__:_d:53:7a:50
timestamp   : 1,350,327,498,450 (Mon, 15 Oct 2012 18:58:18.450 UTC)
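
As a worked check of the layout, the sketch below assembles an id string from the example field values listed above. The class `LayoutSketch` and method `assemble` are hypothetical, and the encoding is inferred from the format description rather than taken from the library: the counter's 8 hex digits in reverse order, the PID as 4 hex digits, the version `b` followed by the 28-bit MAC fragment as 7 hex digits, and the timestamp as 12 hex digits.

```java
final class LayoutSketch {
    // Hypothetical helper: lays out fields as wwwwwwww-xxxx-byyy-yyyy-zzzzzzzzzzzz.
    static String assemble(long counter, int pid, int macFragment, long timestampMillis) {
        // Counter: 8 hex digits, reversed so the least significant nibble comes first.
        String w = new StringBuilder(String.format("%08x", counter & 0xFFFFFFFFL)).reverse().toString();
        // Process id: 4 hex digits (the value is already reduced modulo 65,536).
        String x = String.format("%04x", pid & 0xFFFF);
        // Version 'b' followed by the 28-bit MAC fragment as 7 hex digits.
        String y = "b" + String.format("%07x", macFragment & 0x0FFFFFFF);
        // Timestamp: 12 hex digits of milliseconds since the Unix epoch.
        String z = String.format("%012x", timestampMillis);
        return w + "-" + x + "-" + y.substring(0, 4) + "-" + y.substring(4) + "-" + z;
    }

    public static void main(String[] args) {
        // Field values from the decoded example; MAC __:__:_d:53:7a:50 gives fragment 0xd537a50.
        System.out.println(assemble(3_488_672_514L, 12_618, 0xd537a50, 1_350_327_498_450L));
        // prints 20be0ffc-314a-bd53-7a50-013a65ca76d2
    }
}
```

Reading the result back against the legend: `20be0ffc` is `cff0eb02` (3,488,672,514) reversed, `314a` is 12,618, and `013a65ca76d2` is 1,350,327,498,450.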

Example Use

If the jar is in one of your repositories, add this to your pom.xml:


Use it in a program:

import com.groupon.uuid.UUID;
import java.util.Arrays;

public class Example {
    public static void main(String[] args) {
        // Generate a new id and print the fields embedded in it.
        UUID generated = new UUID();

        System.out.println("UUID            : " + generated.toString());
        System.out.println("raw bytes       : " + Arrays.toString(generated.getBytes()));
        System.out.println("process id      : " + generated.getProcessId());
        System.out.println("MAC fragment    : " + Arrays.toString(generated.getMacFragment()));
        System.out.println("timestamp       : " + generated.getTimestamp());
        System.out.println("UUID version    : " + generated.getVersion());

        // Parse the string form back into a UUID and check that it round-trips.
        UUID copy = new UUID(generated.toString());
        System.out.println("copied          : " + generated.equals(copy));
    }
}

Or get the jar and run from the command line:

java -cp locality-uuid-1.1.1.jar com.groupon.uuid.GenerateUUID


This UUID version was designed to have easily readable PID, MAC address, and timestamp values, with a regularly incremented counter. The motivations for this implementation are to reduce the chance of duplicate ids, to store more useful information in UUIDs, and to ensure that the first few characters vary for successively generated ids, which can be important for splitting ids over a cluster. The UUID generator is also designed to be thread-safe without locking.

Uniqueness rests on four components: the millisecond-precision timestamp, the MAC address of the generating machine, the 2-byte process id, and the 4-byte counter. A UUID is therefore guaranteed to be unique within an id space as long as each machine allows 65,536 or fewer distinct process ids, no machine shares the last 28 bits of its MAC address with another machine in the id space, and each process generates fewer than 4,294,967,295 ids per millisecond.

Counter

The counter value is reversed, so that its least significant 4-bit block (nibble) becomes the first character of the UUID. This is useful because it makes the leading characters of the UUID change most often. Note that the counter is not incremented by 1 each time but by a large prime number, so that successive values differ substantially, yet the counter takes many iterations to return to a value it has already produced.
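
A minimal sketch of the default counter mode, assuming an `AtomicInteger` for lock-free thread safety and a hypothetical step of 1,000,000,007 (a prime; the README does not say which prime the library uses). Because the step is odd, it is coprime with 2^32, so the wrapping 32-bit counter visits every value once before repeating. `reversedHex` shows the nibble reversal that puts the fastest-changing digit first.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CounterSketch {
    // Hypothetical step: the README only says "a large prime"; this value is illustrative.
    private static final int STEP = 1_000_000_007;
    private static final AtomicInteger COUNTER = new AtomicInteger();

    // Lock-free increment; int overflow wraps modulo 2^32, so an odd step
    // cycles through all 2^32 values before any value repeats.
    static int next() {
        return COUNTER.addAndGet(STEP);
    }

    // Render the counter as 8 hex digits with the nibble order reversed, so the
    // least significant (fastest-changing) nibble becomes the first character.
    static String reversedHex(int counter) {
        return new StringBuilder(String.format("%08x", counter)).reverse().toString();
    }

    public static void main(String[] args) {
        // Successive ids start with very different characters.
        for (int i = 0; i < 3; i++) {
            System.out.println(reversedHex(next()));
        }
    }
}
```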

Examples of sequentially generated ids in the default counter mode:


Note the high variability of the first few characters.

The counter can also be toggled into sequential mode, which effectively reverses this logic. This is useful because it lets you control the locality of your data as you generate ids across a cluster. Sequential mode creates an initial value from a hash of the current date and hour, so every machine in the cluster can compute the same starting value independently. The value is then incremented by one for each id generated. If you use key-based sharding, data inserted with these ids should have some locality.
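
The sketch below shows one way such a seed could work. The `yyyy-MM-dd'T'HH` key format, the use of `String.hashCode()` as the hash, and the `SequentialSketch` class are all hypothetical; the README does not specify which hash the library uses. What it illustrates is the property described above: any machine computing the seed during the same UTC hour gets the same starting value, and each id then takes the next consecutive value.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.atomic.AtomicInteger;

final class SequentialSketch {
    // Hypothetical date-and-hour key, e.g. "2012-10-15T18".
    private static final DateTimeFormatter HOUR = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH");

    private final AtomicInteger counter;

    SequentialSketch(ZonedDateTime now) {
        this.counter = new AtomicInteger(seedFor(now));
    }

    // Any machine evaluating this within the same UTC hour gets the same seed.
    static int seedFor(ZonedDateTime time) {
        String dateHour = time.withZoneSameInstant(ZoneOffset.UTC).format(HOUR);
        return dateHour.hashCode(); // stand-in hash; the library's actual hash may differ
    }

    // Consecutive ids get consecutive counter values (wrapping at 2^32).
    int next() {
        return counter.getAndIncrement();
    }

    public static void main(String[] args) {
        SequentialSketch gen = new SequentialSketch(ZonedDateTime.now(ZoneOffset.UTC));
        int first = gen.next();
        int second = gen.next();
        System.out.println(first + " then " + second);
    }
}
```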

Examples of sequentially generated ids in sequential counter mode:


PID

This value is the current process id modulo 65,536. In my experience, most Linux machines do not allow PIDs to go this high, but OS X machines do.
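
A sketch of deriving the 2-byte PID field, assuming Java 9 or later for `ProcessHandle` (the library predates that API and may obtain the PID differently). The `PidSketch` class and `pidField` method are hypothetical.

```java
final class PidSketch {
    // Reduce a process id to the 2-byte field stored in the UUID.
    static int pidField(long pid) {
        return (int) (pid % 65_536);
    }

    public static void main(String[] args) {
        long pid = ProcessHandle.current().pid(); // requires Java 9 or later
        // The field is shown as the 4 hex digits it occupies in the id.
        System.out.printf("pid %d -> field %04x%n", pid, pidField(pid));
    }
}
```

Two processes whose PIDs differ by a multiple of 65,536 would share this field, which is why the uniqueness guarantee depends on the PID range.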

MAC Address

This is the last 28 bits of the first active MAC address found on the machine. If no active MAC address is found, the field is filled with zeroes.
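
A sketch of extracting the 28-bit fragment using the JDK's `NetworkInterface` enumeration. Treating "active" as up, not loopback, and exposing a hardware address is an assumption, as is the `MacSketch` class; the library may select interfaces differently.

```java
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;

final class MacSketch {
    // Keep the last 28 bits of a hardware address: the low nibble of the
    // fourth-from-last byte plus the final three bytes.
    static int fragment(byte[] mac) {
        if (mac == null || mac.length < 4) {
            return 0; // no usable address: fill with zeroes
        }
        int n = mac.length;
        return ((mac[n - 4] & 0x0F) << 24)
             | ((mac[n - 3] & 0xFF) << 16)
             | ((mac[n - 2] & 0xFF) << 8)
             |  (mac[n - 1] & 0xFF);
    }

    // Assumption: "active" means up, not loopback, and exposing a hardware address.
    static byte[] firstActiveMac() throws SocketException {
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return null;
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            byte[] mac = nic.getHardwareAddress();
            if (mac != null && nic.isUp() && !nic.isLoopback()) {
                return mac;
            }
        }
        return null;
    }

    public static void main(String[] args) throws SocketException {
        // Seven hex digits, matching the "yyy-yyyy" part of the layout.
        System.out.printf("%07x%n", fragment(firstActiveMac()));
    }
}
```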

Timestamp

This is the number of UTC milliseconds since the Unix epoch. To convert it to a time manually, copy the last segment of the UUID, convert it from hex to decimal, then use a time library to count up from 1970-01-01 00:00:00.000 UTC.
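
The manual conversion can be sketched with `java.time.Instant`. The id string used here is the one obtained by encoding the example field values listed near the top of this README; `TimestampSketch` is a hypothetical name.

```java
import java.time.Instant;

final class TimestampSketch {
    // The timestamp is the last dash-separated segment: 12 hex digits of epoch milliseconds.
    static Instant timestampOf(String uuid) {
        String last = uuid.substring(uuid.lastIndexOf('-') + 1);
        return Instant.ofEpochMilli(Long.parseLong(last, 16));
    }

    public static void main(String[] args) {
        System.out.println(timestampOf("20be0ffc-314a-bd53-7a50-013a65ca76d2"));
        // prints 2012-10-15T18:58:18.450Z
    }
}
```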


More information is available in the source code comments.


UUID()

Generate a new UUID object.

UUID(String uuid)

Construct a UUID from the given String, which must be of the form xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx where each x matches [0-9a-fA-F].

UUID(byte[] uuid)

Construct a UUID given its raw byte array contents.

UUID(java.util.UUID UUID)

Construct a locality UUID object given a java.util.UUID object.

UUID(long mostSignificantBits, long leastSignificantBits)

Construct a UUID given longs representing the most and least significant bits of a UUID.

static boolean isValidUUID(String id)

Check if a String is in the valid UUID format such that it can be parsed.

static boolean isValidUUID(char[] id)

Check if a character array is in the valid UUID format such that it can be parsed.
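
To make the accepted format concrete, here is a standalone regular-expression check for the 8-4-4-4-12 hex form described for `UUID(String)`. It is a hypothetical stand-in (`FormatSketch.looksValid`), not the library's `isValidUUID`, whose implementation may differ, for example by scanning characters directly.

```java
import java.util.regex.Pattern;

final class FormatSketch {
    // 8-4-4-4-12 hex digits, upper or lower case.
    private static final Pattern FORMAT = Pattern.compile(
        "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    static boolean looksValid(String id) {
        return id != null && FORMAT.matcher(id).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksValid("20be0ffc-314a-bd53-7a50-013a65ca76d2")); // true
        System.out.println(looksValid("20be0ffc-314a-bd53-7a50"));              // false
    }
}
```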

static void useSequentialIds()

Toggle into sequential mode, so ids are generated in order.

static void useVariableIds()

Toggle into variable mode, so the first few characters of each id vary during generation. This is the default mode.

byte[] getBytes()

Get raw byte content of UUID.

String toString()

Get UUID String in the standard format.

java.util.UUID toJavaUUID()

Get this com.groupon.uuid.UUID object as a java.util.UUID object.

long getMostSignificantBits()

Get the first half of this UUID as a long value.

long getLeastSignificantBits()

Get the second half of this UUID as a long value.
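
The raw bytes and the two long halves are two views of the same 128 bits. The sketch below uses only the JDK (`java.nio.ByteBuffer` and `java.util.UUID`, not this library's class) and assumes the library uses the same big-endian layout as `java.util.UUID`; the README does not state the byte order. `BitsSketch` is a hypothetical name, and the id string is the one encoded from the example fields near the top of this README.

```java
import java.nio.ByteBuffer;

final class BitsSketch {
    // Pack the two halves into 16 bytes, most significant half first (big-endian).
    static byte[] toBytes(long msb, long lsb) {
        return ByteBuffer.allocate(16).putLong(msb).putLong(lsb).array();
    }

    // Recover the two halves from 16 raw bytes.
    static long[] fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return new long[] {buf.getLong(), buf.getLong()};
    }

    public static void main(String[] args) {
        java.util.UUID id = java.util.UUID.fromString("20be0ffc-314a-bd53-7a50-013a65ca76d2");
        byte[] raw = toBytes(id.getMostSignificantBits(), id.getLeastSignificantBits());
        long[] halves = fromBytes(raw);
        System.out.println(new java.util.UUID(halves[0], halves[1]));
        // prints 20be0ffc-314a-bd53-7a50-013a65ca76d2
    }
}
```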

char getVersion()

Return the UUID version character, which is 'b' for ids generated by this library.

int getProcessId()

Return process id embedded in UUID.

Date getTimestamp()

Return timestamp embedded in UUID, which is set at generation.

byte[] getMacFragment()

Get the embedded MAC Address fragment. This will be 6 bytes long, with the first two and a half bytes set to 0.
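
To show where `getVersion`, `getProcessId`, and `getMacFragment` find their values, the sketch below decodes them directly from a canonical 36-character id string. It is a hypothetical illustration (`FieldsSketch`), not the library's implementation, which works from the raw bytes; the id string is the one encoded from the example fields near the top of this README.

```java
import java.util.Arrays;

final class FieldsSketch {
    // Version character: first hex digit of the third group.
    static char version(String uuid) {
        return uuid.charAt(14);
    }

    // Process id: the whole second group, 4 hex digits.
    static int processId(String uuid) {
        return Integer.parseInt(uuid.substring(9, 13), 16);
    }

    // MAC fragment as 6 bytes; the first two and a half bytes stay zero.
    static byte[] macFragment(String uuid) {
        // The 7 hex digits after the version character, skipping the dash.
        long bits = Long.parseLong(uuid.substring(15, 18) + uuid.substring(19, 23), 16);
        byte[] mac = new byte[6];
        for (int i = 5; i >= 2; i--) {
            mac[i] = (byte) (bits & 0xFF);
            bits >>>= 8;
        }
        return mac;
    }

    public static void main(String[] args) {
        String id = "20be0ffc-314a-bd53-7a50-013a65ca76d2";
        System.out.println(version(id));                      // prints b
        System.out.println(processId(id));                    // prints 12618
        System.out.println(Arrays.toString(macFragment(id))); // prints [0, 0, 13, 83, 122, 80]
    }
}
```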