ramsey / uuid

:snowflake: A PHP library for generating universally unique identifiers (UUIDs).
https://uuid.ramsey.dev
MIT License

Uuid::uuid4() collisions #80

Closed giorgiosironi closed 8 years ago

giorgiosironi commented 9 years ago

We are generating about 1M UUID4 a day, and we are getting several hundred collisions a day, such as:

[2015-08-26 21:29:19 +0200] [production-onebipws-1.apache - 17819] [DEBUG] [Request] 
time_total=0.145
"request_id":"fdfb98c1-4367-4f22-b68a-d7cdaedcc069" 

[2015-08-27 00:36:16 +0200] [production-onebipws-1.apache - 17819] [DEBUG] [Request] 
time_total=0.016
"request_id":"fdfb98c1-4367-4f22-b68a-d7cdaedcc069"

The issue seems to be correlated with the same Apache process regenerating the same UUID after several hours. It also seems to be correlated with particular EC2 machines that exhibit the problem.
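
One way to test the per-process correlation is to group repeated request IDs by the Apache PID that logged them. The sketch below assumes each log entry sits on a single line, with the PID after `.apache - ` and a `"request_id":"..."` field as in the excerpts above. The function name and both regular expressions are illustrative assumptions, not part of any library.

```php
<?php
/**
 * Group repeated request IDs by the PID that logged them.
 * Assumes one log entry per line, in the format of the excerpts above.
 *
 * @param string[] $lines raw log lines
 * @return array request_id => list of PIDs, for IDs seen more than once
 */
function findDuplicateRequestIds(array $lines)
{
    $pidsById = array();
    foreach ($lines as $line) {
        if (!preg_match('/\.apache - (\d+)\]/', $line, $pid)) {
            continue; // no PID on this line
        }
        if (!preg_match('/"request_id":"([0-9a-f-]{36})"/', $line, $id)) {
            continue; // no request_id on this line
        }
        $pidsById[$id[1]][] = (int) $pid[1];
    }

    // Keep only the IDs that were logged more than once.
    return array_filter($pidsById, function ($pids) {
        return count($pids) > 1;
    });
}

// Sample lines assembled from the excerpts above (joined onto one line each).
$lines = array(
    '[2015-08-26 21:29:19 +0200] [production-onebipws-1.apache - 17819] [DEBUG] [Request] "request_id":"fdfb98c1-4367-4f22-b68a-d7cdaedcc069"',
    '[2015-08-27 00:36:16 +0200] [production-onebipws-1.apache - 17819] [DEBUG] [Request] "request_id":"fdfb98c1-4367-4f22-b68a-d7cdaedcc069"',
);
print_r(findDuplicateRequestIds($lines));
```

If every duplicated ID maps to a single repeated PID, that supports the per-process theory; duplicates spread across unrelated PIDs would point elsewhere.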

We checked that openssl_random_pseudo_bytes is available and that it reports a strong algorithm:

root@dev-all-onebip:~/projects/.../onebip-ultimate$  (master) # php -r "var_dump(function_exists('openssl_random_pseudo_bytes'));"                                      
bool(true)
root@dev-all-onebip:~/projects/.../onebip-ultimate$  (master) # php -r 'openssl_random_pseudo_bytes(16, $strong); var_dump($strong);'
bool(true)

How can we debug this problem?

ircmaxell commented 9 years ago

Which version of UUID are you using? Can you try switching to master?

giorgiosironi commented 9 years ago

We are on 2.8.1, but from what I see the code for uuid4() is identical to master:

public static function uuid4()
{
    $bytes = self::generateBytes(16);
    // When converting the bytes to hex, it turns into a 32-character
    // hexadecimal string that looks a lot like an MD5 hash, so at this
    // point, we can just pass it to uuidFromHashedName. 
    $hex = bin2hex($bytes);
    return self::uuidFromHashedName($hex, 4);
}

private static function generateBytes($length)
{
    if (self::hasOpensslRandomPseudoBytes()) {
        return openssl_random_pseudo_bytes($length);
    }
    ...
}
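
For context on what happens to those 16 bytes: a version 4 UUID is random data with the version and variant bits overwritten, as RFC 4122 describes. The following is a minimal sketch of that step under that assumption. It is not the library's `uuidFromHashedName()` implementation, and `bytesToUuid4()` is a hypothetical helper name.

```php
<?php
/**
 * Format 16 raw bytes as a version 4 UUID string (illustrative sketch).
 */
function bytesToUuid4($bytes)
{
    if (strlen($bytes) !== 16) {
        throw new InvalidArgumentException('Expected exactly 16 bytes');
    }

    // Byte 6: the high nibble carries the version (0100 = 4).
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    // Byte 8: the two high bits carry the RFC 4122 variant (10).
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    $hex = bin2hex($bytes);

    return sprintf(
        '%s-%s-%s-%s-%s',
        substr($hex, 0, 8),
        substr($hex, 8, 4),
        substr($hex, 12, 4),
        substr($hex, 16, 4),
        substr($hex, 20, 12)
    );
}

// All-zero input makes the overwritten bits easy to see.
echo bytesToUuid4(str_repeat("\x00", 16)), PHP_EOL; // 00000000-0000-4000-8000-000000000000
```

Six of the 128 bits are fixed, so 122 bits come from the byte source. Any repetition in the output therefore has to originate in whatever produced the bytes.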

Here is a sample of the duplicated UUIDs: https://gist.github.com/giorgiosironi/f1ce4682868ca6a6279d

ramsey commented 9 years ago

I wonder if you'll experience the same if you switch to using @ircmaxell's RandomLib.

Using master or 3.0.0-alpha3, you can do so like this:

$uuidFactory = new \Ramsey\Uuid\UuidFactory();
$uuidFactory->setRandomGenerator(new \Ramsey\Uuid\Generator\RandomLibAdapter());
\Ramsey\Uuid\Uuid::setFactory($uuidFactory);

$uuid = \Ramsey\Uuid\Uuid::uuid4();

giorgiosironi commented 9 years ago

I am currently trying to reproduce the problem with a CLI script to avoid impacting the production system. If I get a test that produces collisions, I will rerun it with RandomLib.

ramsey commented 9 years ago

Can you share your test that produces collisions? Does it consistently reproduce them, and is it always reproducing them for the same UUIDs?

fabre-thibaud commented 9 years ago

Yup, would be interesting to have more information on your setup (OS version, openssl version, machine architecture (x86 vs x86_64) ... the more the better)

I'm unable to get a single collision on a 12M set:

<?php

require_once 'vendor/autoload.php';

while (1) {
    file_put_contents('uuids.txt', \Rhumsaa\Uuid\Uuid::uuid4() . PHP_EOL, FILE_APPEND);
}

thibaud@thibaud-zbox:~/Workspaces/uuid$ wc -l uuids.txt
12143446 uuids.txt
thibaud@thibaud-zbox:~/Workspaces/uuid$ sort uuids.txt | uniq -d
thibaud@thibaud-zbox:~/Workspaces/uuid$

EDIT This is with OpenSSL enabled

ircmaxell commented 9 years ago

Try generating them on multiple servers. Part of mt_rand()'s output is based on server timestamp, so collisions are probable there (assuming openssl disabled).
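
To illustrate the concern with the `mt_rand()` fallback: its output is fully determined by its seed, so two processes that end up with the same seed emit the same byte sequence. The sketch below is only a demonstration of that property. The seed value is an arbitrary stand-in and `weakBytes()` is a hypothetical helper, not the library's fallback code.

```php
<?php
/**
 * Build $length bytes from mt_rand() after seeding it explicitly.
 * Illustrative only: this is what makes a predictable seed dangerous.
 */
function weakBytes($seed, $length)
{
    mt_srand($seed);
    $bytes = '';
    for ($i = 0; $i < $length; $i++) {
        $bytes .= chr(mt_rand(0, 255));
    }

    return $bytes;
}

$seed = 1440703759; // stand-in for a seed that two servers happen to share

$serverA = bin2hex(weakBytes($seed, 16));
$serverB = bin2hex(weakBytes($seed, 16));

var_dump($serverA === $serverB); // bool(true)
```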

giorgiosironi commented 9 years ago

I do not yet have a script that reproduces the problem (it is production traffic that exhibits the issue), but it happens on a single server. Also, as stated in the first comment, we have the openssl extension enabled. Versions:

root@production-onebipws-1:~$  # openssl version
OpenSSL 1.0.1 14 Mar 2012
root@production-onebipws-1:~$  # php -v
PHP 5.4.43-1+deb.sury.org~precise+1 (cli) (built: Jul 15 2015 12:05:17) 
Copyright (c) 1997-2014 The PHP Group
Zend Engine v2.4.0, Copyright (c) 1998-2014 Zend Technologies
    with Zend OPcache v7.0.2, Copyright (c) 1999-2013, by Zend Technologies

ircmaxell commented 9 years ago

That's a very serious problem then. Could you check to see if the $strong parameter of openssl_random_pseudo_bytes() is showing true or false on this server?

fabre-thibaud commented 9 years ago

@ircmaxell He mentioned in the first post that it returns true :)

@giorgiosironi Does it happen on any other server, or only that one? Is there a software (OS, PHP, OpenSSL) version difference between that specific server and the others that do not collide?

ircmaxell commented 9 years ago

He may be getting true from the command line. That doesn't mean it can't return different values during load, or when these collisions are generated...

ramsey commented 9 years ago

Any more word on this? @renan reported similar findings on Twitter:

https://twitter.com/renan_saddam/status/637024558038544388

giorgiosironi commented 9 years ago

What we have done until now:

fabre-thibaud commented 9 years ago

@giorgiosironi Did you manage to find out if openssl_random_pseudo_bytes was storing a false flag in the $strong argument when you get collisions ?
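
A possible way to capture this without changing behaviour is to wrap the byte generation and log only the calls where the flag is not true. This is a rough sketch, assuming the openssl extension is loaded. `generateBytesLogged()` and the log line format are hypothetical, and wiring it into the library would still require a patch or a custom generator.

```php
<?php
/**
 * Generate bytes with OpenSSL and append a log line whenever the
 * $strong flag is anything other than true (hypothetical helper).
 */
function generateBytesLogged($length, $logFile)
{
    $strong = null;
    $bytes = openssl_random_pseudo_bytes($length, $strong);

    if ($bytes === false || $strong !== true) {
        $line = sprintf(
            "[%s] pid=%d sapi=%s strong=%s\n",
            date('c'),
            getmypid(),
            php_sapi_name(),
            var_export($strong, true)
        );
        // LOCK_EX keeps concurrent workers from interleaving lines.
        file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX);
    }

    return $bytes;
}

$bytes = generateBytesLogged(16, sys_get_temp_dir() . '/uuid-strong.log');
echo strlen($bytes), PHP_EOL; // 16
```

Recording the PID alongside the flag makes it possible to line the entries up with the per-process pattern reported in the first comment.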

giorgiosironi commented 9 years ago

No, because that would require forking the library and/or patching it in production, which has a high development cost.

ramsey commented 9 years ago

Can you put a single PHP script on one of the production machines and run it to see?

ramsey commented 9 years ago

Or maybe I misunderstood the ask. I see @aztech-dev was asking if you could see if $strong is false only when you see a collision. Sorry for the confusion.

giorgiosironi commented 9 years ago

About the single PHP script: I ran the same code shown in https://github.com/ramsey/uuid/issues/80#issue-103529643 on the affected servers, and it gave the same result: the function is present and $strong is true.

ramsey commented 9 years ago

@giorgiosironi I see your PHP and OpenSSL versions above. What EC2 instance type are you using, since you mentioned that you see this happening on a specific EC2 instance type?

renan commented 9 years ago

I am logging the collisions to see how frequent they are, if any.

In the meantime I have executed the test script @aztech-dev provided on a few machines:

The server that gave me collisions is not around anymore, but it was running Ubuntu 12.04.4 and PHP 5.3.x; I don't know the OpenSSL version. All of it came from Ubuntu LTS versions.

ramsey commented 9 years ago

I don't have any data to base this on, but off the cuff, it sounds like the underlying system has a lack of randomness on it. Maybe?

ramsey commented 9 years ago

Just a thought: can you set up monitoring such that, when a collision occurs, you get a report on the read-out of the current value in /proc/sys/kernel/random/entropy_avail?

If the number is > 200, then your entropy level is good. If it's < 200, then that's an indication that there's a problem.
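
A small helper along these lines could be called from whatever code detects the collision. It is a sketch only: the function names are made up, the 200 threshold is the rule of thumb given above, and a value of exactly 200 is arbitrarily treated as low here.

```php
<?php
/**
 * Read the kernel's available-entropy estimate, or null if unreadable
 * (for example on non-Linux systems).
 */
function readEntropyAvail($path = '/proc/sys/kernel/random/entropy_avail')
{
    $raw = @file_get_contents($path);

    return $raw === false ? null : (int) trim($raw);
}

/**
 * Classify a reading against the rule-of-thumb threshold.
 */
function describeEntropy($entropy, $threshold = 200)
{
    if ($entropy === null) {
        return 'unknown';
    }

    return $entropy > $threshold ? 'ok' : 'low';
}

$entropy = readEntropyAvail();
echo 'entropy_avail=', var_export($entropy, true), ' (', describeEntropy($entropy), ')', PHP_EOL;
```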

langemeijer commented 9 years ago

I've been watching this thread, because it scared the shit out of me, but I want to share some of my initial thoughts here.

@giorgiosironi shared this piece of code:

private static function generateBytes($length)
{
    if (self::hasOpensslRandomPseudoBytes()) {
        return openssl_random_pseudo_bytes($length);
    }
    ...
}

For me the scary part is the dots. What's there? mt_rand to generate uuids?

In the first post in this issue he shared a piece of shell output

root@dev-all-onebip:~/projects/.../onebip-ultimate$  (master) # php -r "var_dump(function_exists('openssl_random_pseudo_bytes'));"                                      
bool(true)
root@dev-all-onebip:~/projects/.../onebip-ultimate$  (master) # php -r 'openssl_random_pseudo_bytes(16, $strong); var_dump($strong);'
bool(true)

But this doesn't prove that openssl is actually used in the generateBytes() method. I would have liked to see phpinfo() output from a web request, where we can confirm that the openssl extension was actually loaded in the web server.

Modern distros split php.ini and conf.d between the CLI and the other SAPIs. My guess is that openssl was loaded in the CLI but not in the web server SAPI.
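
A quick way to check this is to serve a small script through the web server itself and compare its output with the CLI. The sketch below uses only standard PHP functions; `opensslReport()` is a hypothetical name, and the script should be removed after use since it exposes configuration details.

```php
<?php
/**
 * Report what the current SAPI actually has loaded.
 * Request this through the web server, then run it on the CLI and compare.
 */
function opensslReport()
{
    $available = function_exists('openssl_random_pseudo_bytes');
    $strong = null;
    if ($available) {
        openssl_random_pseudo_bytes(16, $strong);
    }

    return array(
        'sapi'            => php_sapi_name(),
        'ini_file'        => php_ini_loaded_file(),
        'openssl_loaded'  => extension_loaded('openssl'),
        'function_exists' => $available,
        'strong'          => $strong,
    );
}

var_export(opensslReport());
echo PHP_EOL;
```

A different `ini_file` or a false `openssl_loaded` under the web SAPI would mean the library is falling through to another byte source there.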

ramsey commented 9 years ago

@langemeijer Here's that equivalent block of code in 3.0.0: https://github.com/ramsey/uuid/blob/3.0.0/src/Generator/RandomGeneratorFactory.php#L58-L74

Your comment makes me think it might be a good idea to "tag" a Uuid object at its point of creation with details about what generator, provider, etc. was used to create it. I'm not sure what this would look like in practice, and you may certainly create a Uuid from a string, bytes, or fields you pass in to it, so that information won't be available in that context, but it might be good to provide this information (when creating a Uuid) for debugging purposes.

matteosister commented 9 years ago

I confirm that we also have collisions on our UUIDs. We have a table with 2.5M rows. I grouped by unique identifier and counted the occurrences, and I have many collisions. In one case I have 5 equal UUIDs... :fearful:

We are on ec2 too.

ramsey commented 9 years ago

@matteosister Are you able to set up your environment so that you can capture information at the point a collision occurs? Specifically, what is the value of /proc/sys/kernel/random/entropy_avail when the collision occurs? Also, can you give specifics about the EC2 instance you're using (AWS instance type, uname -a, php --version, cat /etc/issue, etc.)?

ramsey commented 9 years ago

@matteosister Also, an example of the code you're using to generate the UUIDs, too, please.

matteosister commented 9 years ago

@ramsey we are trying to isolate the problem... and we suspect that it could be related to an edge case in our own code. I will report back when I'm sure. Thanks!

Lansoweb commented 8 years ago

@ramsey I'm running a test on 3 AWS instances (micro, small and medium; one CentOS and 2 Amazon Linux). Each one already has more than 2M UUIDs (the micro has 12,134,008) without duplicates. I will keep it running for a while and report back later. I'm also saving the entropy_avail value with each UUID, so if I get a hit, I will report the entropy as well.

Lansoweb commented 8 years ago

@ramsey Results:

$ sort uuids.txt | uniq -d
$ wc -l uuids.txt
69589819 uuids.txt

Entropies around 3000

$ sort uuids.txt | uniq -d
$ wc -l uuids.txt
10063312 uuids.txt

Entropies around 3000

# sort uuids.txt | uniq -d
# wc -l uuids.txt
10181234 uuids.txt

Entropies around 200 (too low, maybe because kernel 2.6?), but still no dups.

No duplicates on any of them. Using the following php:

<?php
require_once 'vendor/autoload.php';
while (1) {
    $uuid = \Ramsey\Uuid\Uuid::uuid4();
    $entropy = trim(file_get_contents('/proc/sys/kernel/random/entropy_avail'));

    file_put_contents('uuids.txt', $uuid . PHP_EOL, FILE_APPEND);
    file_put_contents('uuids-entropies.txt', $uuid . ' | ' . $entropy . PHP_EOL, FILE_APPEND);
}

Using composer:

composer require ramsey/uuid ircmaxell/random-lib

ramsey commented 8 years ago

Thanks for doing that, @lansoweb.

I wonder if there's a way to reduce the entropy on a system to see if we can create collisions. That would be an interesting test to try to reproduce what's being reported. It would also help show it might be the system that needs modification (i.e. increasing entropy).

Lansoweb commented 8 years ago

@ramsey I just tried, but I'm unable to change the entropy poolsize, maybe because it's a virtualized environment and some kernel features are locked.

I just checked: the poolsize on the CentOS machine is 4096, but it's a production system and the pool may be heavily used, hence the low entropy_avail.

langemeijer commented 8 years ago

From what I have been taught, reduced entropy is unlikely to be a cause of collisions. Good pseudorandom number generators produce statistically sound random numbers. Entropy, the secret sauce used as the (continuous) input for a random generator, reduces the predictability of the output but doesn't affect its statistical randomness.

ramsey commented 8 years ago

What would affect the statistical randomness? I don't think it's anything within the ramsey/uuid library, since we're relying on external generators to generate the bytes.

Lansoweb commented 8 years ago

@ramsey True. My tests were about validating that AWS instances behave as any other linux and this issue is not related to them specifically.

ircmaxell commented 8 years ago

A flaw in the algorithm or a collision in the underlying hash function (the Linux kernel uses SHA-1). Note that OpenSSL on anything but Windows doesn't use the kernel's RNG, so it's using its own mixer.

Entropy would be a red herring.

Mixers work like this

                         [ Entropy ]
                              |
                              v
[ State ] <-------------> [ Mixer ]
    |                         ^
    v                         |
[ Hash Function ]             |
    |                         |
    v                         |
[ Splitter ] -----------------+
    |
    v
$output

Or in a "class form pseudocode":

class CSPRNG {
    // Internal pool; starts empty until entropy is mixed in.
    private $state = '';

    // Fold fresh entropy into the state.
    public function addEntropy(string $entropy): void {
        $this->mix($entropy);
    }

    // Hash the state, hand out one half, and feed the other half back in.
    public function getBytes(): string {
        $hash = $this->hash($this->state);
        list($return, $newState) = str_split($hash, intdiv(strlen($hash), 2));
        $this->mix($newState);
        return $return;
    }

    private function mix(string $data): void {
        $this->state = $this->hash($data . $this->state);
    }

    private function hash(string $data): string {
        return hash('sha1', $data, true); // raw 20-byte digest
    }
}

So the only real source of "predictability" or collisions would come from the sha1 function being flawed (which it is, but not for this usage).
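
The "entropy is a red herring" point can be tried out with a toy version of the model above. This sketch is an assumption-laden illustration, not OpenSSL's actual mixer: `ToyMixer` is a made-up class that never receives fresh entropy after construction, and the seed strings are arbitrary.

```php
<?php
/**
 * Toy hash-feedback generator following the model described above.
 * No entropy is added after construction.
 */
final class ToyMixer
{
    private $state;

    public function __construct($initialState)
    {
        $this->state = $initialState;
    }

    public function getBytes()
    {
        $hash = hash('sha1', $this->state, true);          // 20 raw bytes
        list($output, $feedback) = str_split($hash, 10);   // split in half
        $this->state = hash('sha1', $feedback . $this->state, true);

        return $output;
    }
}

// 1) A fixed seed and zero further entropy: count the distinct outputs.
$rng = new ToyMixer('fixed seed, no further entropy');
$seen = array();
for ($i = 0; $i < 10000; $i++) {
    $seen[bin2hex($rng->getBytes())] = true;
}
echo count($seen), ' distinct outputs out of 10000', PHP_EOL;

// 2) Two generators that start from the same state emit the same bytes.
$a = new ToyMixer('same state');
$b = new ToyMixer('same state');
var_dump($a->getBytes() === $b->getBytes()); // bool(true)
```

In this model a repeated output requires a repeated state, which is consistent with the argument that low entropy alone would not be expected to cause duplicates.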

wjzijderveld commented 8 years ago

We are seeing collisions as well now, on a very small dataset (we generate < 300 a day :-/), so I would think it's something with the underlying system: kernels 2.6.32-504.16.2.el6.x86_64 and 2.6.32-504.8.1.el6.x86_64, both CentOS 6.7.

Generating on the CLI doesn't cause collisions. A while back I tried generating through nginx, but saw no collisions there either. It seems that collisions don't happen for UUIDs generated on the same day.

Our hosting provider installed haveged on 1 machine, but that doesn't seem to have solved the problem, which makes sense if I understand your comment, @ircmaxell, since OpenSSL uses its own system? We'll ask them if they can upgrade OpenSSL, which is currently 1.0.1e-fips.

At this moment I don't think there is an issue in ramsey/uuid, but I still wanted to let you know :-)

ramsey commented 8 years ago

This thread is scaring people from using this library, so I'd like to get to the bottom of what's causing these duplicates, even if it's not the fault of this library.

For those who are able to reproduce this, can you please dump the name of the generator that's being used to create the UUIDs that are duplicates? This might give us a good idea of where to look deeper into this issue.

Here's a modified version of @Lansoweb's script that will dump the generator used to a file:

<?php
require_once 'vendor/autoload.php';
while (1) {
    $uuid = \Ramsey\Uuid\Uuid::uuid4();
    $entropy = trim(file_get_contents('/proc/sys/kernel/random/entropy_avail'));
    $generator = get_class($uuid->getFactory()->getRandomGenerator());

    file_put_contents('uuids.txt', $uuid . PHP_EOL, FILE_APPEND);
    file_put_contents('uuids-entropies.txt', $uuid . ' | ' . $entropy . PHP_EOL, FILE_APPEND);
    file_put_contents('uuids-generators.txt', $uuid . ' | ' . $generator . PHP_EOL, FILE_APPEND);
}

clphillips commented 8 years ago

I'm also seeing duplicates with UUID version 5 (UPDATE: the same applies to version 3 as well). This is on Windows:

$uuid = Uuid::uuid5(Uuid::NAMESPACE_DNS, 'domain.com');
echo $uuid->toString() . "\n";
echo get_class($uuid->getFactory()->getRandomGenerator()) . "\n";

Output:

975a4bf0-a690-5ea9-8469-b07f8978e4e3
Ramsey\Uuid\Generator\OpenSslGenerator
975a4bf0-a690-5ea9-8469-b07f8978e4e3
Ramsey\Uuid\Generator\OpenSslGenerator
975a4bf0-a690-5ea9-8469-b07f8978e4e3
Ramsey\Uuid\Generator\OpenSslGenerator
975a4bf0-a690-5ea9-8469-b07f8978e4e3
Ramsey\Uuid\Generator\OpenSslGenerator

I'll note that UUID4, however, does not produce duplicates:

echo Uuid::uuid4()->toString() . "\n";
echo Uuid::uuid4()->toString() . "\n";
echo Uuid::uuid4()->toString() . "\n";
echo Uuid::uuid4()->toString() . "\n";

Output:

13eaa288-97e0-4446-8dba-31d883c7318e
6a929065-ecc9-496a-aef8-f16b52a5b480
2eb338c0-eaf6-46d3-b659-20fc04c5f7ef
53f0991c-aea5-4de6-9f10-1663d7f0cd27

jyggen commented 8 years ago

@clphillips The point of versions 3 and 5 is to always produce the same UUID for a namespace/name pair, so always getting 975a4bf0-a690-5ea9-8469-b07f8978e4e3 for domain.com using the DNS namespace is the expected behaviour. It's basically just an MD5/SHA-1 hash turned into a UUID.
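
To make the "hash turned into a UUID" point concrete, here is a minimal sketch of the version 5 construction from RFC 4122: SHA-1 over the namespace bytes plus the name, with the version and variant bits overwritten. `uuid5Sketch()` is a hypothetical helper written for illustration, not the library's code.

```php
<?php
/**
 * Derive a version 5 (SHA-1, name-based) UUID per RFC 4122.
 * Illustrative sketch; no randomness is involved anywhere.
 */
function uuid5Sketch($namespaceUuid, $name)
{
    // The namespace is hashed as its 16 raw bytes, not as text.
    $nsBytes = hex2bin(str_replace('-', '', $namespaceUuid));
    $hash = sha1($nsBytes . $name); // 40 hex characters

    // Overwrite the version nibble (5) and the variant bits (10).
    $timeHiAndVersion = (hexdec(substr($hash, 12, 4)) & 0x0fff) | 0x5000;
    $clockSeqHi = (hexdec(substr($hash, 16, 2)) & 0x3f) | 0x80;

    return sprintf(
        '%s-%s-%04x-%02x%s-%s',
        substr($hash, 0, 8),
        substr($hash, 8, 4),
        $timeHiAndVersion,
        $clockSeqHi,
        substr($hash, 18, 2),
        substr($hash, 20, 12)
    );
}

$dnsNamespace = '6ba7b810-9dad-11d1-80b4-00c04fd430c8'; // RFC 4122 DNS namespace

// Same inputs, same UUID, every time.
var_dump(uuid5Sketch($dnsNamespace, 'domain.com') === uuid5Sketch($dnsNamespace, 'domain.com')); // bool(true)
```

Because the result depends only on the namespace and the name, repeated values for the same pair are by design rather than a collision.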

ramsey commented 8 years ago

@clphillips As @jyggen stated, what you are seeing is the expected behavior for version 3 and 5 UUIDs. To further clarify, you are seeing Ramsey\Uuid\Generator\OpenSslGenerator each time with the output of get_class($uuid->getFactory()->getRandomGenerator()) because that is the random generator you would be using if you were generating random UUIDs (version 4), but versions 3 and 5 do not use the random generator.

clphillips commented 8 years ago

Thanks @jyggen @ramsey. After I saw the implementation of UUID 3 and 5 I went back to the RFC and realized I probably should have read that before I posted. ;)

ramsey commented 8 years ago

@giorgiosironi, @renan, @matteosister, @wjzijderveld: Any updates on the collisions you are seeing? Have any of you been able to dig into this more to see if you can isolate the problem? Any thoughts on what I can or should do in this library to minimize chances of duplicates?

wjzijderveld commented 8 years ago

Not yet on my side, although I expect to get some time in the coming 2 weeks to dig deeper into it. Will report back as soon as I have more information.

ramsey commented 8 years ago

Some of the answers on this Stack Overflow question provide good food for thought. What I'm interested in is: which random generator is being used when the collisions occur?

giorgiosironi commented 8 years ago

Sorry, no news. We implemented a check to regenerate the UUID in case of a collision.
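
For readers wanting a similar stopgap, a regenerate-on-collision check might look roughly like the sketch below. It is not the code referred to above: `generateUniqueId()` and both callables are hypothetical, and the in-memory lookup stands in for whatever storage holds the existing IDs.

```php
<?php
/**
 * Call $generate until $exists reports the ID as unused,
 * giving up after $maxAttempts tries (hypothetical helper).
 */
function generateUniqueId($generate, $exists, $maxAttempts = 5)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $id = call_user_func($generate);
        if (!call_user_func($exists, $id)) {
            return $id;
        }
    }

    throw new RuntimeException("No unused ID after $maxAttempts attempts");
}

// Stand-ins: a generator that repeats an already used ID once.
$used = array('aaa' => true);
$queue = array('aaa', 'bbb');

$id = generateUniqueId(
    function () use (&$queue) { return array_shift($queue); },
    function ($id) use ($used) { return isset($used[$id]); }
);

echo $id, PHP_EOL; // bbb
```

A check-then-insert sequence like this can still race between concurrent requests, so a unique constraint in the data store remains the more reliable safeguard.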

tom-- commented 8 years ago

@ramsey When debugging the revised Yii 2 random getter on someone else's system, I produced an info-dump test script that shows all the values used in the branching conditions in the getter. When a random getter has enough branching complexity that it's hard to unit test, your users end up testing some branches for you. They are hard pressed to investigate, even if they notice. So it's useful to have either such an info-dump test script or an instrumented version of the getter.

Also, until PHP 5.6.10 openssl_random_pseudo_bytes() was using the unsafe OpenSSL RNG and lying about it.

ramsey commented 8 years ago

@tom--, this is great information. I wonder if all the systems where we're seeing collisions are using the openssl_random_pseudo_bytes() generator and whether it is the culprit.

Taking a look at Stas's commit to fix the issue reported in PHP bug 70014, it appears that the fix is in versions:

Can anyone who is consistently seeing collisions upgrade to a newer version of PHP and verify whether collisions are still occurring?

Ragazzo commented 8 years ago

@ramsey any news on making a fix based on info from @tom-- ?

ramsey commented 8 years ago

@Ragazzo, there is nothing to fix in this library. In my last comment, I asked if those experiencing collisions could upgrade to a newer PHP version and see if they continue to see collisions. See above. :-)