imneme / pcg-cpp

PCG — C++ Implementation
Apache License 2.0
745 stars 99 forks source link

cross-platform failure to reproduce sequence #50

Closed psteinb closed 5 years ago

psteinb commented 5 years ago

I must apologize if this issue reveals a misunderstanding on my side. Nevertheless, I've been banging my head against this for too long, so I'd like to go out public and ask for help.

I am working on a simulation code that is stochastic in nature, so requires drawing from PRNGs repeatedly. In an attempt to move forward in a tested and reproducible way, I discovered a problem with xcode recently, that I have boiled down to some simple example code:

//compile with: c++ -std=c++11 -o stdlib_20 stdlib_20.cpp

#include <iostream>
#include <iomanip>
#include <random>

int main(int argc, char *argv[])
{
    std::mt19937 rng1(121);
    std::mt19937 rng2(100);
    std::normal_distribution<double> norm(5,2);
    std::uniform_int_distribution<> uni(0,50);
    std::cout.precision(8);
    for( int i = 0;i<20;++i){
        std::cout << std::setw(10) << norm(rng1) << " "
                  << std::setw(4) << uni(rng2)
                  << "\n";
    }
    return 0;
}

or using pcg of commit b656278

//compile with: c++ -std=c++11 -o pcg_20 pcg_20.cpp

#include <iostream>
#include <iomanip>
#include <random>
#include "pcg_random.hpp"

int main(int argc, char *argv[])
{
    pcg32 rng1(121);
    pcg32 rng2(100);

    std::normal_distribution<double> norm(5,2);
    std::uniform_int_distribution<> uni(0,50);
    std::cout.precision(8);
    for( int i = 0;i<20;++i){
        std::cout << std::setw(10) << norm(rng1) << " "
                  << std::setw(4) << uni(rng2)
                  << "\n";
    }
    return 0;
}

The interesting observation is, that the above code gives different results with gcc 8.3.1 on fedora 29 and on osx with xcode 9.0.0, Target: x86_64-apple-darwin17.7.0. Here is a diff of the top 10 numbers using pcg:

$ diff -y pcg.txt pcg_osx.txt|head -10
  4.689577   32                           |  7.1985629   49
 7.1985629   48                           |   4.689577   30
 4.8603939   35                           |  5.1719846    6
 5.1719846   46                           |  4.8603939   14
 1.9203969   12                           |  6.7887334   39
 6.7887334   10                           |  1.9203969   10
 4.9383387   10                           |  7.2118544   14
 7.2118544   32                           |  4.9383387   45
   4.05167   33                           |  2.6778881   35
 2.6778881   21                           |    4.05167   26

And using the stdlib:

$ diff -y stdlib.txt stdlib_osx.txt|head -10
 3.7834751   27                           |  6.5112185    8
 6.5112185   34                           |  3.7834751   24
 3.4408029   14                           |  5.2115202    3
 5.2115202   21                           |  3.4408029   39
 8.6759752   21                           |  6.0075102   23
 6.0075102   26                           |  8.6759752   15
 7.7195293   43                           |  6.0009774   48
 6.0009774    7                           |  7.7195293   10
 8.1130049    0                           |  6.1374016   30
 6.1374016    7                           |  8.1130049   34

The normal distribution (left column) appears to show consistent numbers but in an arbitrary ordering when comparing linux and osx. About the integer numbers I am quite uncertain what is going on. The question for me now is, is this an expected behavior/feature of PRNGs or is this a bug in xcode/gcc ?

rstub commented 5 years ago

The PRNGs (MT and PCG) should produce the same numbers on any platform. However, the algorithms for producing the standard random number distributions in C++ are implementation-defined. You could use (a fixed version of) boost.random to achieve cross-platform reporducable random draws.

The odd pattern for the normal draws comes from the used algorithm. Both platforms use the (rather slow) Box-Muller method. This transforms two random numbers from [0,1) to two N(0,1) numbers. Typically one will cache one result and return the other.

BTW, there is no need to have two instances of the RNG here.

psteinb commented 5 years ago

@rstub Interesting & thanks for the explanation. I didn't dig the c++ standard, but this is somewhat discouraging (at least for my use case). At this point, adding boost.random as a dependency appears too heavy for the project at hand. I'll have to find an alternative.

I checked with python and this gives the same sequence on both platforms in question. As a whole, that is quite discouraging with respect to c++. Not sure about other languages.

rstub commented 5 years ago

Not fixing the distributions is indeed a major problem with random. However, much of the complexity in both random and boost.random stems from them being compatible with virtually any PRNG. It shouldn’t be to difficult to roll your own If you restrict yourself to sensible ones (i.e. returning 32bit or 64Bit ints) and a small set of distributions. See http://www.pcg-random.org/posts/bounded-rands.html for uniform_int examples.

lemire commented 5 years ago

The uniform distribution call to (0,50) is unnecessary. The pcg library already supports the generation of integers over an interval. The reference offered by @rstub is excellent, and note that the code is super simple.