pytorch / glow

Compiler for Neural Network hardware accelerators
Apache License 2.0

Glow should provide its own implementation of a random number generator to ensure deterministic testing and avoid platform-dependent differences #1556

Open opti-mix opened 6 years ago

opti-mix commented 6 years ago

Glow currently uses the Mersenne Twister engine, which is std::mt19937.

It turns out that this random number generator, seeded with the same values, generates different random numbers depending on the OS; for example, it produces different results on macOS and Linux.

Most likely, the implementation of std::mt19937 is somehow platform-dependent.

To avoid such problems in the future, Glow could use its own in-tree implementation of random-number generation, with no target dependencies. It could even be an implementation of mt19937.

nadavrot commented 6 years ago

@opti-mix A long time ago Glow had a target-independent, stable, doom1-style random number generator:

https://github.com/pytorch/glow/blob/f6b610cc092c08ced8fa6c5ab34cb1ac113590c2/src/glow/Support/Random.cpp

stoklund commented 6 years ago

The function of std::mt19937 is completely specified by the C++ standard, so it should be portable. The operation of uniform_int_distribution, on the other hand, is apparently not fully specified, so different implementations can return different numbers.

The simplest fix is probably to avoid the *_distribution templates and use the MT output directly.

bertmaher commented 6 years ago

https://xkcd.com/221/

weimin commented 3 years ago

Hi, I am new to this repo. I found a similar issue discussed on Reddit (https://www.reddit.com/r/cpp/comments/7i21sn/til_uniform_int_distribution_is_not_portable/), which notes that the Boost library provides a portable implementation; there is also an example of its use on Stack Overflow (https://stackoverflow.com/questions/2254909/boost-random-number-generator). Would it be alright if I followed that implementation for this issue and made a PR? The other option is to write our own function, as in https://stackoverflow.com/questions/34903356/c11-random-number-distributions-are-not-consistent-across-platforms-what-al Thanks!

jfix71 commented 3 years ago

Hi @weimin -- I believe we already take Boost as a dependency in order to support folly, so that is a potential option. However, another option would be to follow @stoklund's suggestion, and IMO that would be preferable: it should be pretty simple, and it limits our dependencies, which I would personally prefer.

The simplest fix is probably to avoid the *_distribution templates and use the MT output directly.

weimin commented 3 years ago

Hi, thanks for your reply. I tested the MT output:

#include <chrono>
#include <iomanip>
#include <iostream>
#include <map>
#include <random>
#include <string>

int main() {
  typedef std::chrono::high_resolution_clock myclock;
  myclock::time_point beginning = myclock::now();
  myclock::duration d = myclock::now() - beginning;
  unsigned seed = d.count();
  std::mt19937 generator(seed);
  std::map<int, int> hist;
  int a = 1;
  int b = 6;
  // Scale the raw engine output into [a, b] without uniform_int_distribution.
  double dScale = ((double)b + 1 - (double)a) /
                  ((double)(generator.max() - generator.min()) + (double)1);
  std::cout << "scale: " << dScale << "\n";
  for (int n = 0; n < 10000; ++n) {
    int output = (generator() - generator.min()) * dScale + a;
    ++hist[output];
  }
  for (auto p : hist) {
    std::cout << std::fixed << std::setprecision(1) << std::setw(2)
              << p.first << ' ' << std::string(p.second / 200, '*') << '\n';
  }
  return 0;
}
(base) Weis-MBP:prng weimin$ ./test
scale: 1.39698e-09
 1 ********
 2 ********
 3 ********
 4 ********
 5 ********
 6 ********

However, I am having some problems building the code to do more tests. I am using OS X Catalina (10.15.6); I followed the instructions to build using Homebrew and also updated Xcode, but it gives me the following errors: CMakeError.log CMakeOutput.log

I also tried to build on AWS (Ubuntu 18.04) and got much further, but it still didn't complete at ninja all: [428/674] Linking CXX executable bin/model-profiler FAILED: bin/model-profiler CMakeError.log CMakeOutput.log output.txt

Do you have any suggestions on how to build the code? From the error log, it looks like the clang: error: unknown argument: '-wd654' / ninja: build stopped: subcommand failed. issue from a previous post.

Thanks!