google / souper

A superoptimizer for LLVM IR
Apache License 2.0
2.17k stars 170 forks source link

Possible bugs when generating inputs for dataflow pruning #870

Open liamsemeria opened 1 year ago

liamsemeria commented 1 year ago

Currently the generated inputs for dataflow pruning looks something like this:

        Var 0 : -1
        Var 0 : 1
        Var 0 : 0
        Var 0 : 2147483647
        Var 0 : -2147483648
        Var 0 : 0
        Var 0 : 1
        Var 0 : -1
        Var 0 : 2147483647
        Var 0 : -2147483648
        Var 0 : 520932930
        Var 0 : 28925691
        Var 0 : 822784415
        Var 0 : 890459872
        Var 0 : 145532761
        Var 0 : 1
        Var 0 : 26
        Var 0 : 1
        Var 0 : 6
        Var 0 : 1

Inputs 0,1,-1, 2147483647, -2147483647 are "special inputs" and the remaining values are supposed to be random. Since by default only 10 inputs are tested for each candidate, random inputs are never used. The special inputs are generated twice because there are currently two implementations.

In Pruning.cpp line 761:

  constexpr unsigned PermutedLimit = 15;
  std::string specialInputs = "abcde";
  std::unordered_set<std::string> Visited;
  do {
    int i = 0;
    std::string usedInput;
    for (auto &&I : Inputs) {
      if (I->K == souper::Inst::Var) {
        usedInput.append(1, specialInputs[i]);
        Cache[I] = getSpecialAPInt(specialInputs[i++], I->Width);
      }
    }

    if (!Visited.count(usedInput) && isInputValid(Cache)) {
      if (InputSets.size() >= PermutedLimit) break;
      InputSets.push_back(Cache);
      Visited.insert(usedInput);
    }
  } while (std::next_permutation(specialInputs.begin(), specialInputs.end()));

And right below it:

  for (auto &&I : Inputs) {
    if (I->K == souper::Inst::Var)
      Cache[I] = {llvm::APInt(I->Width, 0)};
  }
  if (isInputValid(Cache))
    InputSets.push_back(Cache);

  for (auto &&I : Inputs) {
    if (I->K == souper::Inst::Var)
      Cache[I] = {llvm::APInt(I->Width, 1)};
  }
  if (isInputValid(Cache))
    InputSets.push_back(Cache);

  for (auto &&I : Inputs) {
    if (I->K == souper::Inst::Var)
      Cache[I] = {llvm::APInt(I->Width, -1)};
  }
  if (isInputValid(Cache))
    InputSets.push_back(Cache);

  for (auto &&I : Inputs) {
    if (I->K == souper::Inst::Var)
      Cache[I] = {llvm::APInt::getSignedMaxValue(I->Width)};
  }
  if (isInputValid(Cache))
    InputSets.push_back(Cache);

  for (auto &&I : Inputs) {
    if (I->K == souper::Inst::Var)
      Cache[I] = {llvm::APInt::getSignedMinValue(I->Width)};
  }
  if (isInputValid(Cache))
    InputSets.push_back(Cache);

Commenting out either of these only adds the special inputs once which avoids testing the same input multiple times and makes room for random inputs. Random Inputs are generated In Pruning.cpp line 817:

  constexpr int MaxTries = 100;
  constexpr int NumLargeInputs = 5;
  std::srand(0);
  int i, m;
  for (i = 0, m = 0; i < NumLargeInputs && m < MaxTries; ++m ) {
    for (auto &&I : Inputs) {
      if (I->K == souper::Inst::Var)
        Cache[I] = {llvm::APInt(I->Width, std::rand() % llvm::APInt(I->Width, -1).getLimitedValue())};
    }
    if (isInputValid(Cache)) {
      i++;
      InputSets.push_back(Cache);
    }
  }

  if (StatsLevel > 2 && i < NumLargeInputs) {
    llvm::errs() << "MaxTries (100) exhausted searching for large inputs.\n";
  }

  constexpr int NumSmallInputs = 5;
  for (i = 0, m = 0; i < NumSmallInputs && m < MaxTries; ++m ) {
    for (auto &&I : Inputs) {
      if (I->K == souper::Inst::Var)
        Cache[I] = {llvm::APInt(I->Width, std::rand() % I->Width)};
    }
    if (isInputValid(Cache)) {
      i++;
      InputSets.push_back(Cache);
    }
  }

  if (StatsLevel > 2 && i < NumSmallInputs) {
    llvm::errs() << "MaxTries (100) exhausted searching for small inputs.\n";
  }

The random inputs are never reseeded, but changing std::srand(0); to std::srand(time(0)); makes random numbers work as intended. Also, when generating small inputs they have a range of 0 to I->Width. This leads to a lot of repeat values since MaxTries is 100 and you could only be generating values from 0 to 8 for example. Maybe replacing Cache[I] = {llvm::APInt(I->Width, std::rand() % I->Width)}; with something like Cache[I] = {llvm::APInt(I->Width, std::rand() % (some value depending on width))}; would lead to less repeat values?

I have fixes for these issues and can make a pull request. I just wanted to double check first since I didn't know which implementation of adding special inputs to remove since they both work. I also made the max amount of inputs to try user controllable which is part of a TODO on Pruning.cpp line 246. No changes were made to the range of random small inputs but I fixed the srand.

regehr commented 1 year ago

thanks!

this code is by @manasij7479 -- Manasij please look into it

manasij7479 commented 1 year ago

Hi @liamsemeria, std::srand(0) is intentional.
We want a diverse set of deterministic inputs, and not random ones.
Random ones would make it difficult to test pruning success and failures.

Please send a fix for the rest of your points!

liamsemeria commented 1 year ago

Ok cool PR is up for the other fixes I mentioned!