oxen-io / libsession-util

Session utilities library
GNU General Public License v3.0
15 stars 15 forks source link

Add property based testing using RapidCheck #13

Open ghost opened 1 year ago

ghost commented 1 year ago

I'm investigating property-based testing for Oxen projects, I think libsession-util is a good start, and session::bt::merge is a good toy study case.

To use property-based testing, the first step is to define data generators, the second step is to define properties.

I implemented a toy data generator for a simplified version of bt_value type, next step is to integrate it into the rest of the code.

Once I have some working prototypes I'll create a draft PR for discussion.

In the meantime comments and feedback are appreciated!

ghost commented 1 year ago
#include <rapidcheck.h>

#include <vector>
#include <algorithm>
#include <variant>

struct bt_value;

using bt_dict = std::map<std::string, bt_value>;
using bt_list = std::list<bt_value>;

using bt_variant = std::variant<
    std::string,
    std::string_view,
    int64_t,
    uint64_t,
    bt_list,
    bt_dict
>;

struct bt_value {
 bt_variant value;
};

bool operator==(const bt_value &lhs, const bt_value &rhs) {
  return std::visit([&](const auto &lhs_val, const auto &rhs_val) -> bool {
    using LHS = std::decay_t<decltype(lhs_val)>;
    using RHS = std::decay_t<decltype(rhs_val)>;
    if constexpr (std::is_same_v<LHS, std::string> &&
                  std::is_same_v<RHS, std::string>) {
      return lhs_val == rhs_val;
    } else if constexpr (std::is_same_v<LHS, std::string_view> &&
                         std::is_same_v<RHS, std::string_view>) {
      return lhs_val == rhs_val;
    } else if constexpr (std::is_same_v<LHS, int64_t> &&
                         std::is_same_v<RHS, int64_t>) {
      return lhs_val == rhs_val;
    } else if constexpr (std::is_same_v<LHS, uint64_t> &&
                         std::is_same_v<RHS, uint64_t>) {
      return lhs_val == rhs_val;
    } else if constexpr (std::is_same_v<LHS, bt_list> &&
                         std::is_same_v<RHS, bt_list>) {
      if (lhs_val.size() != rhs_val.size()) {
        return false;
      }
      auto lhs_iter = lhs_val.begin();
      auto rhs_iter = rhs_val.begin();
      for (; not (lhs_iter == lhs_val.end()); ++lhs_iter, ++rhs_iter) {
        if (not (*lhs_iter == *rhs_iter)) {
          return false;
        }
      }
      return true;
    } else if constexpr (std::is_same_v<LHS, bt_dict> &&
                         std::is_same_v<RHS, bt_dict>) {
      if (lhs_val.size() != rhs_val.size()) {
        return false;
      }
      for (const auto &lhs_elem : lhs_val) {
        auto rhs_iter = rhs_val.find(lhs_elem.first);
        if (rhs_iter == rhs_val.end() || not (lhs_elem.second == rhs_iter->second)) {
          return false;
        }
      }
      return true;
    } else {
      return false;
    }
  }, lhs.value, rhs.value);
}

namespace rc {

template <>
class Arbitrary<std::string_view> {
 public:
  static rc::Gen<std::string_view> arbitrary() {

    // Use the `rc::gen::map` function to map the string generator
    // to a string_view generator
    return rc::gen::map(
      rc::gen::arbitrary<std::string>(),
      [](const std::string &s) {
        return std::string_view(s);
      }
    );
  }
};

Gen<bt_value> genBtValue() {
  return gen::oneOf<bt_value>(
      gen::map(gen::resize(20, gen::arbitrary<std::string>()),
               [](const std::string &s) { return bt_value{s}; }),
      gen::map(gen::resize(20, gen::arbitrary<std::string_view>()),
               [](const std::string_view &s) { return bt_value{s}; }),
      gen::map(gen::arbitrary<int64_t>(),
               [](int64_t i) { return bt_value{i}; }),
      gen::map(gen::arbitrary<uint64_t>(),
               [](uint64_t i) { return bt_value{i}; }),
      gen::map(gen::resize(5, gen::container<bt_list>(gen::lazy(&genBtValue))),
               [](const bt_list &list) { return bt_value{list}; }),
      gen::map(gen::resize(5, gen::container<bt_dict>(
                   gen::arbitrary<std::string>(), gen::lazy(&genBtValue))),
               [](const bt_dict &dict) { return bt_value{dict}; })
    );
}

template <>
struct Arbitrary<bt_value> {
  static Gen<bt_value> arbitrary() {
    return rc::genBtValue();
  }
};

} // namespace rc

void print_bt_value(const bt_value &value) {
  std::visit([](const auto &v) {
    using T = std::decay_t<decltype(v)>;
    if constexpr (std::is_same_v<T, std::string>) {
      std::cout << v;
    } else if constexpr (std::is_same_v<T, std::string_view>) {
      std::cout << v;
    } else if constexpr (std::is_same_v<T, int64_t>) {
      std::cout << v;
    } else if constexpr (std::is_same_v<T, uint64_t>) {
      std::cout << v;
    } else if constexpr (std::is_same_v<T, bt_list>) {
      std::cout << "[";
      bool first = true;
      for (const auto &elem : v) {
        if (!first) {
          std::cout << ", ";
        }
        first = false;
        print_bt_value(elem);
      }
      std::cout << "]";
    }  else if constexpr (std::is_same_v<T, bt_dict>) {
      std::cout << "{";
      bool first = true;
      for (const auto &elem : v) {
        if (!first) {
          std::cout << ", ";
        }
        first = false;
        std::cout << elem.first << ": ";
        print_bt_value(elem.second);
      }
      std::cout << "}";
    }
  }, value.value);
}

int main() {
  rc::check("bt_value test",
            [](const bt_value &mv) {
              auto mv1 = mv;
              print_bt_value(mv);
              RC_ASSERT(mv1 == mv);
            });

  return 0;
}
ghost commented 1 year ago

The above toy example creates a data generator for a simulated (simplified) bt_value type.

After compiling and linking with RapidCheck, it generates 100 random test cases with diverse possibilities of values.

The number of test cases can be customized, the size of lists/dicts can be specified as well. The weight of different branches of bt_variant can be customized as well in theory (however it seems there is a bug or weird behavior in RapidCheck's implementation of weightedOneOf when working together with gen::lazy)

The RapidCheck library is somewhat poorly documented, not actively developing but slowly maintained.

The bitcoin-core project integrated RapidCheck in 2019 but removed it again in 2020, I didn't do research to figure out why.

The current RapidCheck library supports Catch v2 integration by default, Catch v3 is not claimed to be supported but it seems not a serious issue.

I was looking for other property-based testing frameworks but I can't find any good alternative yet. Update: I found google/fuzztest is promising, I found it when trying to answer @majestrate's question below.

bridges the gap between fuzzing and property-based testing

  • a testing framework with a rich API (akin to property-based testing libraries), and
  • a coverage-guided fuzzing engine (akin to AFL or libFuzzer).

https://github.com/google/fuzztest

majestrate commented 1 year ago

maybe i dont understand, but how would this differ from a normal fuzzer based unit test?

ghost commented 1 year ago

Good question, I think fuzzy testing and property-based testing were developed by different people with different approaches and different names but overlapped goals, the kudos and the terminology might matter to the inventors but I wouldn't argue they have big differences nowadays (or in the future).

Fuzzy testing was more for random crashing/security tests originally, it started as pure black box testing with little knowledge of the API behavior, however when testing-coverage-based fuzzer was invented, it became somehow aware of API internals, which technically becomes grey box testing now.

Property-based testing was aimed at functional testing since the day 1 it was invented, usually it has:

Now you mention fuzzer-based unit testing, which is an interesting term that started from fuzzy testing but shift from crashing/security testing to functional testing, so the boundaries between fuzzy and property-based testing are even more blurred now.

One example is google/fuzztest, which claims to be the first tool that

bridges the gap between fuzzing and property-based testing

  • a testing framework with a rich API (akin to property-based testing libraries), and
  • a coverage-guided fuzzing engine (akin to AFL or libFuzzer).

https://github.com/google/fuzztest Update: they claim to be the first, maybe the first in C++, but not the first overall language. See: Coverage Guided, Property Based Testing: https://lemonidas.github.io/pdf/FuzzChick.pdf

IMO the actual naming of the technology is not important, things like google/fuzztest which combines the strength of different tools are worth investigating.