Proposal: create a type of generator supporting recursive values

hyperpape commented 5 years ago

Currently, there's no built in support for recursive data types. I'm working on something that would benefit from it. Would such a thing be of interest?

If so, I might take a shot at an implementation. I haven't started thought much about the implementation, but I looked at the implementation in Hypothesis (https://hypothesis.works/articles/recursive-data/, https://github.com/HypothesisWorks/hypothesis/blob/master/hypothesis-python/src/hypothesis/searchstrategy/recursive.py), and I think a similar interface might work.

hyperpape commented 5 years ago

Just following up on this. Does it seem like the sort of thing that would be interesting?

hcoles commented 5 years ago

Can you provide some examples (in Java) of what you have in mind?

hyperpape commented 5 years ago

I'm wanting to generate tree-like data-structures, and thinking there's no easy way to do so (I hope this isn't based on a false assumption). So for example, any of the following structures:

    class Person {
        String name;
        Person child;
    }

    class Node {
        int value;
        Node left;
        Node right;
    }

    class Parent {
        String name;
        List<Parent> children;
    }

The rough idea would be an API something like:

    public static <S, T> Gen<T> recurse(Gen<S> sGen, BiFunction<S, T, T> combine);

    public static Gen<Person> personGen() {
        Gen<String> nameGen = new StringsDSL().basicLatinAlphabet().ofLengthBetween(1, 10);
        return recurse(nameGen, (name, person) -> {
            Person created = new Person();
            created.name = name;
            created.child = person;
            return created;
        });
    }

What this would give you is a tree-like structure, but without having to manually construct the tree from some simpler data, or manually manage how large/small the tree is (though an API should probably give you the option to pass min/max sizes).

The case where a Node can have two children does run into a little difficulty with type inference:

    @FunctionalInterface
    interface TriFunction<R, S, T, U> {
        U apply(R r, S s, T t);
    }

    public static <S, T, U> Gen<U> recurse(Gen<S> sGen, TriFunction<S, T, T, U> func);

    public static Gen<Node> nodeGen() {
        Gen<Integer> intGen = new IntegersDSL().all();
        TriFunction<Integer, Node, Node, Node> f = (i, l, r) -> {
            Node created = new Node();
            created.value = i;
            created.left = l;
            created.right = r;
            return created;
        };
        return recurse(intGen, f);
    }

However, since posting my earlier comments, I've realized that I'm not sure how to handle an object that requires a collection of its own type. What's below typechecks, but doesn't make sense: you really want the implementation of recurse to somehow control the size of the lists returned.

    public static <S, T, U> Gen<U> recurse(Gen<S> sGen, Function<Gen<U>, Gen<T>> transform, BiFunction<S, T, U> func); 

    public static Gen<Parent> parentGen() {
        Gen<String> nameString = new StringsDSL().basicLatinAlphabet().ofLength(5);
        BiFunction<String, List<Parent>, Parent> f = (name, children) -> {
            Parent p = new Parent();
            p.name = name;
            p.children = children;
            return p;
        };
        Function<Gen<Parent>, Gen<List<Parent>> transform = (g) -> new ListsDSL().of(g).ofSize(5);
        return recurse(nameString, transform, f);
    }

jockbert commented 5 years ago

I hope I can shine some light on the subject of recursive data structures.

Usefulness of recursive data structures

To answer the original question - Yes, I think being able to generate recursive data structures can be really useful (or essential), depending on the thing you want to test. Some examples are mathematical expressions, div-elements in HTML, program code AST, arbitrary JSON-data, graphs (recursive trees with some extra edges added to them) and so on.

Size of recursive data structures

Regarding the size of the recursive data structure, how about not specifying the structure size in absolute terms, but just in likely-hood of actually terminating the structure? A key point of Property Based Testing is to let the generated example to surprise you, right :)

Let say you have a type Tree, that can either be a terminating Leaf or a BinaryNode. Further let the BinaryNode contain two sub-trees of type Tree (the recursiveness). If there is equal 50-50 chance of a Tree being either a Leaf or a BinaryNode some generated trees will never terminate, since at least one of the BinaryNode sub-trees is yet another BinaryNode:

(1 + (1 + (1 + (1 + ... ))))

If we instead let there be a slight overweight of e.g. 51% chance of a tree being a Leaf, the Tree will probability-wise eventually terminate.

So, if my math is in order, with the 50% change of each of the sub-trees in a BinaryNode to be yet a BinaryNode, the probability of a BinaryNode to have depth >= 1000 is:

(2 * 0.50) ^ 1000 = 1

With only a 49% chance of a Tree to be a BinaryNode, probability of a BonaryNode of depth >= 1000 is insignificant:

(2 * 0.49) ^ 1000 ~= 1.7*10^-9

Termination problem solved! :-D

So how do you do recursive data structures?

Recursive data structures are already possible to construct with the current versions of QuickTheories, but the QuickTheories API can surely be improved.

The hard part is to be able to reference a generator from itself, without causing a stack overflow. In the following example, some indirection-wizardry is used to break the recursive stack calls:

import java.util.function.BiFunction;
import java.util.function.Supplier;

import org.junit.Ignore;
import org.junit.Test;
import org.quicktheories.WithQuickTheories;
import org.quicktheories.core.Gen;
import org.quicktheories.core.RandomnessSource;

public class RecursiveMathExpressionTest implements WithQuickTheories {

    // --- Classes under test ---

    /** A mathematical expression */
    static interface Expression {
        double calculate();
    }

    static class Litteral implements Expression {
        private long value;

        Litteral(long value) {
            this.value = value;
        }

        @Override
        public double calculate() {
            return value;
        }

        @Override
        public String toString() {
            return "" + value;
        }
    }

    static enum Operator {
        Add("+", (a, b) -> a + b), Sub("-", (a, b) -> a - b), Mult("*", (a, b) -> a * b), Div("/", (a, b) -> a / b);

        public final BiFunction<Double, Double, Double> fn;
        public final String symbol;

        Operator(String symbol, BiFunction<Double, Double, Double> fn) {
            this.symbol = symbol;
            this.fn = fn;
        }
    }

    static class BinaryExpression implements Expression {

        private Operator op;
        private Expression left;
        private Expression right;

        BinaryExpression(Operator op, Expression left, Expression right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }

        @Override
        public double calculate() {
            return op.fn.apply(left.calculate(), right.calculate());
        }

        @Override
        public String toString() {
            return "(" + left + " " + op.symbol + " " + right + ")";
        }
    }

    // --- Generators ---

    public Gen<Expression> litterals() {
        return longs()
                .all()
                .map(Litteral::new);
    }

    public Gen<Expression> operators(Gen<Expression> subExprGen) {
        return arbitrary()
                .enumValues(Operator.class)
                .zip(subExprGen, subExprGen, BinaryExpression::new);
    }

    public Gen<Expression> stackOverflowExpressions() {
        // Will blow the stack when recursively evaluating
        // stackOverflowExpressions().
        return litterals()
                .mix(operators(stackOverflowExpressions()), 49);
    }

    // This class member acts as a place holder for
    // a (soon to be) existing generator.
    // A question: Is it sound practice to reuse the same generator instance in
    // multiple tests, or should a new generator instance be used in each test?
    Gen<Expression> expressionsPlaceHolder = new LazyGen<>(
            () -> expressionsBuilder());

    private Gen<Expression> expressionsBuilder() {
        // No referral to itself. Only delegates to the place holder.
        return litterals()
                .mix(operators(expressionsPlaceHolder), 49);
    }

    /** The class to bake into the QuickTheories API */
    private class LazyGen<T> implements Gen<T> {

        private Supplier<Gen<T>> genereratorSupplier;

        LazyGen(Supplier<Gen<T>> generatorSupplier) {
            this.genereratorSupplier = generatorSupplier;
        }

        @Override
        public T generate(RandomnessSource in) {
            // Get the generator
            Gen<T> generator = genereratorSupplier.get();

            // Re-define the supplier to only return the calculated
            // generator next time called.
            genereratorSupplier = () -> generator;

            // Use the inner generator
            return generator.generate(in);
        }
    }

    // --- The tests ---

    @Test
    /**
     * Will successfully generate recursive values, but for some reason never fail
     * the check. Should fail on e.g. the expression (1 / 0).
     */
    public void allValuesAreFinite_Successfull() {
        qt()
                .forAll(expressionsPlaceHolder)
                .check(expression -> Double.isFinite(expression.calculate()));
    }

    @Test
    @Ignore
    /** Retrieving the generator will cause stack overflow */
    public void allValuesAreFinite_StackOverflow() {
        qt()
                .forAll(stackOverflowExpressions())
                .check(expression -> Double.isFinite(expression.calculate()));
    }
}

jockbert commented 5 years ago

See pull request #65

hcoles commented 5 years ago

@hyperpape @jockbert Thanks for this, sorry I've not been very responsive.

Yes, it would be good to get this into the api. I'm afraid I don't have time to properly look at the PR just now, but from a quick scan the lazy initialization of the supplier is making me slightly nervous. This may (or may not) have implications for the sharing and re-use of generators.

jockbert commented 5 years ago

@hcoles Noted. Would perhaps be good if someone else tried to poke holes in the suggested solution.

@hyperpape You talked about being able to define the size of the generated structure. If it would help the API user, a possible added version can provide an integer (starting at value zero or one) as recursive depth level indicator in the recursive method.

<T> Gen<T> recursive(BiFunction<Gen<T>,Integer,Gen<T>> recursiveGeneratorDefinition)

However, here is an example on how current suggestion can be used for the earlier mentioned Parent example class.

package com.example;

import java.util.Arrays;
import java.util.List;

import org.junit.Test;
import org.quicktheories.WithQuickTheories;
import org.quicktheories.core.Gen;

public class RecursiveParentExample implements WithQuickTheories {

  /** Class under test */
  static class Parent {
    String name;
    List<Parent> children;

    private Parent(List<Parent> children, String name) {
      this.name = name;
      this.children = children;
    }

    @Override
    public String toString() {
      return toStringWithIndent(0);
    }

    /** To string helper method, for nicer print outs */
    private String toStringWithIndent(int level) {
      String result = "";
      while (result.length() < level) {
        result += "-";
      }
      result += " " + name;
      for (Parent child : children) {
        result += "\n" + child.toStringWithIndent(level + 1);
      }
      return result;
    }
  }

  /** Generator of empty lists */
  Gen<List<Parent>> noChildren() {
    return arbitrary().constant(Arrays.<Parent>asList());
  }

  /** Some different names */
  Gen<String> names() {
    return arbitrary().pick("Adam", "Bertha", "Caesar", "Dorothy", "Edgar", "Filippa");
  }

  /** Recursive parent generator. */
  Gen<Parent> parents() {
    return arbitrary().recursive(children -> {

      Gen<List<Parent>> childLists =
          // Average branching factor is 3 for an evenly distributed range [1,5].
          lists().of(children).ofSizeBetween(1, 5)
              // ...therefore we must terminate 68% > 2/3 of the times
              // in order to to have a slight termination overweight.
              // Branch factor 3 multiplied with probability 0.32 equals 0.96 < 1.00
              .mix(noChildren(), 68);

      return childLists.zip(names(), Parent::new);
    });
  }

  @Test
  public void thereAreNoParentsNamedCeasarWithThreeChildren() throws Exception {
    qt()
        .forAll(parents())
        .check(tree -> !hasNameAndChildCount("Caesar", 3, tree));
  }

  boolean hasNameAndChildCount(String name, int childCount, Parent root) {
    boolean isRootAMatch = root.name.equals(name) && root.children.size() == childCount;
    return isRootAMatch || root.children.stream()
        .anyMatch(child -> hasNameAndChildCount(name, childCount, child));
  }
}

hyperpape commented 5 years ago

I'll try to have a look, but it might be a few days.

hyperpape commented 4 years ago

@hcoles I looked at the lazy initialization. If I am reading it correctly, it's not thread-safe (though that's fixable), but otherwise shouldn't have any visible effects on the behavior of the generator.

quicktheories / QuickTheories