DRMacIver opened 2 years ago
there's a lot going on here so I'll just add a few observations, and we can worry more about this later since it's not a concern that'll bite us in the very near future, probably
David, please read section 5.1 here and see what you think: https://users.cs.northwestern.edu/~robby/pubs/papers/scheme2009-kf.pdf
Hmm. Right. The fuel approach is viable here, particularly if we make it part of the generator API. In Hypothesis "fuel" is basically the length of the choice sequence, and we make the design decision that generators always fail when they run out of fuel rather than trying to terminate safely, but it's reasonable to try to finish up too.
One thing to consider if we do build fuel into the generator is that we can treat running out of fuel like a call to reject (#2) - let generation continue in this fallback mode and then try to avoid that in future. This avoids the skewed leaf distribution problem that they talk about in section 5.1 there (though I think this is a depth bounding approach rather than a fuel approach, right? i.e. if I generate (a(), b()) in the Redex approach, the size of a has no bearing on the size of b?)
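Something like this, say (a minimal sketch in Python; the Source/flip API here is made up for illustration and isn't Hypothesis's internals or anything that exists):

    import random

    class Source:
        def __init__(self, fuel):
            self.fuel = fuel
            self.exhausted = False  # did this run "reject" by running out of fuel?

        def flip(self):
            if self.fuel <= 0:
                # Out of fuel: fall back to always taking the terminating branch,
                # and remember that we did, so the search can steer away from
                # this region of the tree in future runs.
                self.exhausted = True
                return False
            self.fuel -= 1
            return random.random() < 0.5

    def gen_tree(src):
        # S -> (S, S) | leaf
        if src.flip():
            return (gen_tree(src), gen_tree(src))
        return "leaf"

    src = Source(fuel=100)
    tree = gen_tree(src)
    # src.exhausted now plays the role of a call to reject: record it and
    # downweight the choices that led here.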
In retrospect, part of where I'm coming from here is Hypothesis, where generators implementing fuel approaches themselves is The Worst, because it completely breaks internal test-case reduction. But that's not so much something we have to worry about here, so we've got more flexibility.
ok so earlier I was responding to the "unbounded depth" aspect of your issue here, but there's another thing that I think you were getting at which is that we might end up with a nasty positive feedback loop where we deep-dive into some subtree, causing it to get a massive estimated cardinality, and then we never explore any other subtree again after that. this problem doesn't require unbounded depth, I suspect it could happen even at very shallow decision depths.
I think we'll end up needing a principled way to avoid this and I don't yet know what that might look like. an unprincipled thing to do would be to put some of the search budget into just round-robining around all unexplored decision edges. or something, I don't know...
ok here's a thing that might help: if our estimate of the cardinality hiding behind an unexplored edge decreases as we go deeper into the tree, this might discourage pathological DFS-like behaviors. I think it's not hard to make a principled argument that that's what we should do, but I suspect that picking a broadly good decay function will be hard.
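just to sketch the shape of it (the exponential form and the numbers are placeholders, not a worked-out proposal):

    def unexplored_estimate(depth, base=1000.0, decay=0.5):
        # assume less and less cardinality hiding behind an unexplored edge the
        # deeper it sits, which pushes the search away from DFS-style deep dives
        return base * decay ** depth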
Oh I wasn't actually thinking about the distribution skew when I mentioned this, just the fact that a given generation can go wrong, but you're right that it's a big problem. It's a particularly big problem because it basically means that the desired goal (uniform generation) is "wrong" in some sense.
Here's a principled approach to the problem that might help (I think I mentioned the core idea before in passing): Rather than aiming to approximate a uniform sample, we try to approximate a Boltzmann sampler of parameter x for some value in [0.5, 1]. x = 0.5 is the standard naive coin flip model, x = 1 is the uniform model, and as x increases the average size of the generator goes up.

This is pretty easy to do, because it's just the same thing as the existing idea, except rather than having the weight of a node be the sum of the weight of its children (handling unvisited nodes however we like), it's that times x.

We can then gradually "warm" the generation - start x at or near 0.5, and increase it gradually, stopping when we see a sudden size blow up (details TBD).
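Roughly (a sketch only; Node and the unvisited-node estimate here are stand-ins, not an existing API):

    import random

    UNVISITED_ESTIMATE = 1.0  # however we decide to weight an unexplored child

    class Node:
        def __init__(self, children=None):
            # children: None = unexplored, [] = leaf, otherwise a list of child Nodes
            self.children = children

    def weight(node, x):
        if node.children is None:
            return UNVISITED_ESTIMATE
        if not node.children:
            return 1.0
        # The only change from the uniform-sampling version: multiply the sum of
        # the child weights by x. x = 1 recovers the uniform weighting; smaller x
        # damps the contribution of deep subtrees.
        return x * sum(weight(c, x) for c in node.children)

    def choose_child(node, x):
        weights = [weight(c, x) for c in node.children]
        return random.choices(node.children, weights=weights)[0]

Warming is then just re-running generation over a schedule of increasing x values and watching the sizes that come out.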
I've been working on similar problems, and this particular issue is one I'm actively working on at the moment.
One context is unit test generation (EvoSuite), where we want to generate diverse inputs that maximize test coverage but minimize test suite size (or execution time, if you want). We use evolutionary algorithms to evolve populations of tests/test suites.
Another context is the generation of random programs in (Grammar-Guided) Genetic Programming, which can be applied to any problem; test generation is one example. Here, we support dependent/refined types (in Python) to describe the grammar, whereas PonyGE2 uses explicit grammars.
Because our grammars can be much more complex than S -> S + S | 1, we do some pre-processing to detect the distance to a terminal node. Then, we expand root nodes in different ways depending on that distance.
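A rough sketch of that pre-processing step (the grammar encoding here is just an assumption for illustration, not our actual code):

    def min_depth_to_terminal(grammar):
        # grammar maps each nonterminal to a list of productions; a production
        # is a list of symbols, each either a nonterminal or a terminal string
        depth = {nt: float("inf") for nt in grammar}
        changed = True
        while changed:  # fixed-point iteration
            changed = False
            for nt, productions in grammar.items():
                for prod in productions:
                    # depth of a production = 1 + deepest nonterminal it uses
                    d = 1 + max((depth[s] for s in prod if s in grammar), default=0)
                    if d < depth[nt]:
                        depth[nt] = d
                        changed = True
        return depth

    # Example: S -> S "+" S | "1"
    print(min_depth_to_terminal({"S": [["S", "+", "S"], ["1"]]}))  # {'S': 1}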
We also have an upcoming chapter in GPTP on the issue of rejection in tree generation. We find that rejection sampling gets stuck a lot when generating a population of X diverse individuals. Throughout this generation, we keep weights per production (and, in another approach, per production per level) to reduce generation time and increase diversity.
We're planning on open-sourcing the Python framework we're using to prototype this. The API is a bit different from Hypothesis's, aiming to be more implicit (using types) while allowing the same level of customization.
@DRMacIver I agree we should try that
but also hear out a different idea for defeating DFS-like behaviors (that I think is somewhat similar to your initial pseudocode), which is to keep track of every unexplored branch that we've seen so far in the search tree and visit them all, starting with the shallowest. so we end up with a priority queue of unvisited paths through the tree. this is a BFS of the top part of the tree, but then as soon as we're off the beaten path we do regular random generation. what I like about this is it's free of parameters to mess with and should be free of several of the pathologies we've talked about here. also, I've had good luck with BFS-like approaches in the past. seems simple enough to be worth trying out.
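roughly, something like this (a sketch with binary choices only; the run_generator hook is made up for illustration):

    import heapq
    import random

    def bfs_generate(run_generator, budget):
        # run_generator(flip) runs one generation, calling flip() for each
        # binary choice, and returns the generated value
        seen = set()
        frontier = [(0, ())]  # (depth, forced prefix of choices), shallowest first
        results = []
        for _ in range(budget):
            if not frontier:
                break
            _, prefix = heapq.heappop(frontier)
            choices = []

            def flip():
                # replay the forced prefix, then fall back to fair random flips
                if len(choices) < len(prefix):
                    bit = prefix[len(choices)]
                else:
                    bit = random.randint(0, 1)
                choices.append(bit)
                return bit

            results.append(run_generator(flip))
            # every choice we made exposes a sibling branch we haven't tried yet
            for i, bit in enumerate(choices):
                sibling = tuple(choices[:i]) + (1 - bit,)
                if sibling not in seen:
                    seen.add(sibling)
                    heapq.heappush(frontier, (len(sibling), sibling))
        return results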
@alcides thanks!
> hear out a different idea for defeating DFS-like behaviors (that I think is somewhat similar to your initial pseudocode), which is to keep track of every unexplored branch that we've seen so far in the search tree and visit them all, starting with the shallowest. so we end up with a priority queue of unvisited paths through the tree. this is a BFS of the top part of the tree, but then as soon as we're off the beaten path we do regular random generation.
Yeah I think something like this is a very good idea to try. We probably want to switch strategies once a level gets too large and start sampling from that level rather than doing DFS? But let's try the simple thing and see how well it works.
Here's an example of a generator I had in mind as a good test case that I think shows limitations of the DFS approach, although it also doesn't do great with the Boltzmann sampler approach:
for i in 0...50 {
    if flip() {
        // details not important here, point is just we generate some stuff.
        for j in 0...8 {
            flip();
        }
        return i;
    }
}
return 50;
Ideally what we'd like to see is a uniform spread of values returned from this. The DFS approach will get entirely bogged down by the inner generation, and the Boltzmann sampler approach will skew pretty heavily towards the earlier values but will, I think, make slightly more progress (it will be uniform once the parameter reaches 1, though).
> Yeah I think something like this is a very good idea to try. We probably want to switch strategies once a level gets too large and start sampling from that level rather than doing DFS? But let's try the simple thing and see how well it works.
so perhaps there's some sort of schedule where 50% of the time we pick an untried path from the highest available level, 25% of the time from the next-highest level, etc. I mean that's probably a bad specific decay function but we can play with it.
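something like (the halving schedule here is just a placeholder to play with):

    import random

    def pick_level(levels):
        # levels = levels that still have untried paths, shallowest first
        # (assumed non-empty); level k is picked with probability ~1/2^(k+1)
        for level in levels[:-1]:
            if random.random() < 0.5:
                return level
        return levels[-1]  # whatever probability is left goes to the deepest level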
ok I'm a bit short on free time but I will work on these two ideas, they both seem cool and promising, then we'll do some evaluation and see what we have. we'll write a library of pathological little generators like yours here and see if what comes out seems plausible, and we can also connect this library to some real generators.
I have a slightly depressing idea for hooking this library up to a generator such as Csmith, which is not going to want to generate multiple programs in a single run; it just doesn't support that. the idea is to put this library into a separate little server process and then Csmith or whatever talks to it using a pipe or socket. ideally, this lets lots of generators run in parallel too.
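roughly, the server side might look like this (a sketch only; the one-byte request/response protocol and the port are made up):

    import socket

    def serve_choices(next_byte, host="127.0.0.1", port=9999):
        # next_byte() is whatever the search library uses to produce the next
        # random byte; Csmith (or any generator) connects and asks for bytes
        srv = socket.create_server((host, port))
        while True:
            conn, _ = srv.accept()
            with conn:
                while conn.recv(1):  # any byte from the client = "give me a choice"
                    conn.sendall(bytes([next_byte()]))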
> so perhaps there's some sort of schedule where 50% of the time we pick an untried path from the highest available level, 25% of the time from the next-highest level, etc. I mean that's probably a bad specific decay function but we can play with it.
Yeah, that sounds reasonable.
> ok I'm a bit short on free time but I will work on these two ideas, they both seem cool and promising, then we'll do some evaluation and see what we have. we'll write a library of pathological little generators like yours here and see if what comes out seems plausible, and we can also connect this library to some real generators.
Cool. I'm very happy to start contributing code and such when there's a bit more structure in place (e.g. very happy to write pathological generators and implement some alternate algorithms for evaluation), but I'll leave the initial groundwork to you.
> I have a slightly depressing idea for hooking this library up to a generator such as Csmith, which is not going to want to generate multiple programs in a single run; it just doesn't support that. the idea is to put this library into a separate little server process and then Csmith or whatever talks to it using a pipe or socket. ideally, this lets lots of generators run in parallel too.
This is more or less what we did to hook Csmith up to Hypothesis - we ran Hypothesis as the controlling process with Csmith launched as a subprocess, but same basic idea.
Recursive generators often have to be finely tuned to stay finite. e.g. consider something like the following:
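As an illustrative stand-in (the exact snippet isn't reproduced here), assume flip(p) returns true with probability p and consider a generator whose branching probability is kept below the critical value:

    import random

    def flip(p):
        return random.random() < p

    def tree():
        # branching probability 0.4 < 0.5, so the expected size is finite:
        # E = 1 + 2 * 0.4 * E  =>  E = 5
        if flip(0.4):
            return (tree(), tree())
        return "leaf"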
This is carefully tuned so that it doesn't produce unboundedly large trees (its expected size is finite). In contrast, the following has infinite expected size:
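Again as a stand-in, the same generator with the branching probability pushed up to the critical value:

    import random

    def critical_tree():
        # branching probability exactly 0.5: E = 1 + 2 * 0.5 * E has no finite
        # solution, so generation terminates with probability 1 but the expected
        # size is infinite
        if random.random() < 0.5:
            return (critical_tree(), critical_tree())
        return "leaf"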
The problem is that if we are successful in our goal of uniform sampling, we turn the former into something closer to the latter, and the generator blows up.
In this case we can solve this by always reverting to uniform generation past some certain prefix, but it gets complicated with something like the following:
This has two problems:
In Hypothesis we solve this problem badly by just bounding the maximum size of the choice sequence we're allowed to use. This requires an answer to #1 (we bound it by terminating generation if it gets too large), and our default size is also much too low for a lot of real generators (we mostly get away with this because of the general pattern of generators used with Hypothesis).