DRMacIver opened 2 years ago
there's a lot going on here so I'll just add a few observations, and we can worry more about this later since it's not a concern that'll bite us in the very near future, probably
David, please read section 5.1 here and see what you think: https://users.cs.northwestern.edu/~robby/pubs/papers/scheme2009-kf.pdf
Hmm. Right. The fuel approach is viable here, particularly if we make it part of the generator API. In Hypothesis "fuel" is basically the length of the choice sequence, and we make the design decision that generators always fail when they run out of fuel rather than trying to terminate safely, but it's reasonable to try to finish up too.
One thing to consider if we do build fuel into the generator is that we can treat running out of fuel like a call to reject (#2) - let generation continue in this fallback mode and then try to avoid that in future. This avoids the skewed leaf distribution problem that they talk about in section 5.1 there (though I think this is a depth bounding approach rather than a fuel approach, right? i.e. if I generate (a(), b()) in the Redex approach, the size of a has no bearing on the size of b?)
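Something like this, say (a minimal sketch in Python; the Source/flip API here is made up for illustration and isn't Hypothesis's internals or anything that exists):

    import random

    class Source:
        def __init__(self, fuel):
            self.fuel = fuel
            self.exhausted = False  # did this run "reject" by running out of fuel?

        def flip(self):
            if self.fuel <= 0:
                # Out of fuel: fall back to always taking the terminating branch,
                # and remember that we did, so the search can steer away from
                # this region of the tree in future runs.
                self.exhausted = True
                return False
            self.fuel -= 1
            return random.random() < 0.5

    def gen_tree(src):
        # S -> (S, S) | leaf
        if src.flip():
            return (gen_tree(src), gen_tree(src))
        return "leaf"

    src = Source(fuel=100)
    tree = gen_tree(src)
    # src.exhausted now plays the role of a call to reject: record it and
    # downweight the choices that led here.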
In retrospect, part of where I'm coming from here is Hypothesis, where generators implementing fuel approaches themselves is The Worst, because it completely breaks internal test-case reduction. But that's not so much something we have to worry about here, so we've got more flexibility.
ok so earlier I was responding to the "unbounded depth" aspect of your issue here, but there's another thing that I think you were getting at which is that we might end up with a nasty positive feedback loop where we deep-dive into some subtree, causing it to get a massive estimated cardinality, and then we never explore any other subtree again after that. this problem doesn't require unbounded depth, I suspect it could happen even at very shallow decision depths.
I think we'll end up needing a principled way to avoid this and I don't yet know what that might look like. an unprincipled thing to do would be to put some of the search budget into just round-robining around all unexplored decision edges. or something, I don't know...
ok here's a thing that might help: if our estimate of the cardinality hiding behind an unexplored edge decreases as we go deeper into the tree, this might discourage pathological DFS-like behaviors. I think it's not hard to make a principled argument that that's what we should do, but I suspect that picking a broadly good decay function will be hard.
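just to sketch the shape of it (the exponential form and the numbers are placeholders, not a worked-out proposal):

    def unexplored_estimate(depth, base=1000.0, decay=0.5):
        # assume less and less cardinality hiding behind an unexplored edge the
        # deeper it sits, which pushes the search away from DFS-style deep dives
        return base * decay ** depth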
Oh I wasn't actually thinking about the distribution skew when I mentioned this, just the fact that a given generation can go wrong, but you're right that it's a big problem. It's a particularly big problem because it basically means that the desired goal (uniform generation) is "wrong" in some sense.
Here's a principled approach to the problem that might help (I think I mentioned the core idea before in passing): Rather than aiming to approximate a uniform sample, we try to approximate a Boltzmann sampler of parameter x for some value in [0.5, 1]. x = 0.5 is the standard naive coin flip model, x = 1 is the uniform model, and as x increases the average size of the generator goes up.

This is pretty easy to do, because it's just the same thing as the existing idea, except rather than having the weight of a node be the sum of the weight of its children (handling unvisited nodes however we like), it's that times x.

We can then gradually "warm" the generation - start x at or near 0.5, and increase it gradually, stopping when we see a sudden size blow up (details TBD).
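Roughly (a sketch only; Node and the unvisited-node estimate here are stand-ins, not an existing API):

    import random

    UNVISITED_ESTIMATE = 1.0  # however we decide to weight an unexplored child

    class Node:
        def __init__(self, children=None):
            # children: None = unexplored, [] = leaf, otherwise a list of child Nodes
            self.children = children

    def weight(node, x):
        if node.children is None:
            return UNVISITED_ESTIMATE
        if not node.children:
            return 1.0
        # The only change from the uniform-sampling version: multiply the sum of
        # the child weights by x. x = 1 recovers the uniform weighting; smaller x
        # damps the contribution of deep subtrees.
        return x * sum(weight(c, x) for c in node.children)

    def choose_child(node, x):
        weights = [weight(c, x) for c in node.children]
        return random.choices(node.children, weights=weights)[0]

Warming is then just re-running generation over a schedule of increasing x values and watching the sizes that come out.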
I've been working on similar problems, and this particular issue is one I'm actively working on at the moment.
One context is unit test generation (EvoSuite), where we want to generate diverse inputs that maximize test coverage but minimize test suite size (or execution time, if you want). We use evolutionary algorithms to evolve populations of tests/test suites.
Another context is the generation of random programs in (Grammar-Guided) Genetic Programming, which can be applied to any problem; test generation is one example. Here, we support dependent/refined types (in Python) to describe the grammar, whereas PonyGE2 uses explicit grammars.
Because our grammars can be much more complex than S -> S + S | 1, we do some pre-processing to detect the distance to a terminal node. Then, we expand root nodes in different ways depending on that distance.
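A rough sketch of that pre-processing step (the grammar encoding here is just an assumption for illustration, not our actual code):

    def min_depth_to_terminal(grammar):
        # grammar maps each nonterminal to a list of productions; a production
        # is a list of symbols, each either a nonterminal or a terminal string
        depth = {nt: float("inf") for nt in grammar}
        changed = True
        while changed:  # fixed-point iteration
            changed = False
            for nt, productions in grammar.items():
                for prod in productions:
                    # depth of a production = 1 + deepest nonterminal it uses
                    d = 1 + max((depth[s] for s in prod if s in grammar), default=0)
                    if d < depth[nt]:
                        depth[nt] = d
                        changed = True
        return depth

    # Example: S -> S "+" S | "1"
    print(min_depth_to_terminal({"S": [["S", "+", "S"], ["1"]]}))  # {'S': 1}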
We also have an upcoming chapter in GPTP on the issue of rejection in tree generation. We find that rejection sampling gets stuck a lot when generating a population of X diverse individuals. Throughout this generation, we keep weights per production (and, in another approach, per production per level) to reduce generation time and increase diversity.
We're planning on open-sourcing the Python framework we're using to prototype this. The API is a bit different from Hypothesis's, aiming to be more implicit (using types) while allowing the same level of customization.
@DRMacIver I agree we should try that
but also hear out a different idea for defeating DFS-like behaviors (that I think is somewhat similar to your initial pseudocode), which is to keep track of every unexplored branch that we've seen so far in the search tree and visit them all, starting with the shallowest. so we end up with a priority queue of unvisited paths through the tree. this is a BFS of the top part of the tree, but then as soon as we're off the beaten path we do regular random generation. what I like about this is it's free of parameters to mess with and should be free of several of the pathologies we've talked about here. also, I've had good luck with BFS-like approaches in the past. seems simple enough to be worth trying out.
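roughly, something like this (a sketch with binary choices only; the run_generator hook is made up for illustration):

    import heapq
    import random

    def bfs_generate(run_generator, budget):
        # run_generator(flip) runs one generation, calling flip() for each
        # binary choice, and returns the generated value
        seen = set()
        frontier = [(0, ())]  # (depth, forced prefix of choices), shallowest first
        results = []
        for _ in range(budget):
            if not frontier:
                break
            _, prefix = heapq.heappop(frontier)
            choices = []

            def flip():
                # replay the forced prefix, then fall back to fair random flips
                if len(choices) < len(prefix):
                    bit = prefix[len(choices)]
                else:
                    bit = random.randint(0, 1)
                choices.append(bit)
                return bit

            results.append(run_generator(flip))
            # every choice we made exposes a sibling branch we haven't tried yet
            for i, bit in enumerate(choices):
                sibling = tuple(choices[:i]) + (1 - bit,)
                if sibling not in seen:
                    seen.add(sibling)
                    heapq.heappush(frontier, (len(sibling), sibling))
        return results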
@alcides thanks!
> hear out a different idea for defeating DFS-like behaviors (that I think is somewhat similar to your initial pseudocode), which is to keep track of every unexplored branch that we've seen so far in the search tree and visit them all, starting with the shallowest. so we end up with a priority queue of unvisited paths through the tree. this is a BFS of the top part of the tree, but then as soon as we're off the beaten path we do regular random generation.
Yeah I think something like this is a very good idea to try. We probably want to switch strategies once a level gets too large and start sampling from that level rather than doing DFS? But let's try the simple thing and see how well it works.
Here's an example of a generator I had in mind as a good test case that I think shows limitations of the DFS approach, although it also doesn't do great with the Boltzmann sampler approach:
for i in 0...50 {
    if flip() {
        // details not important here, point is just we generate some stuff.
        for j in 0...8 {
            flip();
        }
        return i;
    }
}
return 50;
Ideally what we'd like to see is a uniform spread of values returned from this. The DFS approach will get entirely bogged down by the inner generation, and the Boltzmann sampler approach will skew pretty heavily towards the earlier values but will, I think, make slightly more progress (it will be uniform once the parameter reaches 1, though).
> Yeah I think something like this is a very good idea to try. We probably want to switch strategies once a level gets too large and start sampling from that level rather than doing DFS? But let's try the simple thing and see how well it works.
so perhaps there's some sort of schedule where 50% of the time we pick an untried path from the highest available level, 25% of the time from the next-highest level, etc. I mean that's probably a bad specific decay function but we can play with it.
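something like (the halving schedule here is just a placeholder to play with):

    import random

    def pick_level(levels):
        # levels = levels that still have untried paths, shallowest first
        # (assumed non-empty); level k is picked with probability ~1/2^(k+1)
        for level in levels[:-1]:
            if random.random() < 0.5:
                return level
        return levels[-1]  # whatever probability is left goes to the deepest level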
ok I'm a bit short on free time but I will work on these two ideas, they both seem cool and promising, then we'll do some evaluation and see what we have. we'll write a library of pathological little generators like yours here and see if what comes out seems plausible, and we can also connect this library to some real generators.
I have a slightly depressing idea for hooking this library up to a generator such as Csmith, which is not going to want to generate multiple programs in a single run; it just doesn't support that. the idea is to put this library into a separate little server process and then Csmith or whatever talks to it using a pipe or socket. ideally, this lets lots of generators run in parallel too.
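roughly, the server side might look like this (a sketch only; the one-byte request/response protocol and the port are made up):

    import socket

    def serve_choices(next_byte, host="127.0.0.1", port=9999):
        # next_byte() is whatever the search library uses to produce the next
        # random byte; Csmith (or any generator) connects and asks for bytes
        srv = socket.create_server((host, port))
        while True:
            conn, _ = srv.accept()
            with conn:
                while conn.recv(1):  # any byte from the client = "give me a choice"
                    conn.sendall(bytes([next_byte()]))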
> so perhaps there's some sort of schedule where 50% of the time we pick an untried path from the highest available level, 25% of the time from the next-highest level, etc. I mean that's probably a bad specific decay function but we can play with it.
Yeah, that sounds reasonable.
> ok I'm a bit short on free time but I will work on these two ideas, they both seem cool and promising, then we'll do some evaluation and see what we have. we'll write a library of pathological little generators like yours here and see if what comes out seems plausible, and we can also connect this library to some real generators.
Cool. I'm very happy to start contributing code and such when there's a bit more structure in place (e.g. very happy to write pathological generators and implement some alternate algorithms for evaluation), but I'll leave the initial groundwork to you.
> I have a slightly depressing idea for hooking this library up to a generator such as Csmith, which is not going to want to generate multiple programs in a single run; it just doesn't support that. the idea is to put this library into a separate little server process and then Csmith or whatever talks to it using a pipe or socket. ideally, this lets lots of generators run in parallel too.
This is more or less what we did to hook Csmith up to Hypothesis - we ran Hypothesis as the controlling process with Csmith launched as a subprocess, but same basic idea.
Recursive generators often have to be finely tuned to stay finite. e.g. consider something like the following:
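As an illustrative stand-in (the exact snippet isn't reproduced here), assume flip(p) returns true with probability p and consider a generator whose branching probability is kept below the critical value:

    import random

    def flip(p):
        return random.random() < p

    def tree():
        # branching probability 0.4 < 0.5, so the expected size is finite:
        # E = 1 + 2 * 0.4 * E  =>  E = 5
        if flip(0.4):
            return (tree(), tree())
        return "leaf"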
This is carefully tuned so that it doesn't produce unboundedly large trees (its expected size is finite). In contrast, the following has infinite expected size:
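Again as a stand-in, the same generator with the branching probability pushed up to the critical value:

    import random

    def critical_tree():
        # branching probability exactly 0.5: E = 1 + 2 * 0.5 * E has no finite
        # solution, so generation terminates with probability 1 but the expected
        # size is infinite
        if random.random() < 0.5:
            return (critical_tree(), critical_tree())
        return "leaf"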
The problem is that if we are successful in our goal of uniform sampling, we turn the former into something closer to the latter, and the generator blows up.
In this case we can solve this by always reverting to uniform generation past some certain prefix, but it gets complicated with something like the following:
This has two problems:
In Hypothesis we solve this problem badly by just bounding the maximum size of the choice sequence we're allowed to use. This requires an answer to #1 (we bound it by terminating generation if it gets too large), and our default size is also much too low for a lot of real generators (we mostly get away with this because of the general pattern of generators used with Hypothesis).