pschanely / CrossHair

An analysis tool for Python that blurs the line between testing and type systems.
Other
1.03k stars 49 forks source link

Discover more branches by inspecting the AST #40

Closed Zac-HD closed 3 years ago

Zac-HD commented 3 years ago

What's the idea?

Let's consider the simple function f, and imagine executing it with a symbolic value x=0:

def f(x: int) -> Optional[bool]:
     if x > 0:
        if x % 2:
           return True
       return False
    return None

Because we execute if x > 0:, we discover both the False (taken) branch and it's negation but don't discover the un-executed inner branch if x % 2:. Execution isn't the only way to discover what code does though - we could also read it, or write a program to analyse the abstract syntax tree of f:

body=[
  If(
    test=Compare(left=Name(id='x', ctx=Load()), ops=[Gt()], comparators=[Num(n=0)]),  # We already found this one
    body=[
      If(
        test=BinOp(left=Name(id='x', ctx=Load()), op=Mod(), right=Num(n=2)),  # but *didn't* see this branch
        body=[Return(value=NameConstant(value=True))],
        orelse=[]),
      Return(value=NameConstant(value=False))
    ],
    orelse=[]),
  Return(value=NameConstant(value=None))
]

The Fuzzing Book has a chapter on symbolic fuzzing for Python, which is both useful background reading and a good source of starter code!

Why bother?

A trivial answer is that finding more branches per executed example is faster, and who doesn't like improved performance? More seriously, I think this would allow CrossHair to support more programs as well as improving how it tests those which are already supported.

The current "concolic" analysis is fantastic, and handles cases such as dict-based control flow which cannot be inferred from a simple syntactic analysis (but doesn't see inner branches). On the other hand, this AST-based approach can handle cases where it's very difficult to patch in symbolic objects at runtime (but doesn't handle complicated scoping or control-flow issues).

I think it's possible, and feasible, to combine these approaches at runtime using e.g. ast.parse(inspect.getsource(f)) and therefore to keep the strengths and work around the weaknesses of each 🙂

pschanely commented 3 years ago

Thanks for writing this up, Zac! I like this idea a lot. It will be a while before I would be able to personally prioritize an effort like this, but for all you internet folks, I am very willing to consult with anyone who wants to give it a try!

Zac-HD commented 3 years ago

I think this is the approach that https://github.com/life4/deal/pull/72 is taking - maybe a chance to share some code?

pschanely commented 3 years ago

I'm hoping so; I've started a conversation with @orsinium :)

The general case is so challenging: operator overloads, the descriptor protocol, variable scoping rules, aliasing in the heap, evaluation order issues. By leaning on the real interpreter, CrossHair gets a lot of stuff for free.

Of course, our solution is: don't handle the general case! What remains to be understood, in my mind, is how far we can get.

Notably, I chatted quickly with @jpolitz after @jeanqasaur's PLTalk recently who worked on this formalization of Python semantics. Joe described it as "how much of Python can we implement with other parts of Python," which feels like it might be helpful in an endeavor like this.

orsinium commented 3 years ago

Last weekend, I bought a binary wristwatch. And while doing a conversion from 2-base to 10-base I noticed that sum of 2^n + 2^(n+2) ends on 0 in 10-base for all integer n > 0. In other words:

(2^n + 2^(n+2)) % 10 == 0

It's easy to prove:

(2^n + 2^(n+2)) % 10 == 0
(2^n * 1 + 2^n * 2^2) % 10 == 0
(2^n * 5) % 10 == 0
2^n % 2 == 0  # <- true for any n > 0, let's just stop here

Can we prove it using z3? Let's try:

import z3
s = z3.Solver()
n = z3.Int('n')
s.add(n > 0)
s.add(z3.Not((2**n + 2**(n+2)) % 10 == 0))
s.check()
# unknown

The result is unknown and s.model() raises "model is not available". Why so? IDK. Something is up with powers, I suppose:

z3.solve(2 ** n == 2)
# failed to solve

Another issue is that SeqSort doesn't have an implementation for str.count, so I implemented it as a recursive function. It works for simple examples but freezes dead trying to prove when the given sequence is a z3.Const.

Another topic is how many things I patch to make arithmetic work as in Python (special thanks to Hypothesis for catching it all, BTW). For example:

# "rounds" to 0
-5 // 2
# -3

# rounds to the closest even
z3.solve(z3.Int('n') == z3.IntVal(-5) / z3.IntVal(3))
# [n = -2]

Why am I telling it all? Well, because I doubt my decision to 100% rely on z3 solving everything I feed in it. So, probably, the best approach is somewhere between CrossHair and deal, closer to CrossHair: try to be strict and convert as much as possible but at the same time not so heavily rely on z3.

pschanely commented 3 years ago

@orsinium everything you are saying here is VERY familiar to me!

Nonlinear arithmetic is not decidable, and indeed, you can pretty much forget about raising to a symbolic power. For this kind of problem (and for others that get too complex for Z3), we gradually fall back to supplying a mixture of concrete & symbolic values. (because taking a symbolic to a constant power sometimes does work!) (example CrossHair test case)

CrossHair does this for floordiv. Another tactic you'll see in there is the smt_fork() call - this specializes execution to one branch or the other. Essentially, adding assumptions along the way can help put less load on Z3 (at the cost of having to do more executions to cover all the possible cases).

Probably this should move to another discussion, but I am very curious about the kinds of use cases you're targeting, and whether CrossHair could be extended for those. I'd be more than willing to implement DEAL support for CrossHair if that meant that we could combine forces, for instance. There are also lighter ways that we could help each other, for instance, just by chatting more, sharing experiences, etc.

orsinium commented 3 years ago

When I started it, I had a dream to make a support for holes like in idris. For example:

@deal.ensure(lambda _: _.result > _.x)
def f(x):
  return ...

Here, ... is a hole. It is a place for code that we haven't written yet. And the smart machine can fill the hole, at least good enough for TDD. How to represent a hole in terms of z3? Well, it's just a function that accepts as arguments all variables in the current scope. How to solve it? Quantifiers! We know that the post-condition must be true for all input values. And so we get:

import z3
x = z3.Int('x')
f = z3.Function('f', z3.IntSort(), z3.IntSort())
z3.solve(z3.ForAll([x], f(x) > x))

And it just freezes. Why? Because z3 is bad at quantifiers. Really bad. Even worse than at powers. That's really sad. Now, I have a choice either to give up this idea for a while or implement it in hope that someday z3 will get much better.

That's interesting how it can't even do some simple optimizations for this structure:

# this one is solved because you just say the exact implementation for f
z3.solve(z3.ForAll([x], f(x) == x * 2))

# but this one is freezed to death ¯\_(ツ)_/¯
z3.solve(z3.ForAll([x], f(x)/2 == x)) 

UPD: the answer to the first hole is x+1, for example. I always thought that machines are better at math than me. Turned out, it's not always the case...

pschanely commented 3 years ago

Ah, I think this is called "program synthesis." I am not an expert, but I think the solutions here usually boil down to some kind of search, hopefully guided by the types. It's a bit of a holy grail.

I think dependent type systems generally restrict the language of the dependent types in ways that ensure verification of a potential solution is at least decidable. So they've got that leg up on us. :)

orsinium commented 3 years ago

That's the point, actually. I consider contracts as another way to "refine" types. Refined types, liquid types, dependant types, contracts, it's all the same: a way to narrow down the set of values the type includes. The only difference in syntax. And DbC beat everything else here, having the same syntax as the language itself. The only difference is that it is harder to statically analyze because of the bigger syntax. So, yeah, our way is harder but possibilities are the same.

I think the solutions here usually boil down to some kind of search, hopefully guided by the types.

I'm more not an expert 😉 and this description is how I understand the work of SAT solvers, so I see no reason why z3 shouldn't be able to handle it.

I have this video in my TODO list: A peek inside SAT solvers. The title sounds promising.

I dropped an issue in z3, BTW: https://github.com/Z3Prover/z3/issues/5019

pschanely commented 3 years ago

That's the point, actually

Yes. We want the same future. :)

I see no reason why z3 shouldn't be able to handle it.

I like your optimism! From Nikolaj's response, it sounds like Z3 could play a part in doing synthesis, but a lot of the search would have to happen outside.

One interesting avenue (for synthesis and potentially verification) that I haven't tried yet is an opcode-level model. In this case, you'd be asking Z3 to generate the python stack machine instructions to accomplish a task. (turning the "code" into regular data that Z3 can solve for) But I suspect that this ultimately just turns into a tree of ite()s that explodes after a few instructions, and only solves trivial problems. But it would be an interesting experiment regardless.

orsinium commented 3 years ago

What I understood from the answer on my issue above, is that I must provide a formula instead of a function. However, it still works really bad:

import z3
a,b,c,d,x,y = z3.Ints('a b c d x y')

z3.solve(z3.ForAll([x], x*a/b-c < x/2))
# failed to solve

z3.solve(z3.ForAll([x], x/b-c < x/2))
# [c = 2, b = 2]

Looks like quantifiers is a dead end. That's discouraging.

pschanely commented 3 years ago

Yeah, things go south quickly when you start using quantifiers. One minor point is that integer division is quite a bit harder to reason about than normal division. Your example over the reals, for instance works out ok:

>>> a,b,c,d,x,y = z3.Reals('a b c d x y')
>>> z3.solve(z3.ForAll([x], x*a/b-c < x/2))
[b = -1,
 a = -1/2,
 c = 1,
 /0 = [(1, -1) -> -1, (2, -1) -> -2, else -> 0]]

Anyway. I just saw this talk, and I think it's highly relevant to the sort of thing you want to do (I think): https://www.youtube.com/watch?v=uBthag5kOLw

I'd add a note that I think the fact that rosette wants an interpreter is sort of critical to how synthesis works - because the code supplied to the interpreter becomes data, which can then be solved for. Well, potentially: I suspect there's a lot of custom heuristics/search going on inside rosette.

Zac-HD commented 3 years ago

One minor point is that integer division is quite a bit harder to reason about than normal division [over the reals]

Ignoring, of course, that Python floats are discrete like integers rather than continuous like reals 😜

(we all do this, but "float != real" is sometimes important... for numerical stability, for example)

pschanely commented 3 years ago

Oh golly, don't get me started on this.

For most pythons, the correct sort to use for floats is z3.Float64(). Good luck trying to convert those to and from z3.IntSort() though!

I still haven't implemented the feature where CrossHair will attempt both approaches for floats, but it's high on the list.