soot-oss / soot

Soot - A Java optimization framework
GNU Lesser General Public License v2.1
2.87k stars 706 forks source link

Symbolic execution for Java/Android code #54

Open malaverdiere opened 11 years ago

malaverdiere commented 11 years ago

(copy-pasted from GSOC application) Explanations: Symbolic execution is a program analysis technique that has many applications such as test-input generation, bug finding. The basic idea behind symbolic execution is as follows. The program is executed with symbolic values, instead of concrete values, as program inputs, and the values of program variables are represented as symbolic expressions of those inputs. At any point during symbolic execution, the state of a symbolically executed program includes the symbolic values of program variables at that point, a path constraint on the symbolic values to reach that point, and a program counter. The path constraint (PC) is a boolean formula over the symbolic inputs, which is an accumulation of the constraints that the inputs must satisfy for an execution to follow that path. At each branch point during symbolic execution, the PC is updated with constraints on the inputs such that (1) if the PC becomes unsatisfiable, the corresponding program path is infeasible, and symbolic execution does not continue further along that path and (2) if the PC is satisfiable, any solution of the PC is a program input that executes the corresponding path. The program counter identifies the next statement to be executed. In this project, the student is expected to extend the code base of ACTEve (http://code.google.com/p/acteve), which is a symbolic-execution tool for Android apps (available in Java .class format) and uses Soot. Note: There are many interesting research problems related to symbolic execution. So this project can be equally suitable for students interested in research publication.

Expected Results: A symbolic-execution tool for Java. Specifically, given a program input, the tool will (1) compute PC for the path that the program takes for that input, and (2) generate new inputs that drive the program along paths that are different than the path that the original input takes.

Knowledge prerequisite: Good Java programming skills; knowledge of symbolic execution in particular and program analysis in general.

scg03 commented 10 years ago

any on-going project to implement this enhancement?

ericbodden commented 10 years ago

Hi @scg03 We currently have a student integrating symbolic execution into Soot. It's almost finished. What exactly are you looking to do?

scg03 commented 10 years ago

Hi @ericbodden I'm looking for a symbolic execution framework for Jimple and Google leaded me to this post. Are you using some existing framework? My plan is to modify CPAChecker to support Jimple.

hahnrobert commented 10 years ago

Hi @scg03 I am the guy @ericbodden was talking about. Instead of using a framework, I thought about how to create path constraints and solve them with a SMT solver. Since explaining my approach in detail would probably go beyond the scope of a single comment I try to wrap it up and give you a brief summary and explanation. When you want to reason about a statement whether it can be reached or not, you have to find all possible paths to that statement and build for each path its corresponding constraints and solve them. To reduce the amount of computation time I start at the statement and go backwards in code. Whenever I encounter a new, unknown variable I introduce a new symbol which has no value itself and store the link between the variable and the symbol in a map. When you come across an assignment statement you have to indroduce yet again new symbols for each variable used in the right side of the assignment and update the symbol which belongs to the left side.

Put simply, you always introduce a new symbol when you do not know the value of a variable you stumble across. (note: not the concrete value but the symbolic one)

Since this may sound a little bit confusing and is kind of hard to express in written text I want to give you a short example:

Example
aVar = 0;
bVar = 42;
aVar = aVar + bVar - 1;
if (aVar == 42) anInteresstingStatement();

Assume we want to know whether anInteresstingStatement() can be reached or not, we start right there and go backwards.

statement symbol-variable map path constraints
anInteresstingStatement(); [ ] [ ]
if (aVar == 42) [ (a1) ] [ ( a1 == 42 ) ]
aVar = aVar + bVar - 1; [ (a2), (a1 = a2 + b1 - 1), (b1) ] [ ( a1 == 42 ) ]
bVar = 42; [ (a2), (a1 = a2 + b1 - 1), (b1 = 42) ] [ ( a1 == 42 ) ]
aVar = 0; [ (a2 = 0), (a1 = a2 + b1 - 1), (b1 = 42) ] [ ( a1 == 42 ) ]

Since we will only find a single path constraint [ ( a1 == 42 ) ] a SMT solver will easily solve: ( 0 + 42 - 1 == 42 ) is not satisfiable and thus the statement cannot be reached.

I am converting my path constraints to SMT Lib 2.0 which allows me to use a large amount of SMT Solver. You also have to care about other language constructs such as intra- and inter procedural calls, fields, array, lists, maps, objects, loops and many more.

I hope you get the idea of my approach and can start to work with this.

Best regards, Robert

sbachala2 commented 9 years ago

Hello,

When do you think the source code would be open to public?

Thanks, Shakthi

StevenArzt commented 9 years ago

@sbachala2 We are currently publishing the algorithms in the form of a paper. Afterwards, we can discuss how we can disclose the algorithms and/or the code.

sbachala2 commented 9 years ago

Thank you for the information.

grzesuav commented 9 years ago

Hi, I'm interested how look like time plan for this pr. @StevenArzt could you update status of yours publication and whether code will be disclosed ?

Thanks in advance, Grzesiek

myvyang commented 7 years ago

hi, is there any new message about this topic?