secure-software-engineering / FlowDroid

FlowDroid Static Data Flow Tracker
GNU Lesser General Public License v2.1
1.06k stars 298 forks source link

Imprecise Jimple IR for android example #280

Open rareham opened 4 years ago

rareham commented 4 years ago

Ex: MainActivity { Runnable a = new Runnable() { public void run() {

} }

Runnable b = new Runnable() { public void run() {

} } onCreate() {

Handler handler = new Handler();

handler.post(a); handler.post(b); } }

//corresponding jimple representation for onCreate()

onCreate() {

Runnable $u1; Handler handler; ... $u1 = this.a; handler.post($u1); //1

$u1 = this.b; handler.post($u1); //2

}

Notes : at tags 1 and 2 the same local variable of the type is used. Why are no separate local variables generated.

If i mimic the control flow in a normal java program i see that two local variables of the types runnable are created.

Some analysis result in false positives due to this representation.

I wish to know if there is any specific option to be set or is this a bug in the dexpler transformation of the android byte code to jimple.

MarcMil commented 4 years ago

First of all, this is done in Soot, not FlowDroid. This is a perfectly valid Jimple representation. If your analysis is not capable of distinguishing these values, you might want to use Single Static Assignment form (Shimple), where every variable is only written once.

Note that your analysis would fail otherwise on

Object x;

Handler handler = new Handler();

x = a;
handler.post(x);
x = b;
handler.post(x);

Even if it were more ""precise"".

rareham commented 4 years ago

My question was regarding the IR ensuring flow sensitivity in the analysis to some extent. For the same program i get two different IR representations for android and java byte codes. Furthermore, the example snippet you suggested still creates an IR in simple that ensure flow sensitivity. thank you.

MarcMil commented 4 years ago

Dalvik bytecode looks different from JVM bytecode and there are different transformers involved involved in lifting the code to proper Jimple code. And what you got there is proper Jimple code, this is not a bug with Soot, you're just expecting things Soot isn't intended to do using Jimple.

If I understand you correctly, you want this instead of your original Jimple code:

Runnable $u1;
Runnable $u2;
Handler handler;
...
$u1 = this.a;
handler.post($u1); //1

$u2 = this.b;
handler.post($u2); //2

While this is proper Jimple bytecode, there is no reason not to reuse the existing variable. if you look at soot's history, it is meant to be a compiler optimization framework. As such, it would not make sense to introduce a new local variable ($u2) if it is not needed. The local variables of Jimple and the local variables of your source code do not need to have an exact correspondence. There is a Jimple option to achieve that, but you probably do not want that. Here's why: My example posted earlier shows that I can manually reuse one variable in the original source code. In this case, even if you want the Jimple local variables to correspond with the source code variables, you might be getting something which you don't want:

Runnable x;
Handler handler;
...
x = this.a;
handler.post(x); 

x = this.b;
handler.post(x);

Look at how this resembles your Jimple code in the original thread, just that $u1 is x in my case. If soot produced this output on my sample, would you still consider it imprecise? You would encounter the same issue as you currently are. You just cannot rely on the Jimple bytecode to have only one assignment per variable. Your analysis seems to have as a problem with multiple write statements to the same variable. If you want that property you'll need to use some sort of SSA form (Soot includes Shimple for that matter). Look into Shimple. If your analysis cannot handle two different write statements to the same variable, SSA solves your problem. Otherwise you might want to improve your analysis.

MarcMil commented 4 years ago

Addendum: The sole requirement of Jimple code is that it's semantics match the original program. And they do for the program in your original post. Soot may try to optimize the code while preserving these semantics, which may include aggregating local variables if possible. In some cases the optimization may "look worse", but nevertheless the semantics when running the Jimple program should be the same. If that were not the case, that would be a bug.

rareham commented 4 years ago

I was under the notion that Android and Java byte code fragments to Jimple IR will lead to exactly same representations. I agree that the program semantics are still sound and there is nothing wrong with the representation. Are there any specific program points where the IRs for android and java byte code diverge always. Thanks for connecting the bugs.

rareham commented 4 years ago

Furthermore, while going through the Flowdroid thesis the author mentions that the even when the pointer analysis was flow insensitive, it does not create a lot of imprecise points to sets at the program points due to the IR representation. But the IR for the snippet did not fall into the same line. So i was not sure if the IR shown was exactly as expected or i have been missing some soot or flowdroid analysis options.

MarcMil commented 4 years ago

Are there any specific program points where the IRs for android and java byte code diverge always.

I didn't have a look into this and as such I am not aware. I just know that there are different kind of transformers involved in Dex code since this code seems to be harder to lift to Jimple code. Maybe @StevenArzt has some insights in this. However, you should not rely on such differences as in the next version we might have different transformers, since having a sable conversion to Jimple statements is not the goal of soot, rather the semantics should stay the same.

Regarding the FlowDroid specifics, Steven can give you a good answer. In general, a flow-sensitive pointer analysis could still distinguish these two locations since you can ask for Points-To-Sets at specific locations in the code (i.e. the statement you are interested in). There might also be a transformer in soot which may try to separate these usages in different local variables, but I am not sure. Steven or Eric Bodden may have an idea on whether this is the case. The SSA form would ensure this separation, but I think this does not work for FlowDroid because of the phi functions.

rareham commented 4 years ago

@StevenArzt , please comment on the issue.