secure-software-engineering / FlowDroid

FlowDroid Static Data Flow Tracker
GNU Lesser General Public License v2.1

(Question) Getting precisely source to sink path (methods) with FlowDroid #692

Closed Alireza-Ardalani closed 4 months ago

Alireza-Ardalani commented 5 months ago

@StevenArzt I am trying to find leaks (source-to-sink flows) in an Android application with the FlowDroid library. I tried the following two code snippets, which I found in the test cases and in other issues:

First code:

    SetupApplication setupApplication = new SetupApplication(jarString, apkfile);
    setupApplication.setTaintWrapper(EasyTaintWrapper.getDefault());
    setupApplication.getConfig().setImplicitFlowMode(enableImplicitFlows
            ? InfoflowConfiguration.ImplicitFlowMode.AllImplicitFlows
            : InfoflowConfiguration.ImplicitFlowMode.NoImplicitFlows);
    setupApplication.getConfig().setStaticFieldTrackingMode(enableStaticFields
            ? InfoflowConfiguration.StaticFieldTrackingMode.ContextFlowSensitive
            : InfoflowConfiguration.StaticFieldTrackingMode.None);
    setupApplication.getConfig().setFlowSensitiveAliasing(flowSensitiveAliasing);
    InfoflowResults Results = setupApplication.runInfoflow(sinkSourceFilePath);
    System.out.println(Results.size());
    var results = Results.getResults();
    for (var result : results) {
        // Each entry is a (sink, source) pair; getO2() is the source
        System.out.println(result.getO2());
    }

Second code:

    final InfoflowAndroidConfiguration config = new InfoflowAndroidConfiguration();
    config.getAnalysisFileConfig().setTargetAPKFile(apkfile);
    config.getAnalysisFileConfig().setAndroidPlatformDir(jarString);
    config.getAnalysisFileConfig().setSourceSinkFile(sinkSourceFilePath);
    config.setCodeEliminationMode(InfoflowConfiguration.CodeEliminationMode.NoCodeElimination);
    config.setCallgraphAlgorithm(InfoflowConfiguration.CallgraphAlgorithm.CHA);
    SetupApplication app = new SetupApplication(config);
    InfoflowResults results = app.runInfoflow();
    for (var result : results.getResults()) {
        // Each entry is a (sink, source) pair; getO2() is the source
        System.out.println(result.getO2());
    }

Result of First code: $r2 = virtualinvoke $r1.<java.net.HttpURLConnection: java.io.InputStream getInputStream()>()

Result of second code: $r7 = interfaceinvoke $r3.<android.database.Cursor: java.lang.String getString(int)>($i0)

The two results differ, even though both runs analyze the same APK.

FlowDroid has many configuration options, so I think the difference comes from the different configurations in the first and second code. I also asked another question in issue #684 about the methods between source and sink. Thank you for your solution there, but I got "null" with both snippets.

If possible, please guide me on my question: how can I get the precise source-to-sink path (methods) with FlowDroid, for every source-to-sink flow that FlowDroid detects?

I appreciate your time and consideration.

StevenArzt commented 5 months ago

@timll I'm currently very short on time. Can you please answer this one?

timll commented 5 months ago

> Result of First code: $r2 = virtualinvoke $r1.<java.net.HttpURLConnection: java.io.InputStream getInputStream()>()
>
> Result of second code: $r7 = interfaceinvoke $r3.<android.database.Cursor: java.lang.String getString(int)>($i0)

@Alireza-Ardalani Could you please provide the relevant parts of the code you analyzed? You have used vastly different configurations (CHA as a call graph algorithm is as imprecise as it can get but sound, while the default SPARK is precise but unsound), but without the code it is hard to pin down the issue.
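To illustrate why the call graph choice alone can change the reported flows, here is a minimal toy sketch in plain Java. It is not FlowDroid or Soot code: the class `CallGraphPrecisionDemo`, its methods, and the `Reader`/`NetworkReader`/`ContactsReader` hierarchy are all hypothetical. It contrasts a CHA-style resolution (every implementation below the declared receiver type is a possible target) with a points-to-style resolution (only types that may actually reach the receiver are considered), which is the general idea behind algorithms such as SPARK.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of virtual call resolution; all names are hypothetical.
final class CallGraphPrecisionDemo {
    // class name -> direct superclass (null for a root class)
    private final Map<String, String> superOf = new HashMap<>();
    // classes that declare their own implementation of the called method
    private final Set<String> declaresMethod = new HashSet<>();

    void addClass(String name, String superType, boolean implementsMethod) {
        superOf.put(name, superType);
        if (implementsMethod) {
            declaresMethod.add(name);
        }
    }

    private boolean isSubtypeOf(String type, String ancestor) {
        for (String t = type; t != null; t = superOf.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    // Virtual dispatch: nearest class at or above runtimeType with an implementation.
    private String dispatch(String runtimeType) {
        for (String t = runtimeType; t != null; t = superOf.get(t)) {
            if (declaresMethod.contains(t)) {
                return t;
            }
        }
        return null;
    }

    // CHA style: consider every known subtype of the declared receiver type.
    Set<String> chaTargets(String declaredType) {
        Set<String> targets = new TreeSet<>();
        for (String type : superOf.keySet()) {
            if (isSubtypeOf(type, declaredType) && dispatch(type) != null) {
                targets.add(dispatch(type));
            }
        }
        return targets;
    }

    // Points-to style: consider only the types that may reach the receiver.
    Set<String> pointsToTargets(Set<String> possibleRuntimeTypes) {
        Set<String> targets = new TreeSet<>();
        for (String type : possibleRuntimeTypes) {
            if (dispatch(type) != null) {
                targets.add(dispatch(type));
            }
        }
        return targets;
    }

    // Hypothetical hierarchy: an abstract Reader with two concrete subclasses.
    static CallGraphPrecisionDemo example() {
        CallGraphPrecisionDemo h = new CallGraphPrecisionDemo();
        h.addClass("Reader", null, false);
        h.addClass("NetworkReader", "Reader", true);
        h.addClass("ContactsReader", "Reader", true);
        return h;
    }
}
```

With the hierarchy from `example()`, `chaTargets("Reader")` contains both `ContactsReader` and `NetworkReader`, while `pointsToTargets(Set.of("NetworkReader"))` contains only `NetworkReader`. Every extra call edge is an extra route along which taint can be propagated, so a CHA-based run can report flows that a more precise call graph does not.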

Alireza-Ardalani commented 5 months ago

@timll Thank you for answering.

To clarify my issue, I will first describe what I need from FlowDroid: I want to use the FlowDroid tool (as a library) to find out which sources in an Android application (APK) are leaking through sinks. I also need all participating methods between source and sink (I think this is possible based on the call flow graph).

I'm new to FlowDroid (but its features are amazing to me), and since it is such a well-known tool, I'm sure it can cover my needs.

I found the above code by checking different examples, just to familiarize myself with FlowDroid. If there is any problem with it, please tell me.

If possible, please give me a starting point.

I really appreciate your time and consideration.

timll commented 5 months ago

> I found the above code by checking different examples, just to familiarize myself with FlowDroid. If there is any problem with it, please tell me.

Regardless of the configuration, the results should be consistent for simple flows. I would appreciate it if you could send us the flows of the two above-mentioned results. I'm currently not sure whether the cause is the taint wrapper or some bug.

> What is the best and most accurate tool configuration?

We typically disable static field tracking unless it is necessary. Static fields have a broad scope in your app (accessible from anywhere if public), and as soon as one gets tainted, FlowDroid needs to propagate this static field fact through the whole program, which is quite costly. We sometimes also disable exceptional flow tracking to further increase scalability.

To find flows through methods of the standard library, you need a taint wrapper (shortcut rules for frequently used APIs). We exclusively use StubDroid (SummaryTaintWrapper; the paper is called "StubDroid: Automatic Inference of Precise Data-flow Summaries for the Android Framework"). There is also the EasyTaintWrapper, which uses heuristics instead of summaries but is quite imprecise (both because of the heuristic and because it does not support access paths).

Also, you definitely want to define a timeout for the data flow analysis, because in rare cases FlowDroid might need hours or even days to complete. In our experience, data flows tend to be short rather than long, and at some point FlowDroid wastes time propagating spurious facts. For example, for my latest large-scale evaluation on real-world apps, I set the data-flow timeout to 900s.
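The recommendations above could be expressed as a configuration fragment roughly like the following. This is a sketch, not a verified setup: it needs the FlowDroid libraries on the classpath, the wrapper class `ScalabilityConfigSketch` is hypothetical, and the setter names reflect the `InfoflowConfiguration` API as I understand it, so check them against the FlowDroid version you use. The taint wrapper is set on `SetupApplication` rather than on the configuration and is left out here.

```java
import soot.jimple.infoflow.InfoflowConfiguration.StaticFieldTrackingMode;
import soot.jimple.infoflow.android.InfoflowAndroidConfiguration;

// Hypothetical helper class; only the three setter calls matter.
final class ScalabilityConfigSketch {
    static InfoflowAndroidConfiguration build() {
        InfoflowAndroidConfiguration config = new InfoflowAndroidConfiguration();

        // Static fields are reachable from anywhere, so tracking them is costly.
        config.setStaticFieldTrackingMode(StaticFieldTrackingMode.None);

        // Optionally skip exceptional flows to improve scalability further.
        config.setEnableExceptionTracking(false);

        // Abort the data flow analysis after 900 seconds (value from the comment above).
        config.setDataFlowTimeout(900);

        return config;
    }
}
```

The returned configuration can then be completed with the APK, platform directory, and source/sink file, as in the second code of the original post, before it is passed to `new SetupApplication(config)`.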

Alireza-Ardalani commented 5 months ago

@timll

> I would appreciate it if you could send us the flows of the two above-mentioned results.

Maybe this is a very simple question, but I don't understand how I can get the flow. In fact, getting the source-to-sink flows is one of my requirements. I used these two code snippets and only saw the source/sink output. Could you please help me?

The other configuration options you described are clear to me, and I will try them to get the desired result.

The only idea that came to my mind is to implement BFS or DFS: given that I can get the source and the sink, and that the CFG (call flow graph) is available, I could search for the path between them (the methods between source and sink). Is there a better solution?

One last question: is there any way to get the source code of the APK through FlowDroid? That is, FlowDroid reads the Dex files and would show them as source code. For example, when FlowDroid gives me the source/sink methods, I want to get the body of that method in source code. Is such a thing possible?

Thank you very much!

timll commented 5 months ago

> Maybe this is a very simple question, but I don't understand how I can get the flow. In fact, getting the source-to-sink flows is one of my requirements. I used these two code snippets and only saw the source/sink output. Could you please help me?

I meant just the Java source code (or binary apk) you analyzed using FlowDroid.

> The only idea that came to my mind is to implement BFS or DFS: given that I can get the source and the sink, and that the CFG (call flow graph) is available, I could search for the path between them (the methods between source and sink). Is there a better solution?

You can just use config.getPathConfiguration().setPathReconstructionMode(PathReconstructionMode.Fast); to get all statements that influenced the flow (i.e. identities are not part of the path, and, as discussed in #576, it is likely not possible to include them). Then, call icfg.getMethodOf(stmt) on each statement of the path to find the method.
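The second step, mapping each path statement to its enclosing method, can be sketched independently of FlowDroid. The helper below is hypothetical (`PathMethods` and `methodsAlongPath` are not part of any library) and takes the statement-to-method lookup as a function. In a real run, the path would presumably come from the result's source info (for example `Arrays.asList(sourceInfo.getPath())`) and the lookup would be `icfg::getMethodOf`; both are assumptions about how you wire it up. The sketch also treats a missing path as empty, since no path is available when path reconstruction is not enabled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: turns a statement path into the sequence of methods it visits.
final class PathMethods {
    static <S, M> List<M> methodsAlongPath(List<S> path, Function<S, M> methodOf) {
        List<M> methods = new ArrayList<>();
        if (path == null) {
            return methods; // no path was reconstructed
        }
        for (S stmt : path) {
            M method = methodOf.apply(stmt);
            // Collapse consecutive statements that belong to the same method.
            boolean sameAsLast = !methods.isEmpty()
                    && methods.get(methods.size() - 1).equals(method);
            if (!sameAsLast) {
                methods.add(method);
            }
        }
        return methods;
    }
}
```

Only consecutive repeats are collapsed, so a flow that leaves a method and later returns to it still shows the method twice, which preserves the order in which the data moves from source to sink.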

> One last question: is there any way to get the source code of the APK through FlowDroid? That is, FlowDroid reads the Dex files and would show them as source code. For example, when FlowDroid gives me the source/sink methods, I want to get the body of that method in source code. Is such a thing possible?

You could use a decompiler of your choice and just try to match the code with the Jimple IR. Soot also supports keeping the line numbers associated with Jimple statements. However, if you need a decompiler to view the code, I assume you won't have a debug build with line numbers.

Alireza-Ardalani commented 4 months ago

@timll

Thank you for your assistance and guidance. I understood your points, and they work for my project.

I have attached to this comment the source code for which I got the differing results.

Source.zip

timll commented 4 months ago

Hi,

On my setup, I see strictly more leaks with CHA (config 2) than with SPARK (default), which is the expected result because CHA is quite imprecise. I also ran the app in a loop to check for possible races, but couldn't find one. So I assume this is purely caused by the choice of the call graph algorithm.