secure-software-engineering / FlowDroid

FlowDroid Static Data Flow Tracker
GNU Lesser General Public License v2.1

Some issues regarding the call graph constructed by flowdroid #748

Open KyleLeith-007 opened 1 week ago

KyleLeith-007 commented 1 week ago

Dear developers,

I hope this message finds you well. Firstly, I would like to express my appreciation for your excellent work on the Soot-FlowDroid module. It has been instrumental in my recent analysis tasks.

I have encountered several challenges related to the call graph constructed by FlowDroid. Specifically, I am having difficulty using dynamic analysis results to enhance the static analysis performed by FlowDroid.

My objective is to capture certain function call relationships through dynamic analysis and then integrate these call edges into the call graph generated by FlowDroid, enriching the information it provides. To this end, I have developed a dynamic analyzer that captures the call stack of sensitive APIs invoked while an APK is running. For instance, for a sensitive function SAPI, I obtained the following call stack: main_func1 -> main_func2 -> tpl-func3 -> SAPI (where 'main' denotes functions within the main program, and 'tpl' indicates functions within a third-party library).
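A minimal sketch of the first step, turning one captured stack into individual call edges. It assumes (hypothetically) that the dynamic analyzer emits frames outermost-first, separated by "->", exactly as in the example above. CallChainParser, toEdges and origin are illustrative names, not FlowDroid or Soot API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of FlowDroid): splits one observed call stack,
// written outermost frame first as "a -> b -> c", into caller -> callee edges.
class CallChainParser {

    // One call observed at runtime: caller invoked callee.
    record Edge(String caller, String callee) {}

    static List<Edge> toEdges(String chain) {
        String[] frames = chain.trim().split("\\s*->\\s*");
        List<Edge> edges = new ArrayList<>();
        // Each adjacent pair of frames is one call edge.
        for (int i = 0; i + 1 < frames.length; i++) {
            edges.add(new Edge(frames[i], frames[i + 1]));
        }
        return edges;
    }

    // Naming convention from the issue: "main" = app code, "tpl" = third-party library.
    static String origin(String frame) {
        if (frame.startsWith("main")) return "app";
        if (frame.startsWith("tpl")) return "library";
        return "other";
    }

    public static void main(String[] args) {
        for (Edge e : toEdges("main_func1 -> main_func2 -> tpl-func3 -> SAPI")) {
            System.out.println(e.caller() + " -> " + e.callee()
                    + "  [" + origin(e.caller()) + " -> " + origin(e.callee()) + "]");
        }
    }
}
```

In a real pipeline the frames would more likely be full method signatures (Soot writes them as <com.example.Foo: void bar(int)>), so that each frame can be matched unambiguously against a method in the analyzed APK.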

My next step is to incorporate this call chain information into the call graph constructed by FlowDroid, i.e., to add every call edge from the chain above to the FlowDroid call graph. However, many dynamically captured call edges do not appear in the statically derived call graph, and numerous function nodes are missing from it altogether. I would particularly like to understand the reasons for this discrepancy, especially for nodes that should be present in the call graph.
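To see exactly which parts of a dynamic chain the static graph lacks, it can help to diff before merging. The sketch below models a call graph as a plain map from caller to callees, an assumption made to keep it self-contained; none of these names come from FlowDroid. In Soot itself, the merge step would roughly correspond to creating soot.jimple.toolkits.callgraph.Edge objects and passing them to CallGraph.addEdge; as far as I know, such an edge also needs the calling statement, and both methods must be resolvable in the Scene, which is what the missing-node report helps to flag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy call graph: caller -> set of callees, keyed by (hypothetical) method names.
class CallGraphMerge {

    record Call(String caller, String callee) {}

    // All methods that occur in the graph, either as caller or as callee.
    static Set<String> nodes(Map<String, Set<String>> cg) {
        Set<String> result = new TreeSet<>(cg.keySet());
        cg.values().forEach(result::addAll);
        return result;
    }

    // Dynamically observed calls that have no matching static edge.
    static List<Call> missingEdges(Map<String, Set<String>> staticCg, List<Call> observed) {
        List<Call> missing = new ArrayList<>();
        for (Call c : observed) {
            if (!staticCg.getOrDefault(c.caller(), Set.of()).contains(c.callee())) {
                missing.add(c);
            }
        }
        return missing;
    }

    // Methods seen at runtime that the static graph does not contain at all.
    static Set<String> missingNodes(Map<String, Set<String>> staticCg, List<Call> observed) {
        Set<String> known = nodes(staticCg);
        Set<String> missing = new TreeSet<>();
        for (Call c : observed) {
            if (!known.contains(c.caller())) missing.add(c.caller());
            if (!known.contains(c.callee())) missing.add(c.callee());
        }
        return missing;
    }

    // New graph = static edges plus observed edges; the input map is not modified.
    static Map<String, Set<String>> merge(Map<String, Set<String>> staticCg, List<Call> observed) {
        Map<String, Set<String>> merged = new TreeMap<>();
        staticCg.forEach((caller, callees) -> merged.put(caller, new TreeSet<>(callees)));
        for (Call c : observed) {
            merged.computeIfAbsent(c.caller(), k -> new TreeSet<>()).add(c.callee());
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> staticCg = Map.of("main_func1", Set.of("main_func2"));
        List<Call> observed = List.of(
                new Call("main_func1", "main_func2"),
                new Call("main_func2", "tpl-func3"),
                new Call("tpl-func3", "SAPI"));
        System.out.println("missing edges: " + missingEdges(staticCg, observed));
        System.out.println("missing nodes: " + missingNodes(staticCg, observed));
        System.out.println("merged: " + merge(staticCg, observed));
    }
}
```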

I would appreciate guidance on how to programmatically add these missing nodes and edges to the FlowDroid call graph. Could you kindly advise me on the best approach?

Thank you for your attention to this matter. I appreciate your efforts in developing and maintaining such a valuable tool.

Best regards

StevenArzt commented 1 week ago

This is indeed a highly interesting topic. We have worked on the integration of static and dynamic analysis in a framework built on top of Soot and FlowDroid as part of a recent paper: https://dl.acm.org/doi/pdf/10.1145/3589250.3596146.

By default, FlowDroid relies on the SPARK call graph constructed by Soot, which performs an under-approximation of the call graph (CG). SPARK values precision over recall. If you want a coarse over-approximation instead, you can change the FlowDroid configuration to use CHA.

SPARK works by propagating types from allocation sites to call sites. Take the following example:

List l = new ArrayList();
l.add(x);

In this example, the declared type at the call site for add is List. This type is an interface, i.e., there is no method implementation List.add(). CHA would now assume that the add methods in all concrete classes that implement List may be potential callees. SPARK, on the other hand, propagates the actual type ArrayList to the call site and restricts the callees to ArrayList.add().
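The difference between the two resolution strategies can be made concrete with a toy model. The sketch below hard-codes a tiny List hierarchy (the class list and the Alloc/Copy statement records are assumptions for illustration) and propagates allocated types along local copies until a fixed point, which is the core idea behind SPARK; the real analysis also handles fields, calls, and arrays.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy comparison of CHA and SPARK-style resolution for a call like l.add(x).
class ResolveAddCall {

    record Alloc(String var, String type) {} // var = new type()
    record Copy(String lhs, String rhs) {}   // lhs = rhs

    // Hand-written stand-in for the class hierarchy: concrete implementers of List.
    static final Map<String, List<String>> IMPLEMENTERS =
            Map.of("List", List.of("ArrayList", "LinkedList", "Vector"));

    // CHA: any concrete implementer of the declared type may be the receiver.
    static Set<String> chaTargets(String declaredType, String method) {
        Set<String> targets = new TreeSet<>();
        for (String cls : IMPLEMENTERS.getOrDefault(declaredType, List.of())) {
            targets.add(cls + "." + method);
        }
        return targets;
    }

    // Propagates allocated types along copies until nothing changes.
    static Map<String, Set<String>> propagateTypes(List<Alloc> allocs, List<Copy> copies) {
        Map<String, Set<String>> types = new HashMap<>();
        for (Alloc a : allocs) {
            types.computeIfAbsent(a.var(), k -> new TreeSet<>()).add(a.type());
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Copy c : copies) {
                Set<String> incoming = new TreeSet<>(types.getOrDefault(c.rhs(), Set.of()));
                changed |= types.computeIfAbsent(c.lhs(), k -> new TreeSet<>()).addAll(incoming);
            }
        }
        return types;
    }

    // Points-to style: only types that actually reach the receiver variable count.
    static Set<String> pointsToTargets(Map<String, Set<String>> types, String receiver, String method) {
        Set<String> targets = new TreeSet<>();
        for (String cls : types.getOrDefault(receiver, Set.of())) {
            targets.add(cls + "." + method);
        }
        return targets;
    }

    public static void main(String[] args) {
        // List l = new ArrayList(); List m = l; m.add(x);
        Map<String, Set<String>> types = propagateTypes(
                List.of(new Alloc("l", "ArrayList")), List.of(new Copy("m", "l")));
        System.out.println("CHA:       " + chaTargets("List", "add"));
        System.out.println("points-to: " + pointsToTargets(types, "m", "add"));
    }
}
```

For a receiver variable with no allocation site at all, such as a system service obtained from framework code that Soot cannot see, pointsToTargets returns an empty set, i.e., the call gets no callee.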

While this approach is quite precise, it may lead to false negatives when the type is unknown, e.g., because the allocation site is in a factory method that is unavailable. For example, when obtaining an Android system service such as TelephonyManager, there is no allocation site in the codebase available to Soot. Hence, calls to methods such as getDeviceId() will be missing from the call graph. FlowDroid deals with the incomplete CG using its StubDroid summaries. In other words, it doesn't matter much that the CG is incomplete: if there is no callee for a call, but there is a data flow summary, the data flow tracking still works.
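The following toy model illustrates why a missing edge need not break taint tracking when a summary exists. The summary table format, its method keys, and com.example.Helper.wrap are hypothetical; StubDroid's real summaries are considerably richer and are not modeled here.

```java
import java.util.Map;
import java.util.Set;

// Toy model of one taint step over "ret = callee(args)" when the call graph may
// contain no edge for that call site. Not FlowDroid's actual summary format.
class SummaryFallback {

    // Hypothetical summaries: for each library method, the argument indices
    // whose taint flows into the return value.
    static final Map<String, Set<Integer>> SUMMARIES =
            Map.of("java.lang.StringBuilder.append", Set.of(0));

    /** Returns true if the call's return value should be tainted. */
    static boolean returnTainted(String callee, Set<Integer> taintedArgs, boolean hasCallGraphEdge) {
        Set<Integer> flows = SUMMARIES.get(callee);
        if (flows != null) {
            // A summary exists: apply it; no call-graph edge is needed.
            return taintedArgs.stream().anyMatch(flows::contains);
        }
        if (hasCallGraphEdge) {
            // A real analysis would descend into the callee body here. This toy
            // simply assumes any tainted argument reaches the return value.
            return !taintedArgs.isEmpty();
        }
        // No summary and no callee: this toy model loses the flow (a false negative).
        return false;
    }

    public static void main(String[] args) {
        // No call-graph edge, but the summary still propagates the taint.
        System.out.println(returnTainted("java.lang.StringBuilder.append", Set.of(0), false));
        // Hypothetical unsummarized method without an edge: the flow is lost.
        System.out.println(returnTainted("com.example.Helper.wrap", Set.of(0), false));
    }
}
```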

KyleLeith-007 commented 1 week ago


Thank you very much for your response, which has greatly enhanced my understanding of the operating principles of FlowDroid. I will also diligently study the paper you provided in the link.

While awaiting your reply, I tried changing FlowDroid's configuration to alter the generated call graph. Using the cgalgo parameter in the code, I ran FlowDroid in RTA mode (I avoided CHA mode because the call graphs it constructs contain far too many edges). However, in RTA mode FlowDroid produced the following error on many APKs, while the same APKs were analyzed successfully in SPARK mode. The error is more likely to occur with larger APKs.

I am quite confused by this issue. Could it be caused by insufficient memory? I would be very grateful for any guidance. Part of the error log follows:

[main] INFO soot.jimple.infoflow.android.SetupApplication$InPlaceInfoflow - Callgraph construction took 0 seconds
[main] INFO soot.jimple.infoflow.codeOptimization.InterproceduralConstantValuePropagator - Removing side-effect free methods is disabled
[main] INFO soot.jimple.infoflow.android.SetupApplication$InPlaceInfoflow - Dead code elimination took 2.6146866 seconds
[main] INFO soot.jimple.infoflow.android.SetupApplication$InPlaceInfoflow - Callgraph has 85711 edges
[main] INFO soot.jimple.infoflow.android.SetupApplication$InPlaceInfoflow - Starting Taint Analysis
[main] INFO soot.jimple.infoflow.android.SetupApplication$InPlaceInfoflow - Using context- and flow-sensitive solver
[main] INFO soot.jimple.infoflow.android.SetupApplication$InPlaceInfoflow - Using context- and flow-sensitive solver
[main] WARN soot.jimple.infoflow.android.SetupApplication$InPlaceInfoflow - Running with limited join point abstractions can break context-sensitive path builders
[main] INFO soot.jimple.infoflow.android.SetupApplication$InPlaceInfoflow - Looking for sources and sinks...
[main] INFO soot.jimple.infoflow.android.SetupApplication$InPlaceInfoflow - Source lookup done, found 54 sources and 432 sinks.
[FlowDroid] ERROR heros.solver.CountingThreadPoolExecutor - Worker thread execution failed: null
java.lang.NullPointerException
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:906)
    at com.google.common.cache.LocalCache.get(LocalCache.java:4018)
    at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4042)
    at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5024)
    at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:5031)
    at soot.jimple.toolkits.ide.icfg.AbstractJimpleBasedICFG.getOrCreateUnitGraph(AbstractJimpleBasedICFG.java:130)
    at soot.jimple.toolkits.ide.icfg.AbstractJimpleBasedICFG.isExitStmt(AbstractJimpleBasedICFG.java:153)
    at soot.jimple.toolkits.ide.icfg.AbstractJimpleBasedICFG.isExitStmt(AbstractJimpleBasedICFG.java:51)
    at soot.jimple.infoflow.solver.cfg.InfoflowCFG.isExitStmt(InfoflowCFG.java:208)
    at soot.jimple.infoflow.solver.cfg.InfoflowCFG.isExitStmt(InfoflowCFG.java:1)
    at soot.jimple.infoflow.solver.fastSolver.IFDSSolver$PathEdgeProcessingTask.runInternal(IFDSSolver.java:750)
    at soot.jimple.infoflow.solver.fastSolver.LocalWorklistTask.run(LocalWorklistTask.java:27)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Exception in thread "FlowDroid"
[main] INFO soot.jimple.infoflow.memory.MemoryWarningSystem - Shutting down the memory warning system...
java.lang.NullPointerException
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:906)
    at com.google.common.cache.LocalCache.get(LocalCache.java:4018)
    at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4042)
    at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5024)
    at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:5031)
    at soot.jimple.toolkits.ide.icfg.AbstractJimpleBasedICFG.getOrCreateUnitGraph(AbstractJimpleBasedICFG.java:130)
    at soot.jimple.toolkits.ide.icfg.AbstractJimpleBasedICFG.isExitStmt(AbstractJimpleBasedICFG.java:153)
    at soot.jimple.toolkits.ide.icfg.AbstractJimpleBasedICFG.isExitStmt(AbstractJimpleBasedICFG.java:51)
    at soot.jimple.infoflow.solver.cfg.InfoflowCFG.isExitStmt(InfoflowCFG.java:208)
    at soot.jimple.infoflow.solver.cfg.InfoflowCFG.isExitStmt(InfoflowCFG.java:1)
    at soot.jimple.infoflow.solver.fastSolver.IFDSSolver$PathEdgeProcessingTask.runInternal(IFDSSolver.java:750)
    at soot.jimple.infoflow.solver.fastSolver.LocalWorklistTask.run(LocalWorklistTask.java:27)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
[main] ERROR soot.jimple.infoflow.android.SetupApplication$InPlaceInfoflow - Exception during data flow analysis
java.lang.RuntimeException: There were exceptions during IFDS analysis. Exiting.
    at soot.jimple.infoflow.solver.fastSolver.IFDSSolver.runExecutorAndAwaitCompletion(IFDSSolver.java:263)
    at soot.jimple.infoflow.solver.fastSolver.IFDSSolver.awaitCompletionComputeValuesAndShutdown(IFDSSolver.java:230)
    at soot.jimple.infoflow.solver.fastSolver.IFDSSolver.solve(IFDSSolver.java:202)
    at soot.jimple.infoflow.AbstractInfoflow.runTaintAnalysis(AbstractInfoflow.java:958)
    at soot.jimple.infoflow.AbstractInfoflow.runAnalysis(AbstractInfoflow.java:654)
    at soot.jimple.infoflow.AbstractInfoflow.runAnalysis(AbstractInfoflow.java:576)
    at soot.jimple.infoflow.android.SetupApplication$InPlaceInfoflow.runAnalysis(SetupApplication.java:1369)
    at soot.jimple.infoflow.android.SetupApplication.processEntryPoint(SetupApplication.java:1677)
    at soot.jimple.infoflow.android.SetupApplication.runInfoflow(SetupApplication.java:1606)
    at soot.jimple.infoflow.android.SetupApplication.runInfoflow(SetupApplication.java:1553)
    at soot.jimple.infoflow.cmd.MainClass.run(MainClass.java:360)
    at soot.jimple.infoflow.cmd.MainClass.main(MainClass.java:257)
Caused by: java.lang.NullPointerException
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:906)
    at com.google.common.cache.LocalCache.get(LocalCache.java:4018)
    at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4042)
    at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5024)
    at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:5031)
    at soot.jimple.toolkits.ide.icfg.AbstractJimpleBasedICFG.getOrCreateUnitGraph(AbstractJimpleBasedICFG.java:130)
    at soot.jimple.toolkits.ide.icfg.AbstractJimpleBasedICFG.isExitStmt(AbstractJimpleBasedICFG.java:153)
    at soot.jimple.toolkits.ide.icfg.AbstractJimpleBasedICFG.isExitStmt(AbstractJimpleBasedICFG.java:51)
    at soot.jimple.infoflow.solver.cfg.InfoflowCFG.isExitStmt(InfoflowCFG.java:208)
    at soot.jimple.infoflow.solver.cfg.InfoflowCFG.isExitStmt(InfoflowCFG.java:1)
    at soot.jimple.infoflow.solver.fastSolver.IFDSSolver$PathEdgeProcessingTask.runInternal(IFDSSolver.java:750)
    at soot.jimple.infoflow.solver.fastSolver.LocalWorklistTask.run(LocalWorklistTask.java:27)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
[main] INFO soot.jimple.infoflow.android.SetupApplication - Found 0 leaks from 0 sources

StevenArzt commented 1 week ago

CHA and RTA are less commonly used in FlowDroid due to the large number of false positives they produce in the data flow analysis. Your use case is special, though, so using RTA makes sense.
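RTA (Rapid Type Analysis) sits between CHA and SPARK: it starts from the CHA candidates at each virtual call site but keeps only classes that are instantiated somewhere in reachable code. It therefore yields more edges than SPARK without CHA's full blow-up. Below is a simplified sketch over a hand-written program model; Body, the method names, and the hierarchy map are all assumptions for illustration, and Soot's own implementation works on real Jimple.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified Rapid Type Analysis (RTA) over a toy program model.
class RtaSketch {

    // A method body: classes it instantiates, methods it calls directly
    // (static/special calls), and virtual calls written as "DeclaredType.name".
    record Body(Set<String> allocates, List<String> directCalls, List<String> virtualCalls) {}

    static Set<String> reachableMethods(String entry, Map<String, Body> program,
                                        Map<String, Set<String>> implementers) {
        Set<String> reachable = new TreeSet<>(Set.of(entry));
        Set<String> instantiated = new HashSet<>();
        boolean changed = true;
        // Iterate to a fixed point: newly instantiated classes can enable new
        // targets at call sites that were already processed.
        while (changed) {
            changed = false;
            for (String m : new ArrayList<>(reachable)) {
                Body b = program.get(m);
                if (b == null) continue; // no body available, e.g. library code not loaded
                changed |= instantiated.addAll(b.allocates());
                for (String callee : b.directCalls()) {
                    changed |= reachable.add(callee);
                }
                for (String call : b.virtualCalls()) {
                    int dot = call.lastIndexOf('.');
                    String declared = call.substring(0, dot);
                    String name = call.substring(dot + 1);
                    // CHA candidates, filtered by the classes instantiated so far.
                    for (String cls : implementers.getOrDefault(declared, Set.of(declared))) {
                        if (instantiated.contains(cls)) {
                            changed |= reachable.add(cls + "." + name);
                        }
                    }
                }
            }
        }
        return reachable;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> implementers = Map.of("List", Set.of("ArrayList", "LinkedList"));
        Map<String, Body> program = Map.of(
                "Main.main", new Body(Set.of("ArrayList"), List.of(), List.of("List.add")));
        // CHA would keep both ArrayList.add and LinkedList.add; RTA keeps only
        // ArrayList.add because LinkedList is never instantiated.
        System.out.println(reachableMethods("Main.main", program, implementers));
    }
}
```

The fixed-point loop matters: an allocation discovered in a method reached later can enable new targets at a call site that was already visited.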

When FlowDroid starts, it creates a map between all Jimple units and the Jimple bodies that contain them. When propagating taints, FlowDroid needs to check whether a given statement is an exit point of its method, which requires obtaining the method's control flow graph. In your stack trace, the unit at hand is not part of this unit-to-body mapping, which is strange: in line 152 of AbstractJimpleBasedICFG, the body is null, and passing that null body to the unit graph cache is what triggers the NullPointerException in Guava's Preconditions.checkNotNull. You would need to debug this and check where this unit comes from and why it was not present when initializeUnitToOwner() was called during initialization.
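The failing lookup can be modeled in a few lines. The sketch below uses stand-in Unit and Body types (hypothetical, not Soot's classes) and an identity-keyed map like the one described above; its ownerOrExplain variant shows one way to turn the bare NullPointerException into a message naming the unit while debugging. The scenario in main, a unit object created after initialization, is only one hypothesis about where such a unit might come from, not a diagnosis of this issue.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Toy model (not Soot's classes) of the unit-to-owner lookup behind isExitStmt.
class UnitOwnerSketch {

    // Stand-in for a Jimple unit; equals is not overridden, so it is compared by identity.
    static final class Unit {
        final String text;
        Unit(String text) { this.text = text; }
        @Override public String toString() { return text; }
    }

    // Stand-in for a method body.
    record Body(String method, List<Unit> units) {}

    private final Map<Unit, Body> unitToOwner = new HashMap<>();

    // Analogous to initializeUnitToOwner(): record every unit once, at startup.
    void initialize(List<Body> bodies) {
        for (Body b : bodies) {
            for (Unit u : b.units()) unitToOwner.put(u, b);
        }
    }

    // In Soot, the looked-up body is passed to a Guava LoadingCache, whose
    // Preconditions.checkNotNull rejects null; Objects.requireNonNull plays that role here.
    boolean isExitStmt(Unit u) {
        Body owner = Objects.requireNonNull(unitToOwner.get(u));
        // Toy "CFG tails": the return and throw statements of the owning body.
        return owner.units().contains(u) && (u.text.startsWith("return") || u.text.startsWith("throw"));
    }

    // Debugging aid: fail with a message that names the offending unit.
    Body ownerOrExplain(Unit u) {
        Body b = unitToOwner.get(u);
        if (b == null) {
            throw new IllegalStateException("Unit not registered during initialization: " + u);
        }
        return b;
    }

    public static void main(String[] args) {
        Unit assign = new Unit("x = 1");
        Unit ret = new Unit("return x");
        UnitOwnerSketch icfg = new UnitOwnerSketch();
        icfg.initialize(List.of(new Body("foo", List.of(assign, ret))));
        System.out.println(icfg.isExitStmt(ret));
        // A unit object created after initialization (hypothetically, by a later
        // body transformation) is unknown to the map, even if it looks identical.
        Unit lateCopy = new Unit("return x");
        try {
            icfg.ownerOrExplain(lateCopy);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```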