secure-software-engineering / FlowDroid

FlowDroid Static Data Flow Tracker
GNU Lesser General Public License v2.1

Wondering about the Relationship between CallGraph and SourceToSink #556

Open russwestbrick opened 1 year ago

russwestbrick commented 1 year ago

Hello Arzt:

I use runInfoflow() to perform taint analysis, which also generates a call graph. I want to add a special edge from the source method to the sink method in the call graph for a downstream malware detection task. Here's an example of what I get.

$r4 = virtualinvoke $r3.<android.telephony.TelephonyManager: java.lang.String getDeviceId()>() -> $r5 = virtualinvoke $r19.<java.lang.String: java.lang.String replace(java.lang.CharSequence,java.lang.CharSequence)>("UDIDPHONE", $r5),

But many source and sink methods like android.telephony.TelephonyManager.getDeviceId don't even appear in the call graph. I want to know the reason behind this. Also I want to know how to understand the following sentence:

The sink <...> in method <...> was called with values from the following sources: - <...>() in method <...>

What does the word 'in' mean?

Thank You!

StevenArzt commented 1 year ago

In the output, we report the sink statement together with the method that contains the sink statement. That's the semantics of "in".

The given method will not appear in the callgraph, because the base object is of class TelephonyManager and the instance comes out of a factory from inside the Android SDK. Therefore, the SPARK callgraph algorithm cannot propagate the base object type and there is no outgoing edge. If there is no callgraph edge, FlowDroid handles such cases with the declared method reference.

russwestbrick commented 1 year ago

Thank you for the replies!

getDeviceId() is an API from the Android system, so it won't be a caller of any other method in the call graph. That's my understanding of "there is no outgoing edge".

A self-defined method uses/calls the API getDeviceId(), and the word "in" could be replaced by "called by". That's my understanding of "sink statement".

If there is a self-defined method that calls getDeviceId(), why won't getDeviceId() appear in the call graph? I don't fully understand "handles such cases with the declared method reference".

StevenArzt commented 1 year ago

That was a misunderstanding. There is a call site in your code where you invoke getDeviceId. There is no outgoing edge from this call site to the getDeviceId method since the base object is the return value of a factory method for which we don't have real source code (the platform JARs in the Android SDK are merely stubs).

If there is no outgoing edge for a call site, you can still look at the declared callee, i.e., InvokeExpr.getMethod() in Soot. You don't need to handle virtual dispatch anyway for Android SDK methods.
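A minimal sketch of that fallback, written as a self-contained toy model rather than against Soot itself. `CalleeResolver`, `CallSite`, and any signatures passed to them are hypothetical stand-ins; in Soot, the edge lookup would roughly correspond to `CallGraph.edgesOutOf(Unit)` and the fallback to `InvokeExpr.getMethod()`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these types are hypothetical stand-ins, not Soot classes.
final class CalleeResolver {

    /** A call site: the method containing the call plus the statically declared callee. */
    static final class CallSite {
        final String containingMethod;
        final String declaredCallee;

        CallSite(String containingMethod, String declaredCallee) {
            this.containingMethod = containingMethod;
            this.declaredCallee = declaredCallee;
        }
    }

    // Call graph edges keyed by call site (compared by object identity).
    private final Map<CallSite, List<String>> edges = new HashMap<>();

    /** Records that the call graph algorithm resolved this call site to the given target. */
    void addEdge(CallSite site, String target) {
        edges.computeIfAbsent(site, k -> new ArrayList<>()).add(target);
    }

    /** Returns the call graph targets, or the declared callee if the site has no outgoing edge. */
    List<String> resolve(CallSite site) {
        List<String> targets = edges.get(site);
        if (targets == null || targets.isEmpty()) {
            return Collections.singletonList(site.declaredCallee);
        }
        return Collections.unmodifiableList(targets);
    }
}
```

For a call site whose declared callee is `<android.telephony.TelephonyManager: java.lang.String getDeviceId()>` and which has no recorded edge, `resolve` returns just that declared signature; for application code with edges, it returns the resolved targets.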

russwestbrick commented 1 year ago

I found an explanation of the factory method pattern:

The Factory Method pattern suggests that you replace direct object construction calls (using the new operator) with calls to a special factory method. Don’t worry: the objects are still created via the new operator, but it’s being called from within the factory method. Objects returned by a factory method are often referred to as products.

TelephonyManager tm = new TelephonyManager(); tm.getDeviceId(); would cause an edge in the call graph,

String s = TelephonyManager.getDeviceId().toString(); won't.

Do I understand that right?

Thank You!

StevenArzt commented 1 year ago

Correct.

However, you can't directly call the constructor of the TelephonyManager class in Android; an instance is only accessible via the factory method.

russwestbrick commented 1 year ago

Thank You Arzt!

I have another question about the word "in".

The call graph's format is like this: method_a in method_b ==> method_c. I find that many malware detection systems parse that into method_b ==> method_c.

I wonder whether the word "in" could also be replaced by "calls", and why the format is method_a in method_b ==> method_c rather than the two edges method_b ==> method_a and method_a ==> method_c.

StevenArzt commented 1 year ago

You're confusing data flow with control flow. FlowDroid reports data flows, so you have a sink statement inside a method, where the statement is the sink. That's the word "in". Statement A is located in method B and that statement A is the sink.
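To make the containment reading of "in" concrete, here is a small sketch that splits one "statement in method ..." fragment into its two parts. It is only an illustration: `LocatedStatement` is a hypothetical helper, and it assumes the fragment uses the literal marker " in method " as in the report sentence quoted above.

```java
// Hypothetical helper: splits "<statement> in method <method>" into its two parts.
final class LocatedStatement {
    final String statement;        // the source or sink statement itself
    final String containingMethod; // the method whose body contains that statement

    LocatedStatement(String statement, String containingMethod) {
        this.statement = statement;
        this.containingMethod = containingMethod;
    }

    /** Parses one fragment; the last " in method " marker separates statement and method. */
    static LocatedStatement parse(String fragment) {
        String marker = " in method ";
        int idx = fragment.lastIndexOf(marker);
        if (idx < 0) {
            throw new IllegalArgumentException("no \" in method \" marker: " + fragment);
        }
        return new LocatedStatement(
            fragment.substring(0, idx).trim(),
            fragment.substring(idx + marker.length()).trim());
    }
}
```

Nothing here describes a call edge: the second field is where the statement lives, not what it calls.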

russwestbrick commented 1 year ago

So method_a in method_b ==> method_c could be parsed as: the sink statement method_a is located in method_b, and method_b calls method_c.

I can't find TelephonyManager.getDeviceId() anywhere in the call graph's txt file because it's a source statement and the call graph's txt file only contains sink statements.

StevenArzt commented 1 year ago

The call graph is independent from the data flow analysis. Whether there is an incoming edge to a callee in the callgraph has nothing to do with whether this method is declared as a source or a sink for the data flow analysis.

russwestbrick commented 1 year ago

Thank You Arzt!

I have given up on the idea of adding a special leak edge to the call graph. Instead, I may treat the results of the taint analysis as additional features for my malware detection task.
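One possible encoding of that idea, as a rough sketch: count each (source API, sink API) pair reported by the taint analysis and use the counts as features, kept separate from the call graph. `LeakFeatures`, `Leak`, and the API names used with them are hypothetical and not part of FlowDroid.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical feature extractor; the leak pairs would come from the taint analysis results.
final class LeakFeatures {

    /** One reported data flow, reduced to the source API and the sink API. */
    static final class Leak {
        final String sourceApi;
        final String sinkApi;

        Leak(String sourceApi, String sinkApi) {
            this.sourceApi = sourceApi;
            this.sinkApi = sinkApi;
        }
    }

    /** Counts how often each "source -> sink" pair occurs; keys are sorted for stable output. */
    static Map<String, Integer> count(List<Leak> leaks) {
        Map<String, Integer> features = new TreeMap<>();
        for (Leak leak : leaks) {
            features.merge(leak.sourceApi + " -> " + leak.sinkApi, 1, Integer::sum);
        }
        return features;
    }
}
```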

Thanks again for your patience!