soot-oss / soot

Soot - A Java optimization framework
GNU Lesser General Public License v2.1

Incomplete call graph generated by Soot with APK #439

Closed LazarusX closed 4 years ago

LazarusX commented 9 years ago

This issue has been posted on the Soot mailing list for a while, but I haven't seen any update, so maybe GitHub is a better place to keep track of it.

First of all, I want to thank the Soot contributors for creating such a powerful framework for analysis and instrumentation.

Recently, I was leveraging Soot, Soot infoflow and Soot infoflow Android to analyze Android applications. However, I found that the call graph of an APK file generated by Soot is incomplete.

Here's the APK file: https://drive.google.com/file/d/0B0ceYAgUVEZbX3pNS1paS0ZQZ2s/view?usp=sharing

Here's the code of my analyzer: https://gist.github.com/LazarusX/abef8d1d678ef51b20a1 The this.app object in the code is basically of a type which is a slightly modified SetupApplication.

Here's the code snippet of CameraActivity.smali, which is obtained by reverse-engineering the APK: https://gist.github.com/LazarusX/7a3e987d15539b7cca2a

In the reachable methods generated by Soot, the onResume method is present. However, <com.noclicklabs.camera.CameraSurface void startReceivingLocationUpdates(android.location.LocationManager)> is not, even though it is actually invoked at Line 83.

Is there something wrong with my code or is this a bug in Soot?

Best Regards.

StevenArzt commented 9 years ago

I assume that you are missing the outgoing call edge at line 83. Your problem description text is not fully clear in that regard. In such a case, you can check whether the SPARK callgraph algorithm has a chance to build an edge: It must know the type of the base object on which the method is invoked. This, in turn, requires SPARK to propagate types along assignments starting at an allocation site, i.e., an assignment "x = new X()". If your base object is null or comes out of a factory method inside the Android SDK, this type propagation fails and there will not be a call edge. In such a case, you need to manually handle the call using library abstractions. Alternatively, you could theoretically also analyze the Android framework code together with your app, but that's usually way too costly.

The first step would thus be to check whether there is a clear path from an allocation site to the call site in application code or not.
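The check described above can be illustrated with a small, self-contained toy model. This is only a sketch of the idea of allocation-site type propagation, not Soot's or SPARK's actual API; the class `TypePropagationSketch` and all of its methods are hypothetical names invented for illustration.

```java
import java.util.*;

// Toy model of allocation-site type propagation; NOT Soot's or SPARK's API.
class TypePropagationSketch {
    // Possible runtime types per variable, seeded only at allocation sites.
    private final Map<String, Set<String>> types = new HashMap<>();
    // Assignment edges "target = source", stored as source -> targets.
    private final Map<String, List<String>> assignments = new HashMap<>();

    // Models "variable = new type()".
    void allocate(String variable, String type) {
        types.computeIfAbsent(variable, k -> new TreeSet<>()).add(type);
    }

    // Models "target = source".
    void assign(String target, String source) {
        assignments.computeIfAbsent(source, k -> new ArrayList<>()).add(target);
    }

    // Worklist propagation of types along assignments until a fixed point.
    void propagate() {
        Deque<String> worklist = new ArrayDeque<>(types.keySet());
        while (!worklist.isEmpty()) {
            String source = worklist.poll();
            Set<String> sourceTypes = types.getOrDefault(source, Collections.emptySet());
            for (String target : assignments.getOrDefault(source, Collections.emptyList())) {
                Set<String> targetTypes = types.computeIfAbsent(target, k -> new TreeSet<>());
                if (targetTypes.addAll(sourceTypes)) {
                    worklist.add(target); // target changed, so revisit its successors
                }
            }
        }
    }

    // One call edge per known base type; an empty list means no edge is built.
    List<String> callTargets(String base, String method) {
        List<String> targets = new ArrayList<>();
        for (String type : types.getOrDefault(base, Collections.emptySet())) {
            targets.add(type + "." + method);
        }
        return targets;
    }
}
```

In this model, a variable that is only ever assigned from a value with no allocation site (for example, the result of a stubbed framework method) ends up with an empty type set, so `callTargets` returns no edges for calls on it.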

LazarusX commented 9 years ago

Hi Steven,

The original post was not well formatted due to a Markdown rendering issue, but I've corrected that.

I've decompiled the whole APK to Java source code. The CameraActivity.java is uploaded to https://gist.github.com/LazarusX/964acdd3f56391cf2aa7. The CameraSurface object in Line 473 (Line 83 in Smali code) is initialized via findViewById in the onCreate method of CameraActivity, which is an activity declared in the manifest file. As a result, the CameraSurface object is not null and does not come out of a factory method inside the Android SDK, but startReceivingLocationUpdates is still not present in the reachable methods.

LazarusX commented 9 years ago

Can I assume that if there is a path from the dummy main method to a method A in the call graph, then method A must be present in Scene.v().getReachableMethods()?

StevenArzt commented 9 years ago

Yes, you can assume that. However, there must be a path in the terms of SPARK's propagation. If an object is returned from a call to findViewById(), this is just as if it had been created by a factory method inside the framework. If you are analyzing your app together with the stubbed platform JAR file from the Android SDK, there is no real implementation for findViewById(). Hence, SPARK will never see a constructor call and thus fails to propagate type information for the return value. Consequently, there will not be any outgoing call edges from calls with that base object.
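To make the relationship between call edges and reachability concrete, here is a minimal toy sketch. It does not use Soot's CallGraph or ReachableMethods classes; `ReachabilitySketch` and the shortened method names used with it are hypothetical. It only shows that a method counts as reachable exactly when some chain of existing edges leads to it from the entry point.

```java
import java.util.*;

// Toy call graph; NOT Soot's CallGraph or ReachableMethods classes.
class ReachabilitySketch {
    private final Map<String, List<String>> edges = new HashMap<>();

    void addEdge(String caller, String callee) {
        edges.computeIfAbsent(caller, k -> new ArrayList<>()).add(callee);
    }

    // Breadth-first traversal from the entry point over existing edges only.
    Set<String> reachableFrom(String entryPoint) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(entryPoint);
        queue.add(entryPoint);
        while (!queue.isEmpty()) {
            String method = queue.poll();
            for (String callee : edges.getOrDefault(method, Collections.emptyList())) {
                if (seen.add(callee)) {
                    queue.add(callee);
                }
            }
        }
        return seen;
    }
}
```

If the edge from onResume to startReceivingLocationUpdates is never created because the base object has no known type, the traversal cannot visit the callee, which matches the behaviour reported in this thread.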

LazarusX commented 9 years ago

Hi Steven,

Yes, actually I am using the android.jar file, and maybe that's the issue. How can I make Soot analyze the Android framework code, or is there any method which is less costly? Thanks.

Best Regards

StevenArzt commented 9 years ago

If you want to analyze the full Android SDK implementation, you just need to provide a platform.jar file which is not stubbed out, but contains an actual implementation. Note, however, that you will not get everything, as the Android SDK relies on its native components and drivers for certain aspects such as inter-component communication. Still, you can give it a try. For some platforms, we have full JAR files available at https://github.com/Sable/android-platforms.

Analyzing the complete framework implementation is quite a costly undertaking. In FlowDroid, we therefore use the normal stub files, but provide an explicit library model for those calls that we cannot directly analyze. We accept the gaps in the callgraph, and manually take care of them inside the data flow analysis. When we encounter findViewById(), for instance, we get the numeric id and then look up the respective user control from the layout XML files. This is something you would need anyway; even if you can analyze the implementation of findViewById(), your analysis will normally not be able to understand the complex semantics behind such a call: looking up identifiers in binary definition files (resources.arsc) and then mapping them to layout XML files.
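As a rough illustration of what such a library model for findViewById() might look like, here is a hypothetical sketch. It is not FlowDroid's implementation: the class `ViewLookupSketch`, its methods, and the idea of pre-registering controls in a map are assumptions made for the example. A real model would populate the map by parsing resources.arsc and the layout XML files.

```java
import java.util.*;

// Hypothetical library model for findViewById(); NOT FlowDroid's implementation.
class ViewLookupSketch {
    // Resource id -> class name of the control, as it might be read from layout XML.
    private final Map<Integer, String> controlsById = new HashMap<>();

    void registerControl(int resourceId, String controlClass) {
        controlsById.put(resourceId, controlClass);
    }

    // Type to assume for the value returned by findViewById(resourceId), if known.
    Optional<String> modelFindViewById(int resourceId) {
        return Optional.ofNullable(controlsById.get(resourceId));
    }

    // Call targets implied by the model; empty when the id is unknown.
    List<String> callTargets(int resourceId, String method) {
        return modelFindViewById(resourceId)
                .map(type -> Collections.singletonList(type + "." + method))
                .orElse(Collections.emptyList());
    }
}
```

With a control registered under some id (the concrete id value is whatever the app's resources define), the model can supply the type that the stubbed findViewById() cannot, which is the information needed to bridge the gap in the call graph.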

LazarusX commented 9 years ago

So I suppose that you mean the Android JARs at https://github.com/Sable/android-platforms are different from those which come with the Android SDK. I have switched to the Android JARs at https://github.com/Sable/android-platforms, but the edge which I want is still not found in the call graph.

My analysis is based on Soot Infoflow Android (a.k.a. FlowDroid?) and the code of entrypoint generation, callback calculation, UI mapping and other Android specific stuff is almost entirely borrowed from that of the SetupApplication class of Soot Infoflow Android (thanks!). I suppose that the numeric ID lookup process is included in SetupApplication. Can you elaborate more on the "explicit library model" you mentioned?

Thanks.

StevenArzt commented 9 years ago

This explicit library model is called "Taint Wrapper" in FlowDroid. The data flow engine provides an interface ITaintPropagationWrapper that gets asked "if taint abstraction x flows into call y, what taint will exist after this call returns?" for every method call inside the target APK. FlowDroid ships with a very simple taint wrapper called EasyTaintWrapper that applies rules such as "on a call x=a.foo, make x tainted if a was tainted before". These rules are sufficient for most library methods such as simple collections.
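The rule quoted above can be sketched in a few lines. Note that this is a simplified toy: the interface `TaintWrapperSketch` and the class `BaseToResultWrapper` are hypothetical and do not reproduce the real signature of ITaintPropagationWrapper or the behaviour of EasyTaintWrapper, which work on Soot statements and taint abstractions rather than plain strings.

```java
import java.util.*;

// Hypothetical interface; NOT FlowDroid's ITaintPropagationWrapper signature.
interface TaintWrapperSketch {
    // Taints that hold after "result = base.method(...)" returns.
    Set<String> taintsAfterCall(String result, String base, String method,
                                Set<String> taintedBefore);
}

// Rule in the spirit of the one quoted above: taint the result if the base was tainted.
class BaseToResultWrapper implements TaintWrapperSketch {
    // Names of the library methods this rule applies to.
    private final Set<String> wrappedMethods;

    BaseToResultWrapper(Set<String> wrappedMethods) {
        this.wrappedMethods = wrappedMethods;
    }

    @Override
    public Set<String> taintsAfterCall(String result, String base, String method,
                                       Set<String> taintedBefore) {
        // Existing taints survive the call; the input set is not modified.
        Set<String> after = new TreeSet<>(taintedBefore);
        if (wrappedMethods.contains(method) && taintedBefore.contains(base)) {
            after.add(result);
        }
        return after;
    }
}
```

For a call x = a.foo() with a tainted and foo listed as a wrapped method, the returned set contains both a and x; for an untainted base or an unlisted method, the taint set is unchanged.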

For the Android-specific aspects such as accessing library controls, we need more. In FlowDroid, we especially want to define values read from password fields in the UI as sources. Therefore, FlowDroid ships with a special ISourceSinkManager (the interface that defines what is a source and what is a sink) for Android, the AndroidSourceSinkManager. This class looks at every call to findViewById(), takes the value (the id) passed to that method and then checks to which kind of UI control this id refers. If it is a password field, the respective call to findViewById() is marked as a source.
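The decision described above can be sketched as follows. This is a hypothetical toy, not the real AndroidSourceSinkManager: the class `PasswordSourceSketch` and its methods are invented for illustration, and the set of password-field ids is assumed to have been extracted from the layout files beforehand.

```java
import java.util.*;

// Hypothetical model; NOT the real AndroidSourceSinkManager.
class PasswordSourceSketch {
    // Resource ids of controls flagged as password fields in the layout.
    private final Set<Integer> passwordFieldIds = new HashSet<>();

    void registerPasswordField(int resourceId) {
        passwordFieldIds.add(resourceId);
    }

    // A call counts as a source only if it is findViewById() with a password-field id.
    boolean isSource(String calledMethod, int idArgument) {
        return "findViewById".equals(calledMethod)
                && passwordFieldIds.contains(idArgument);
    }
}
```

Calls to findViewById() with any other id, and calls to any other method, are not treated as sources in this sketch.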

LazarusX commented 9 years ago

I see. Thanks. I'll read the source code of EasyTaintWrapper and AndroidSourceSinkManager for inspiration.

By the way, I notice that https://github.com/Sable/android-platforms lacks platforms for recent Android SDK versions. What are the instructions for building one? Perhaps I can open a pull request to contribute. Thanks.

moonZHH commented 8 years ago

Hi Steven, I may have a similar problem. I was leveraging soot, soot-infoflow and soot-infoflow-android to analyze Android applications. However, I found that the call graph of an APK file generated by Soot is incomplete. The following are the jimple code and smali code of the function "com.xxx.yyy.MyService: void onCreate()" (https://gist.github.com/njupt-moon/c8ea2f6dd4877ef25557). As you can see, line 20 in the MyService_part.smali file contains an invocation of method "android.telephony.TelephonyManager: java.lang.String getDeviceId()", and line 22 in the MyService_part.jimple file also contains an invocation of method "android.telephony.TelephonyManager: java.lang.String getDeviceId()". But in the whole-program call graph, the edge between "com.xxx.yyy.MyService: void onCreate()" and "android.telephony.TelephonyManager: java.lang.String getDeviceId()" is missing. How can I solve this problem? Thanks.

pavanupb commented 4 years ago

@njupt-moon Please re-open if this is still relevant.