Open fbeyond opened 7 years ago
Thanks for these insightful questions.
1) How accurate is the call graph construction? (Android apps are event-driven and contain many asynchronous callbacks.) I parse the Smali IR and identify "invoke" instructions, then build the graph from these instruction ops (e.g., invoke, iget, iput). If you want a more precise graph, you can use the Soot call graph API [1]. I think FlowDroid (based on Soot) [2] has some strategies for handling asynchronous callbacks.
2) Did you implement data-flow analysis in your tool? I use the androwarn tool, which provides some degree of data-flow analysis on Smali code.
[1] https://ssebuild.cased.de/nightly/soot/javadoc/index.html?soot/jimple/toolkits/callgraph/CallGraph.html
[2] https://github.com/secure-software-engineering/soot-infoflow-android/wiki
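To make the invoke-based approach from answer 1 concrete, here is a minimal sketch in Python. It is not the tool's actual code: the function name build_call_graph, the regular expression, and the sample Smali snippet are all made up for illustration, and it only follows invoke-* instructions (not iget/iput).

```python
import re
from collections import defaultdict

# Hypothetical pattern: matches e.g.
#   invoke-virtual {p0, v0}, Lcom/example/Foo;->bar(I)V
# and captures the callee signature after the register list.
INVOKE_RE = re.compile(r"^invoke-[\w/-]+\s+\{[^}]*\},\s*([^\s,]+)")


def build_call_graph(smali_text):
    """Return a dict mapping caller signature -> set of callee signatures."""
    graph = defaultdict(set)
    current_class = None
    current_method = None
    for line in smali_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(".class "):
            # The class descriptor is the last token on the .class line.
            current_class = stripped.split()[-1]
        elif stripped.startswith(".method "):
            # The method name and descriptor are the last token on the .method line.
            current_method = f"{current_class}->{stripped.split()[-1]}"
        elif stripped.startswith(".end method"):
            current_method = None
        elif current_method is not None:
            match = INVOKE_RE.match(stripped)
            if match:
                graph[current_method].add(match.group(1))
    return graph


# Made-up Smali input, for illustration only.
SAMPLE = """\
.class public Lcom/example/Main;
.super Landroid/app/Activity;

.method public onCreate(Landroid/os/Bundle;)V
    invoke-super {p0, p1}, Landroid/app/Activity;->onCreate(Landroid/os/Bundle;)V
    invoke-direct {p0}, Lcom/example/Main;->sendData()V
    return-void
.end method

.method private sendData()V
    invoke-static {}, Lcom/example/Net;->post()V
    return-void
.end method
"""

graph = build_call_graph(SAMPLE)
for caller, callees in graph.items():
    for callee in sorted(callees):
        print(f"{caller} -> {callee}")
```

A graph built this way only contains edges that appear as explicit invoke instructions in the app's own code. Callbacks that the Android framework triggers (for example a click listener) have no such instruction in the app, which is one reason a Soot/FlowDroid-style analysis may give a more complete picture.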
If you are interested in both machine learning and program analysis for Android malware detection, you could use Soot for more accurate program analysis (e.g., extracting data-flow-related features) and feed these new features into machine learning models. Most existing ML solutions extract static features that carry little semantic information about an app.
Best,
I am a newbie in Android malware detection. I have seen that many ML-based tools are built on Smali IR, and I am also looking into Soot for a more comprehensive analysis.
I have some questions about this tool. 1) How accurate is the call graph construction? (I have seen that Android apps are event-driven and contain many asynchronous callbacks.)
2) Did you implement data-flow analysis in your tool?