ririhedou / dr_droid

Analysis of code structure for malware classification with machine learning
Apache License 2.0

questions about the tool #3

Open fbeyond opened 7 years ago

fbeyond commented 7 years ago

I am a newbie in Android malware detection. I have seen that many ML-based tools are built on the Smali IR, and I am also looking into Soot for a more comprehensive analysis.

I have some questions about this tool.

1) How accurate is the call graph it constructs? (Android apps are event-driven and contain many asynchronous callbacks.)

2) Did you implement data-flow analysis in your tool?

ririhedou commented 7 years ago

Thanks for these insightful questions.

1) How accurate is the call graph it constructs? (Android apps are event-driven and contain many asynchronous callbacks.)

I parse the Smali IR and identify "invoke" instructions, then build the graph from these instruction ops (e.g., invoke, iget, iput). If you want a more precise graph, you can use the Soot call graph API [1]. I think FlowDroid (based on Soot) [2] has some strategies for handling asynchronous callbacks.
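To make the approach concrete, here is a minimal sketch of building caller-to-callee edges from "invoke" instructions in Smali text. It is not the tool's actual code: the function name `build_call_graph`, the regular expression, and the sample Smali are all assumptions made for illustration, and it handles only invoke-* instructions (not iget or iput).

```python
import re
from collections import defaultdict

# Assumed pattern for invoke-* lines such as:
#   invoke-virtual {v0, v1}, Lcom/example/Foo;->bar(I)V
INVOKE_RE = re.compile(r"^invoke-[\w/]+\s+\{[^}]*\},\s*([^\s,]+->[^\s,]+)")


def build_call_graph(smali_text):
    """Map each method defined in smali_text to the set of methods it invokes."""
    graph = defaultdict(set)
    current_class = None
    current_method = None
    for line in smali_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(".class "):
            # The class descriptor is the last token of the .class line.
            current_class = stripped.split()[-1]
        elif stripped.startswith(".method "):
            # The method name and signature are the last token of the .method line.
            current_method = f"{current_class}->{stripped.split()[-1]}"
        elif stripped == ".end method":
            current_method = None
        elif current_method is not None:
            match = INVOKE_RE.match(stripped)
            if match:
                graph[current_method].add(match.group(1))
    return dict(graph)


# Hypothetical sample input, written by hand for this sketch.
SAMPLE = """\
.class public Lcom/example/Main;
.super Landroid/app/Activity;

.method public onCreate(Landroid/os/Bundle;)V
    invoke-super {p0, p1}, Landroid/app/Activity;->onCreate(Landroid/os/Bundle;)V
    invoke-direct {p0}, Lcom/example/Main;->sendData()V
    return-void
.end method

.method private sendData()V
    invoke-static {}, Landroid/telephony/SmsManager;->getDefault()Landroid/telephony/SmsManager;
    return-void
.end method
"""

if __name__ == "__main__":
    for caller, callees in build_call_graph(SAMPLE).items():
        print(caller)
        for callee in sorted(callees):
            print("   ->", callee)
```

A graph built this way records only explicit call sites. Callbacks that the Android framework invokes (for example, a registered click listener) do not appear as edges, which is the imprecision the question points at.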

2) Did you implement data-flow analysis in your tool?

I use the androwarn tool, which provides some degree of data-flow analysis on Smali code.

[1] https://ssebuild.cased.de/nightly/soot/javadoc/index.html?soot/jimple/toolkits/callgraph/CallGraph.html
[2] https://github.com/secure-software-engineering/soot-infoflow-android/wiki

If you are interested in both machine learning and program analysis for Android malware detection, you could use Soot for more accurate program analysis (e.g., extracting data-flow-related features) and feed these new features to a machine learning model. Most existing ML solutions extract static features that carry relatively little semantic information about an app.
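As a rough illustration of what "data-flow-related features" could look like, the sketch below encodes source-to-sink flows as a binary feature vector. It assumes some analysis step (for example, one built on Soot) has already produced the flows as (source, sink) pairs; the function name `flows_to_features`, the vocabulary, and the example flow are hypothetical and not output from any real tool.

```python
def flows_to_features(flows, vocabulary):
    """Return a 0/1 vector marking which vocabulary flows occur in the app."""
    present = set(flows)
    return [1 if pair in present else 0 for pair in vocabulary]


# Hypothetical feature vocabulary: (source API, sink API) pairs of interest.
VOCABULARY = [
    ("getDeviceId", "sendTextMessage"),
    ("getDeviceId", "openConnection"),
    ("getLastKnownLocation", "openConnection"),
]

# Hypothetical flows reported for one app by an upstream data-flow analysis.
APP_FLOWS = [("getDeviceId", "openConnection")]

if __name__ == "__main__":
    print(flows_to_features(APP_FLOWS, VOCABULARY))
```

Each app then becomes one fixed-length row, which can be stacked with rows from other apps and passed to any standard classifier.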

Best,