Open MarcoBalossini opened 1 year ago
Have you tried assigning more memory to FlowDroid using the -Xmx parameter? It might also help to specify reasonable timeouts for callgraph construction and data flow analysis. Some apps are huge and the analysis may consume a lot of resources and a lot of time.
Thanks, I'll try. Usually, on the average application, how much time does the analysis take?
We usually have five minutes of callback analysis timeout and, depending on the number of apps we want to analyze, we have around 10 minutes of data flow time. Individual apps may still be much faster, but for large apps, we set timeouts around these numbers.
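The memory and timeout advice above can be combined into one command line. The following is a minimal sketch, not an official invocation: `-Xmx` is the standard JVM heap flag, while the FlowDroid flags (`-a`, `-p`, `-s`, `-ct`, `-dt`), the assumption that timeouts are given in seconds, the 8 GB heap, and all file names are assumptions to verify against the help output of your FlowDroid version.

```java
import java.util.ArrayList;
import java.util.List;

final class FlowDroidCommand {
    // Builds the argument list for launching the FlowDroid command-line jar.
    // The FlowDroid flag names are assumptions; check them against the CLI help.
    static List<String> build(String jar, String apk, String platforms, String sourcesSinks,
                              int heapGb, int callbackTimeoutSec, int dataFlowTimeoutSec) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Xmx" + heapGb + "g");           // standard JVM flag: maximum heap size
        cmd.add("-jar");
        cmd.add(jar);
        cmd.add("-a"); cmd.add(apk);              // assumed: APK to analyze
        cmd.add("-p"); cmd.add(platforms);        // assumed: Android SDK platforms directory
        cmd.add("-s"); cmd.add(sourcesSinks);     // assumed: sources and sinks file
        cmd.add("-ct"); cmd.add(Integer.toString(callbackTimeoutSec)); // assumed: callback analysis timeout
        cmd.add("-dt"); cmd.add(Integer.toString(dataFlowTimeoutSec)); // assumed: data flow timeout
        return cmd;
    }

    public static void main(String[] args) {
        // Hypothetical paths; 300 s and 600 s mirror the five and ten minutes mentioned above.
        List<String> cmd = build("flowdroid.jar", "app.apk", "android-sdk/platforms",
                                 "SourcesAndSinks.txt", 8, 300, 600);
        System.out.println(String.join(" ", cmd));
    }
}
```

Passing the resulting list to `ProcessBuilder` would launch the analysis. On a machine with 12 GB of RAM, leaving a few gigabytes for the operating system is probably wise.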
Over the past few days I did some testing with the DEFAULT callback analyzer and came up with the following data:
(APK size is not a very meaningful measure, but I think it gives a general idea.)
With the FAST callback analyzer, the time for the latter application dropped to less than 15 minutes. Is such a large difference plausible? How much precision is lost?
Moreover, this shouldn't be due to a lack of memory: I did hit that problem in some trials, and in those cases I got an OutOfMemoryError.
There is some connection between runtime and available memory. When the JVM runs short on memory, it will conduct more garbage collector cycles which will pause the normal program flow to regain memory. The normal program flow can continue once enough memory has been regained for the allocation at hand.
In some cases, this behavior will lead to practically infinite runtime, because the GC manages to recover just enough memory to continue with a minimal step in program execution, before the next GC run is required, which in turn will be just enough to continue for yet another minimal step, etc. The analysis never really stops, but never really finishes either. You don't get an OutOfMemoryError either, since you never arrive at a stage where a memory request can't be fulfilled anymore. I have seen this behavior quite a few times and I have seen people tweak the JVM's GC options to fail earlier and avoid GC'ing forever.
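One way to tell whether a run is stuck in this kind of GC loop is to measure how much of the wall-clock time the JVM spends collecting. The sketch below uses the standard `GarbageCollectorMXBean` API; the class name, the one-second sampling window, and the 90% threshold are illustrative assumptions, and the code has to run inside the same JVM as the analysis (for example, from a monitoring thread).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcPressure {
    // Sums the accumulated collection time (ms) reported by all collectors of this JVM.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) total += t;
        }
        return total;
    }

    // Fraction of a time window spent in GC, clamped to [0, 1].
    static double gcFraction(long gcMillisDelta, long wallMillisDelta) {
        if (wallMillisDelta <= 0 || gcMillisDelta <= 0) return 0.0;
        return Math.min(1.0, (double) gcMillisDelta / wallMillisDelta);
    }

    // Threshold is an assumption; pick what fits your setup.
    static boolean isThrashing(double fraction, double threshold) {
        return fraction >= threshold;
    }

    public static void main(String[] args) throws InterruptedException {
        long gcBefore = totalGcMillis();
        long wallBefore = System.nanoTime();
        Thread.sleep(1000); // sampling window; the analysis would run concurrently
        long gcDelta = totalGcMillis() - gcBefore;
        long wallDelta = (System.nanoTime() - wallBefore) / 1_000_000;
        double fraction = gcFraction(gcDelta, wallDelta);
        System.out.printf("GC share of last window: %.1f%%, thrashing: %b%n",
                fraction * 100, isThrashing(fraction, 0.9));
    }
}
```

As for failing earlier: HotSpot's parallel collector has a GC overhead limit controlled by `-XX:+UseGCOverheadLimit`, `-XX:GCTimeLimit`, and `-XX:GCHeapFreeLimit`. Whether these take effect depends on the collector and JVM version in use, so treat them as a starting point to check against the documentation of your JVM.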
The app's code size isn't a good measure for required time or memory. In a large application, only a few taint abstractions might be necessary due to luck, and in a small application, everything might get tainted. As a rule of thumb, an analysis gets more expensive the more (instances of) sources you have and the more objects are in scope. In other words, if an app carries around a reference to the main activity or some other "god class", this can become a problem.
The fast analyzer was an experiment. The default analyzer tries to connect all callbacks with their respective parent activity. The onClick method of a button, for example, is only invoked in the running phase of the respective component inside the dummy main method. This requires iteratively broadening the set of reachable methods by transitively integrating all callback registrations starting at the lifecycle methods of each component.
With the fast analyzer, we gave up on all of that and simply built a flat list of all callbacks, regardless of where they are registered. This is obviously fast, but highly imprecise, because the link between callback and parent component gets lost. Each callback is associated with each activity. For most apps, we have seen that this over-approximation does not scale during data flow analysis. Regardless of how much time we saved during callback analysis, we lost way more than that during taint analysis.
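To get a feel for why the flat list hurts the taint analysis, here is a toy model of the two association strategies. It is not FlowDroid's implementation; the class, method, and component names are hypothetical. It only counts how many (component, callback) pairs the later analysis would have to consider, assuming each callback is registered in exactly one component.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CallbackAssociation {
    // Default-style association: each component keeps only the callbacks it registers.
    static int precisePairs(Map<String, List<String>> registered) {
        int pairs = 0;
        for (List<String> callbacks : registered.values()) {
            pairs += callbacks.size();
        }
        return pairs;
    }

    // Fast-style association: one flat list, every callback attached to every component.
    static int flatPairs(Map<String, List<String>> registered) {
        return registered.size() * precisePairs(registered);
    }

    public static void main(String[] args) {
        // Hypothetical app with three activities and six callbacks in total.
        Map<String, List<String>> registered = new LinkedHashMap<>();
        registered.put("MainActivity", List.of("onClick#login", "onClick#signup"));
        registered.put("MapActivity", List.of("onLocationChanged"));
        registered.put("PayActivity", List.of("onClick#pay", "onTextChanged", "onFocusChange"));

        System.out.println("precise pairs: " + precisePairs(registered)); // 6
        System.out.println("flat pairs:    " + flatPairs(registered));    // 18
    }
}
```

In this model the flat association multiplies the work by the number of components, so the gap widens as apps get larger, which is consistent with the scalability problem described above.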
It might be interesting to re-visit this question. If you can find a good a-priori approximation to decide which analyzer is the fastest, that would be helpful. We actually never looked into how much the fast analysis affects the precision, because the scalability issue ruled it out for us. If you want to re-visit the topic, that would also be something to look into.
If you can provide the APK and your sources/sinks file, I can try the 15MB app on one of our large compute servers and compare default/fast analysis there.
Many thanks for the explanation.
I tried to run FlowDroid on the Dott application (a micromobility app) from the Play Store, using the SourcesAndSinks.txt file provided in this repo.
The app has multiple dexes, and I wanted to analyze all of them together.
Question
Hi, I'm running the FlowDroid CLI to execute taint analysis on some APK files (e.g. the Bird application for e-scooters), but the execution keeps going and printing things like
The problem is that I'm way past 10 hours of execution, yet neither CPU nor RAM is fully utilized.
Is this timing normal, or is something wrong?
Environment
CPU: Intel i5 8th gen
RAM: 12 GB - 2400 MHz
OS: Windows 10 Education 22H2
Java: 17.0.2
Android SDK: 30
FlowDroid: 2.10 - downloaded as jar with dependencies from GitHub releases