soot-oss / soot

Soot - A Java optimization framework
GNU Lesser General Public License v2.1
2.86k stars 706 forks source link

Extreme delay in soot.jimple.toolkits.typing.fast.Typing when loading a method body #1053

Open jpstotz opened 5 years ago

jpstotz commented 5 years ago

When loading the app org.telegram.plus 4.9.1.4 (13760) Soot ends up in an endless loop (runs several hours without returning) in the method soot.jimple.toolkits.typing.fast.Typing.minimize(List<Typing> tgs, IHierarchy h). This method is pretty special as it contains ~1700 local variables in Jimple.

In detail this happens if I try to retrieve the body of the method <org.telegram.ui.ChatActivity: org.telegram.ui.ActionBar.ThemeDescription[] getThemeDescriptions()>. Processing this methods ends up that the minimize method is called with a tgs of 16384 elements - looks like Soot can't handle this corner case. Or is it simply because the implementation of minimize has a runtime of O(n^2) which means it has to perform 16384^2 = 268435456 compare operations?

As the whole class is not very well commented I don't have a clue of the purpose of this class and of the minimize method in detail. Therefore I can't fix this bug myself.

Any help would be appreciated.

jpstotz commented 5 years ago

I encountered a second app that triggers this problem: ch.financefox.app v1.9.36 (154)

In this app at least three methods are affected which take a very long time in the soot.jimple.toolkits.typing.fast.Typing.minimize(List<Typing> tgs, IHierarchy h) method.

jpstotz commented 5 years ago

I found out more details on the app ch.financefox.app:

The interesting part is that in difference to the app I mentioned in my first post (org.telegram.plus) in ch.financefox.app the delay problem only occurs when the option include_all == true (I currently have to use this option because of #1060). However in this case the problem occurs it is even worse as with the app org.telegram.plus I mentioned in the my first post:

Affected are 3 methods - 2 of them I was able to identify:

IMHO this indicates that in the Typing minimization method is a serious problem hidden. Especially because from my point of view there should be no difference whether the library classes are loaded fully or not.

As the code of Typing.java is almost 10 years old I assume that something else has changed that violates certain assumptions of the Typing class. Unfortunately the whole Typing code is nearly free of comments. The only clue I found is the commit comment by @ericbodden: https://github.com/Sable/soot/commit/97b20b5e1e06294e66ab37562cf96772c81f1423

new (faster and more precise) Type Assigner by Ben Bellamy (see OOPSLA '08)

Digging deeper I found the title of the paper: Efficient Local Type Inference - it seems to be this paper: https://dl.acm.org/citation.cfm?id=1449802 I have to see how to get this paper - hopefully it contains some details that can light up the darkness around the Typing class.

patricklam commented 5 years ago

Always good when one puts one's papers not beyond a paywall... There is the free version of Bellamy's work as an undergrad thesis that you can find on the internet. It probably contains most of the information in the oopsla version, or more.

On Sat, Nov 24, 2018, 9:06 AM Jan S. <notifications@github.com wrote:

I found out more details on the app ch.financefox.app:

The interesting part is that in difference to the app I mentioned in my first post (org.telegram.plus) in ch.financefox.app the delay problem only occurs when the option include_all == true (I currently have to use this option because of #1060 https://github.com/Sable/soot/issues/1060). However in this case the problem occurs it is even worse as with the app org.telegram.plus I mentioned in the my first post:

Affected are 3 methods - 2 of them I was able to identify:

  • <io.realm.a: java.lang.String toString()> - processing time ~ 2 hours
  • <io.realm.al: java.lang.String toString()> - processing time ~ 6 hours
  • third method (was'n able to identify it) - processing time more than 12 hours!

IMHO this indicates that in the Typing minimization method is a serious problem hidden. Especially because from my point of view there should be no difference whether the library classes are loaded fully or not.

As the code of Typing.java is almost 10 years old I assume that something else has changed that violates certain assumptions of the Typing class. Unfortunately the whole Typing code is nearly free of comments. The only clue I found is the commit comment by @ericbodden https://github.com/ericbodden: 97b20b5 https://github.com/Sable/soot/commit/97b20b5e1e06294e66ab37562cf96772c81f1423

new (faster and more precise) Type Assigner by Ben Bellamy (see OOPSLA '08)

Digging deeper I found the title of the paper: Efficient Local Type Inference - it seems to be this paper: https://dl.acm.org/citation.cfm?id=1449802 I have to see how to get this paper - hopefully it contains some details that can light up the darkness around the Typing class.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/Sable/soot/issues/1053#issuecomment-441370301, or mute the thread https://github.com/notifications/unsubscribe-auth/AByidp2lfIroHJcG31jFSSlDXjSO1sqfks5uyVJbgaJpZM4YGDKC .

ericbodden commented 5 years ago

Hi all.

I know that I have encountered similar problems with individual - usually generated - methods in the past. I think the problem is intrinsic to the algorithms that Soot uses. IIRC they have cubic worst-case complexity, which can be large for some certain methods.

It would be great to have an implementation with smaller complexity!

jpstotz commented 5 years ago

I am still trying to understand how the type minimizing works in reality and why this performance problem arises. Therefore I was trying to find a simple example that demonstrates the problem. I found the a code snippet and simplified it a bit which helped me to understand what the problem is.

Note: If you want to try this example on your own you have to set the option include_all=true because otherwise the Comparable interface is not used by the type TypeResolver hierarchy.

    public String toString() {
        StringBuilder stringBuilder = new StringBuilder();;
        stringBuilder.append(getLong() != null ? getLong1() : "null");
        stringBuilder.append(getLong() != null ? getLong1() : "null");
        stringBuilder.append(getLong() != null ? getLong1() : "null");
        stringBuilder.append(getLong() != null ? getLong1() : "null");
        return stringBuilder.toString();
    }

Now what is the problem here? The problem is caused by the Elvis operator using different types for the true and false case: "null" is a String but getLong() returns a Long - and of course they are stored in the same local variable before
The problem is getting worse because String and Long have two Interfaces in common: Serializable and Comparable. Therefore for each line in this method the soot TypeResolver generates two potential Typings: one for Serializable and one for Comparable.

However this means that the number of potential Typing increases by two with every line! For this small example we already have 16 potential Typing. 4 more lines of this type and we already have to deal with 256 potential Typing!

Now we take into account that the minimize algorithm has cubic complexity which makes it obvious where the problem problem arises...

mbenz89 commented 5 years ago

Hi Jan.

Thanks for further investigating and illustrating the underlying problem! Do you have any ideas on improving the situation at hand?

jpstotz commented 5 years ago

On a very short term there is the possibility to parallelize the minimize algorithm if the number of typings to compare is large. Looking at the algorithm I assume one good point to parallelize would be the inner loop. However I have not checked this variant regarding it's performance, especially because usually method body retrieval is already done in parallel.

Other concrete options are difficult because I still don't understand how the Typing are in the end applied. However from an intuitive perspective the most problematic aspect is that the TypeResolver creates for every possibility a new potential typing. This makes sense for cases where variables are heavily cross-connected however in my example the variables are very focused on a small part of the code. Therefore solving them locally instead of "globally" (whole method level) could reduce the number of generated typing. But implementing such a change is currently out of scope for me as my understanding of the overall process is still incomplete.

In general IMHO one general problem of the implemented algorithm is that it bases on set theory. Set theory is intuitive but is difficult to implement efficiently. The whole situation for resolving the variable types reminds me on how to implement a chess AI (decision tree + evaluation function). For a chess game good algorithms exists, however I am not sure if we can guarantee to find the best result like the current algorithm claims to do.

mbenz89 commented 5 years ago

Thanks for letting us know your thought on the problem! Using a chess-like algorithm sounds interesting, while I currently can't quite figure out how that would solve the problem out of my head.

Let us know if you are further investigating the issue! For now, unfortunately, this seems to be a bit out of scope for us.

jpstotz commented 5 years ago

Hi Manuel,

I think I expressed it a bit complicated and a chess game tree was only an example for the structure.

Think of the typing problem like as a game with only one player.

At the beginning we have a method with a set of variables and each variable has a type. This is our game board and the variable typings are the pawns. Changing one type of a variable is a movement in our game. If we have multiple possible types for one variable we have multiple movements of our pawn. This builds our game/decision tree.

Now Soot come into play, however it does not build up a tree, instead it builds every combination of every possibly type of every variable. Looking at our game tree soot builds a set for each possible path within our tree. Of course this explodes with exponential complexity.

Now we use a game tree instead. If we make a decision on one of the first levels it only have to be made once where as soot has to make this type decision (and therefore the check which one to use) up to n^2 times. In the end we have to test several paths within our tree, however even in the worst case we don't have to perform as many operations as soot currently does.

mbenz89 commented 5 years ago

Wow, thanks for the explanation! I like the idea. How would an evaluation function for the typing problem look like, i.e., how do you decide what a good typing is in a specific tree level? How does the current algorithm decide that a typing is optimal?

jpstotz commented 5 years ago

Today I made significant progress on this issue. I added code for automatic conversion of the List<Typing> into a tree data structure - I think is is much better than trying to change the code that is generating the typings as this code is very complex. The comparison code is not yet completely finished however if I have not missed an important case in my implementation the new tree based minimizer is way faster that the old approach with n^3 runtime. The only minor drawback is that the initial time for building the tree may be slightly larger for cases where List<Typing> contains only a few typings. But it seems to be just in the range of milliseconds to may be very few seconds per method in case there are multiple Typing (which AFAIK is usually only the case for a small percentage of methods in a app/program).

On the other hand the saved time is really extreme - building the tree runs in n^2 and comparing the different local types should be even below - n * log(n) if I get it correctly. So even in worst case scenarios like the one described in this issue my TypingTree requires less than a minute instead of 15-150 minutes.

But implementing the TypingTreeis only one half - the other is to write some useful test cases. I checked the existing Soot unit tests, however it seems that for the Typing class there are no unit tests implemented. I already wrote some very simple test cases for Typing however for more complex unit tests my approach is not very useful. Does anybody (@mbenz89 @StevenArzt) know existing unit tests implementations that could be reused for a more complex unit tests that checks e.g. Jimpel body creation and therefore also implicitly the Typing minimization?

mbenz89 commented 5 years ago

We still have some test classes and cases that are not used (foremost because I don't know how they work and did not find any time to incorporate them into Maven's test phase yet): https://github.com/Sable/soot/tree/master/src/it

These are pretty small and I don't know if they would be of much help for testing your typing algorithm.

You could use the test framework I've set up for testing Soot as a whole with provided source class files: https://github.com/Sable/soot/blob/master/src/systemTest/java/soot/testing/framework/AbstractTestingFramework.java

Generally, it might make sense to take some open-source Java corpus, compile and run the code and then compare it's output to a version that has been read-in and written-out again by Soot.

jpstotz commented 5 years ago

@mbenz89 Thanks for the links. The AbstractTestingFramework.java looks interesting.

However I am not sure if I really code for read source code as test input. My current test cases directly test the Typing minimization, therefore as input I only need a list of Typing classes with certain locals and potential assigned types and a class hierarchy. As the minimization uses has the IHierarchy parameter it is even possible to provide a synthetic class hierarchy.

At the moment a student is working on implementing some more test cases. Based on an run on 2000 apps I have identified those that contain methods where the minimization is used (methods that have more than one Typing). Based on those apps the plan is to select some of the apps, identify the methods, check the generated Typings and build a synthetic test case that has the same Typings and an identical class hierarchy regarding the classes that are used in the method.

Regarding my TypingTree algorithm I had to realize that there are some more cases than I initially were aware of. Hopefully I will find some time to proceed with my TypingTree implementation.

mbenz89 commented 5 years ago

Great! Let us know when you make progress :)

jpstotz commented 5 months ago

The last days I was working again on improving the typing minimization performance. Unfortunately I am not capable of designing a new algorithm that is faster than O(n2), but I noticed some room for improvements in the existing minimize algorithm:

In recent version of Soot the affected method is soot.jimple.toolkits.typing.fast.DefaultTypingStrategy.minimize. As argument it gets List<Typing> tgs.

Effectively it uses tgs to build a matrix where the columns/rows represent the items of the tgs list. For every combination of elements in tgs we execute the compare method which tells us if the two selected Typing are a more general typing of the other. This means the matrix we build comprises of the results of compare method, and we always build the full matrix (except from the determinant (red line in the picture).

image

Based on my understanding the following is true for the compare method:

if (compare(a,b) == 1) then 
    compare(b,a) == -1

That means our matrix is mirrored on the determinant (considering the results 0, 1, -1, other of compare). The current implementation only checks if the result of compare is 1, or not 1. Also considering -1 as a result would allow us to ignore the upper right triangle part of the matrix and only compute the lower left.

I am currently changing the minimize implementing to implement this. As this is still not enough for certain programs to be analyzed I am working on a parallelized version of minimize that could be used cases where tgs exceeds a certain size (in my tests I have chosen a size of 1000 to be the boundary).

ericbodden commented 5 months ago

Thanks a lot for the effort! It's been too long since I last looked at the implementation, which is why I cannot confirm right now that what you are proposing will be correct. But it certainly might be. I think we'd be happy to test it out!

StevenArzt commented 5 months ago

I have merged the improved algorithm and there have been no issues with the FlowDroid test cases (which are more than 1,000 now). We have not seen any issues with our commercial code scanner and its massive test suite either, despite this test suite containing rather complex real-world applications. That makes me rather confident that this change didn't break something obvious.

ericbodden commented 5 months ago

Great, thanks a lot @StevenArzt !