opalj / opal

https://www.opal-project.de
Other
51 stars 27 forks source link

Detected non-deterministic behaviors when using different algorithms to generate call graphs #198

Closed AnnabellaM closed 2 months ago

AnnabellaM commented 4 months ago

Hello OPAL team,

I have been using OPAL for an empirical study aimed at detecting non-deterministic behaviors in static analyzers. During the experiments, we observed some non-deterministic behaviors across multiple runs when using different algorithms to generate call graphs.

Experimental Setup:

Observations:

We detected non-deterministic results in one program: eclipse.jar. This non-determinism occurred with all four algorithms. Specifically, we found that certain edges in the call graphs were inconsistent across runs.

Example:

Algorithm: RTA Call Graphs from Two Runs: Call Graph 1 Call Graph 2

One observation is that when the below edge appears, several other edges will be missing in the same run: image image

Additional Data:

The attached xlsx file provides more details on these observations. It shows that when the red edge appears in one run, the black edges are missing in that run: OPAL_diff.xlsx.

Could you please provide any insights into the possible causes of this non-deterministic behavior? Any feedback would be greatly appreciated!

errt commented 4 months ago

Hello Annabella,

thank your for reporting this. We're investigating the issue, but were yet unable to reproduce it. Can you give us more information on your setup, including which version of OPAL you are using (master or develop, commit hash if necessary), the JDK version and whether/which parts of the JDK you analyze alongside eclipse.jar, how you compute the call graphs (which tool are you using, what are your command line arguments) and how you compared them?

I just fixed an incorrectly implemented double-checked locking pattern (#200) that could lead to duplicate VirtualDeclaredMethods being created, but that doesn't seem to be the root cause of your reported non-determinism. Nonetheless, you could check whether you can still reproduce it using the up-to-date develop branch.

AnnabellaM commented 4 months ago

Hi,

Thank you @errt so much for the prompt response! I'm happy to provide more details about my setup:

OPAL Version: de.opal-project version 5.0.0 from the Maven Central Repository JDK Version: 11.0.19 I've developed an interface to invoke the OPAL API and print out the call graph using CallGraphSerializer.writeCG(). You can find the interface code at this GitHub repository.

For comparison, we converted each call graph into a set of edges (caller->callee) and compared the sets. The OPAL_diff.xlsx file shows the edge differences detected in the call graphs generated using different algorithms.

Regarding the OPAL version, I used OPAL as a Maven dependency, and it seems that version 5.0.0 is the most up-to-date version available in the Maven repository.

errt commented 4 months ago

Thanks, with this I was able to reproduce the issue also on the current develop branch. I'm currently in the process of localizing and fixing it. It seems to stem from the reflection analysis.

errt commented 4 months ago

It turns out this is actually expected behavior, and there is even a warning issued about this:

[warn][project configuration] java.lang.Thread is defined by multiple class files:  
[warn][project configuration]   jar:file:[...]/eclipse.jar!/jar:data/eclipse.zip!/jar:eclipse/plugins/org.eclipse.jdt.core.tests.performance_3.1.2/full-source-R3_0.zip!/jar:org.eclipse.osgi/osgi/ee.foundation.jar!/java/lang/Thread.class and
[warn][project configuration]   jar:file:[...]/eclipse.jar!/jar:data/eclipse.zip!/jar:eclipse/plugins/org.eclipse.jdt.core.tests.performance_3.1.2/full-source-R3_0.zip!/jar:org.eclipse.osgi/osgi/ee.minimum.jar!/java/lang/Thread.class
[warn][project configuration]   keeping the first one.

(and several more like these) The size of the call graph then depends on which class file is selected to be kept, i.e., which jar file is parsed first. I don't think we will do anything about this. Having multiple conflicting class files on your class path is generally not a good idea.