spotify / missinglink

Build time tool for detecting link problems in java projects
Apache License 2.0

optimizations around memory allocation and footprint #24

Closed mattnworb closed 9 years ago

mattnworb commented 9 years ago

I ran missinglink through Java Flight Recorder to get profiling info about how many objects are being created during the analysis to see what improvements could be made.

The test project I ran missinglink against while profiling has a dependency graph of about 51k classes. JFR showed that missinglink creates millions of ClassTypeDescriptor instances and hundreds of thousands of MethodDescriptor instances when analyzing this project.

Most of the memory allocation pressure comes from char[] and String instances, which makes sense: asm returns type and method info as Strings, which missinglink parses into objects to work with.

A snapshot of JFR from the initial state showing that char[] is nearly half of the allocation pressure:

[screenshot: JFR allocation profile, initial state]

By caching the ClassTypeDescriptor instances (which is safe, as they are immutable), we can shave off some of the String and char[] allocations (about 0.5 GB of char[]). Many of the String instances being created came from calls to String.replace(char, char) in ClassTypeDescriptor's constructor:

[screenshot: JFR allocation profile with ClassTypeDescriptor caching]
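As a rough sketch of the caching approach (the `of` factory method and cache shape here are illustrative, not necessarily the actual missinglink code): interning is safe because the descriptors are immutable, and the cache stays bounded by the number of distinct class names in the dependency graph (~51k here).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class ClassTypeDescriptor {
  // One canonical instance per raw asm type string; safe to share
  // across call sites because instances are immutable.
  private static final Map<String, ClassTypeDescriptor> CACHE = new ConcurrentHashMap<>();

  private final String className;

  private ClassTypeDescriptor(String raw) {
    // This replace() was a major source of transient String/char[]
    // allocations; with the cache it runs once per distinct type.
    this.className = raw.replace('/', '.');
  }

  public static ClassTypeDescriptor of(String raw) {
    return CACHE.computeIfAbsent(raw, ClassTypeDescriptor::new);
  }
}
```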

By caching the MethodDescriptor instances, the pressure is cut down even more significantly:

[screenshot: JFR allocation profile with MethodDescriptor caching]

This larger improvement comes from avoiding some of the work done in MethodDescriptor that calls the asm methods Type.getMethodType(desc), Type.getDescriptor(), etc. These asm methods do a lot of string parsing themselves, constructing Types from char arrays and then building StringBuffers from a Type.
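The same interning idea applies to methods. A minimal sketch, assuming the cache is keyed by the raw asm method descriptor string (the `MethodDescriptors` helper name is hypothetical), so that asm's descriptor parsing runs at most once per distinct signature:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.objectweb.asm.Type;

public final class MethodDescriptors {
  private static final Map<String, Type> CACHE = new ConcurrentHashMap<>();

  // Type.getMethodType(desc) re-parses the descriptor into char arrays
  // on every call; caching by the raw descriptor string avoids that.
  public static Type methodType(String rawDescriptor) {
    return CACHE.computeIfAbsent(rawDescriptor, Type::getMethodType);
  }
}
```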

In the initial state, the Java Flight Recording reports Memory Allocated for TLABs: 5.51 GB during one run of the maven plugin with a total GC pause time of 20 s 957 ms.

With the caching changes (at 0c0243e), JFR reports that Memory Allocated for TLABs: 2.56 GB and a total GC pause time of 6 s 912 ms.

So the result of these two changes is that allocation pressure is down by roughly half and GC pause time is down by about two-thirds.

This PR also includes some initial setup work for a general "benchmarks" module using JMH for microbenchmarks. For the ClassTypeDescriptor and MethodDescriptor caching improvements, this module and JMH ended up not being needed, but it seems useful to have the setup in place for the future.
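For reference, a microbenchmark in that module might look like the following sketch (the class and method names are hypothetical, and `ClassTypeDescriptor.of` refers to the caching sketch above):

```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;

public class DescriptorBenchmark {

  // Measures the average cost of obtaining a descriptor from a raw asm
  // type string, which is the hot path the caching change targets.
  // Returning the result keeps JMH from dead-code-eliminating the call.
  @Benchmark
  @BenchmarkMode(Mode.AverageTime)
  @OutputTimeUnit(TimeUnit.NANOSECONDS)
  public Object lookupClassTypeDescriptor() {
    return ClassTypeDescriptor.of("com/example/SomeClass");
  }
}
```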

The capture_flight_recording.sh script was used to ease creating the JFR recordings; my workflow here was to run ./install_maven_plugin.sh with my changes and then profile them with:

./benchmarks/capture_flight_recording.sh ~/Desktop/test-0c0243e.jfr ~/code/the-test-project/
pettermahlen commented 9 years ago

Nice! I think it looks great. I wonder if we could gain further benefits from GC tuning - @spkrka's comment made me wonder whether we could, for instance, gain something from ensuring we get the cheapest possible young-generation collections. Very high allocation pressure will lead to objects being copied between survivor spaces and probably also promoted too soon. This change should reduce the allocation pressure, but we could maybe do more to ensure that objects actually die before they get unnecessarily copied or promoted.

mattnworb commented 9 years ago

@pettermahlen I haven't looked into it, but I am not sure that a maven plugin can influence the GC settings of Maven at all - other than instructing users on what MAVEN_OPTS settings to tweak - since the plugin executes within the same JVM that mvn launched. A "forked execution" of a plugin might be different, but I think that would then increase the overall time of the build, as some phases would be run multiple times.

pettermahlen commented 9 years ago

I was thinking it might be possible to control JVM args in a forked execution. At least, the jetty plugin does that. Could be something to explore.