google / error-prone

Catch common Java mistakes as compile-time errors
https://errorprone.info
Apache License 2.0

Excessive Metaspace usage when Error Prone enabled #2793

Open hisener opened 2 years ago

hisener commented 2 years ago

As mentioned in https://github.com/google/error-prone/issues/2786#issuecomment-1003368760, we are planning to move away from forking javac. However, after disabling the fork option, we noticed that the metaspace (off-heap) usage increased significantly (220 MiB -> 1.8 GiB). Please have a look at the graphs below.

I don't have a reproduction case at the moment, but our setup has nothing custom: only some command-line options to enable or disable certain checks. It may be worth mentioning that we also use NullAway; however, disabling it didn't change the metaspace usage.

Depending on the feedback, I am happy to help reproduce this in an open-source repository. Please let me know.


I ran `mvn -T 4 clean install -DskipTests` and attached jconsole.
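
For anyone who wants to capture the same number without a GUI, the metaspace figure that jconsole plots is also exposed through the standard `java.lang.management` API. Below is a minimal sketch, assuming a HotSpot JVM where the pool is named "Metaspace". The class name `MetaspaceReport` is made up for illustration, and the sketch only inspects the JVM it runs in; attaching to a running Maven process over JMX is not shown.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

public class MetaspaceReport {
    /** Returns the bytes currently used by the "Metaspace" pool, or -1 if no such pool exists. */
    static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Assumption: HotSpot names its class-metadata pool "Metaspace".
            if (pool.getType() == MemoryType.NON_HEAP && "Metaspace".equals(pool.getName())) {
                return pool.getUsage().getUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long used = metaspaceUsedBytes();
        System.out.printf("Metaspace used: %.1f MiB%n", used / (1024.0 * 1024.0));
        System.out.println("Loaded classes: "
                + ManagementFactory.getClassLoadingMXBean().getLoadedClassCount());
    }
}
```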

Error Prone disabled

[Two jconsole screenshots taken 2022-01-04 at 10:56:05 and 10:55:56]

Error Prone enabled

[Two jconsole screenshots taken 2022-01-04 at 11:06:53 and 11:06:44]

Stephan202 commented 2 years ago

(Ex-colleague of @hisener here :upside_down_face:.)

We hit a similar issue in the past with a large build running on Travis CI. I looked up the relevant commit and found this (commit is timestamped Fri Jun 7 16:53:34 2019 +0200):

 # The Travis CI build environment is memory constrained, so we configure
-# a decent cap and a garbage collector optimized for throughput.
-export MAVEN_OPTS="-verbose:gc -XX:+UseParallelGC -Xmx4g"
+# suitable caps and a garbage collector optimized for throughput. The
+# suitability of these exact parameters was established empirically:
+# - We use a serial rather than a parallel garbage collector because this seems
+#   to perform better in combination with `mvn -T 2` when running inside Travis
+#   CI's two-core build environment.
+# - When Error Prone is enabled, Maven builds with many submodules use a _lot_
+#   of metaspace. This appears to be related to the annotation processor class
+#   path being constructed anew for each module; for `picnic-platform` some
+#   classes are loaded hundreds of times in the metaspace. These are kept alive
+#   using soft references. By default the metaspace is unbounded, preventing
+#   the soft references from being garbage collected. We resolve this by
+#   setting an explicit upper bound; that way a subset of the loaded classes
+#   _is_ garbage collected.
+export MAVEN_OPTS="-verbose:gc -XX:+UseSerialGC -XX:MaxMetaspaceSize=3072m -Xmx3584m"

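The "loaded hundreds of times" observation in the commit message follows from how the JVM identifies classes: a class is keyed by its defining class loader, so every fresh loader gets its own copy of the class metadata. The sketch below illustrates only that mechanism and is not how Maven or Error Prone build their class paths. It uses `java.lang.reflect.Proxy` as a stand-in for "the same class defined in many loaders", and the class and method names are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class PerLoaderClassCopies {
    private static final InvocationHandler NOOP = (proxy, method, args) -> null;

    /** Defines a Runnable proxy class in `loaders` fresh class loaders; returns the number of distinct classes. */
    static int distinctClassesAcrossLoaders(int loaders) {
        Set<Class<?>> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (int i = 0; i < loaders; i++) {
            // A new loader per iteration, comparable to a class path rebuilt per module.
            ClassLoader fresh = new ClassLoader(null) {};
            Object instance = Proxy.newProxyInstance(fresh, new Class<?>[] {Runnable.class}, NOOP);
            seen.add(instance.getClass());
        }
        return seen.size();
    }

    /** Requests the same proxy class `repeats` times from a single loader; returns the number of distinct classes. */
    static int distinctClassesWithSharedLoader(int repeats) {
        Set<Class<?>> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        ClassLoader shared = new ClassLoader(null) {};
        for (int i = 0; i < repeats; i++) {
            Object instance = Proxy.newProxyInstance(shared, new Class<?>[] {Runnable.class}, NOOP);
            seen.add(instance.getClass());
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("fresh loader each time: " + distinctClassesAcrossLoaders(100));
        System.out.println("one shared loader:      " + distinctClassesWithSharedLoader(100));
    }
}
```

With a fresh loader per iteration, each request yields a separate class; with a shared loader, the class is defined once and reused. That difference is what makes per-module class path construction expensive in metaspace terms.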
Later we moved to larger Travis CI machines, and on Mon Jul 13 16:42:42 2020 +0200 the relevant code was updated as follows:

-# The Travis CI build environment is memory constrained, so we configure
-# suitable caps and a garbage collector optimized for throughput. The
-# suitability of these exact parameters was established empirically:
-# - We use a serial rather than a parallel garbage collector because this seems
-#   to perform better in combination with `mvn -T 2` when running inside Travis
-#   CI's two-core build environment.
-# - When Error Prone is enabled, Maven builds with many submodules use a _lot_
-#   of metaspace. This appears to be related to the annotation processor class
-#   path being constructed anew for each module; for `picnic-platform` some
-#   classes are loaded hundreds of times in the metaspace. These are kept alive
-#   using soft references. By default the metaspace is unbounded, preventing
-#   the soft references from being garbage collected. We resolve this by
-#   setting an explicit upper bound; that way a subset of the loaded classes
-#   _is_ garbage collected.
-export MAVEN_OPTS="-verbose:gc -XX:+UseSerialGC -XX:MaxMetaspaceSize=3072m -Xmx3584m"
+# The following configuration properties were established empirically:
+# - The `ParallelGC` garbage collector is configured to optimize for
+#   throughput.
+# - The `Xmx` value is configured to utilize the majority of the available 15GB
+#   of memory, while leaving sufficient space for non-heap memory and other
+#   processes.
+# - The `ReservedCodeCacheSize` value is increased compared to its JDK 11
+#   default value of 240MB in order to resolve a warning logged during
+#   compilation of large projects ("CodeHeap 'non-profiled nmethods' is full").
+export MAVEN_OPTS="-verbose:gc -XX:+UseParallelGC -Xmx9216m -XX:ReservedCodeCacheSize=512m"

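When tuning `MAVEN_OPTS` like this, it can help to confirm which values the JVM actually ended up with and whether each came from the command line or a default. Below is a minimal sketch, assuming a HotSpot JVM with the `jdk.management` module available. `EffectiveJvmFlags` is a hypothetical class name; the flag names are the HotSpot options that correspond to the settings in the diffs above (`-Xmx` maps to `MaxHeapSize`).

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class EffectiveJvmFlags {
    private static HotSpotDiagnosticMXBean bean() {
        return ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    }

    /** Returns the current value of the named HotSpot flag as a string. */
    static String flagValue(String name) {
        return bean().getVMOption(name).getValue();
    }

    /** Returns where the value came from, e.g. DEFAULT or VM_CREATION (set on the command line). */
    static String flagOrigin(String name) {
        return bean().getVMOption(name).getOrigin().toString();
    }

    public static void main(String[] args) {
        String[] names = {"MaxHeapSize", "MaxMetaspaceSize", "ReservedCodeCacheSize", "UseParallelGC"};
        for (String name : names) {
            System.out.printf("%-22s = %s (%s)%n", name, flagValue(name), flagOrigin(name));
        }
    }
}
```

Running this with and without `-XX:MaxMetaspaceSize=...` shows whether the cap is in effect for that JVM.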
The first analysis is ~2.5 years old, but perhaps it still applies to the latest Error Prone. (I checked: back then we also didn't fork javac.)