odpi / egeria

Egeria core
https://egeria-project.org
Apache License 2.0

CTS fails with java metaspace error (UBI-8 container image) #3740

Closed planetf1 closed 4 years ago

planetf1 commented 4 years ago

When testing Release 2.3 in k8s or compose (or indeed plain docker), the graph CTS fails with:

Fri Oct 02 14:07:58 GMT 2020 CTS_Server Information CONFORMANCE-SUITE-0012 The Open Metadata Conformance Test Case repository-type-definition-event-396 is initializing; see https://egeria.odpi.org/open-metadata-conformance-suite/docs/repository-workbench/test-cases/repository-type-definition-event-test-case.md for documentation
Fri Oct 02 14:07:58 GMT 2020 CTS_Server Information CONFORMANCE-SUITE-0014 The Open Metadata Conformance Test Case repository-type-definition-event-396 has completed with 10 successful assertions, 0 unsuccessful assertions, 0 unexpected exceptions and 7 discovered properties.  The message on completion was: Type definition event successfully processed
Fri Oct 02 14:07:58 GMT 2020 CTS_Server Information CONFORMANCE-SUITE-0012 The Open Metadata Conformance Test Case repository-typedef-AttachedNoteLog-event-396 is initializing; see https://egeria.odpi.org/open-metadata-conformance-suite/docs/repository-workbench/test-cases/repository-typedef-test-case.md for documentation
Terminating due to java.lang.OutOfMemoryError: Metaspace

I didn't observe this with other usage of egeria or the inmem repository.

This does not occur when run locally.

The reason appears to be that the Red Hat UBI-8 image (which we switched to for this release) sets the metaspace size explicitly - see https://access.redhat.com/solutions/2038983

The article lists a whole lot of things that could be checked in case of leaks, but notes that by default metaspace is not artificially limited by the JVM, and that apps with many classes may hit problems when it is restricted.

One of the recommended actions, and the one I intend to take, is to remove the artificial metaspace limit. JVM utilization could be looked into in more depth, but at this point it seems hard to justify expending the resource

cc: @grahamwallis

planetf1 commented 4 years ago

The simple docs for the UBI-8 image for openjdk11 make NO mention of this setting

docker run registry.access.redhat.com/ubi8/openjdk-11 cat /help.md

The working value appears to be

-XX:MaxMetaspaceSize=unlimited

The image offers

Since it's not documented what is set in the former, it would seem adding to the latter makes sense - will have to see if it can OVERRIDE the presumably coded specific, smaller setting

planetf1 commented 4 years ago

Before:

jonesn:egeria/ (issue3740) $ kubectl logs lab-odpi-egeria-lab-dev-568cfff9dc-jlbxb | head                             [16:33:16]
/usr/local/s2i/run: line 15: /opt/jboss/container/maven/default//scl-enable-maven: No such file or directory
Starting the Java application using /opt/jboss/container/java/run/run-java.sh ...
INFO exec  java -XX:+UseParallelOldGC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:MaxMetaspaceSize=100m -XX:+ExitOnOutOfMemoryError -cp "." -jar /deployments/server/server-chassis-spring-2.3.jar
 ODPi Egeria
    ____   __  ___ ___    ______   _____                                 ____   _         _     ___
   / __ \ /  |/  //   |  / ____/  / ___/ ___   ____ _   __ ___   ____   / _  \ / / __    / /  / _ /__   ____ _  _
  / / / // /|_/ // /| | / / __    \__ \ / _ \ / __/| | / // _ \ / __/  / /_/ // //   |  / _\ / /_ /  | /  _// || |
 / /_/ // /  / // ___ |/ /_/ /   ___/ //  __// /   | |/ //  __// /    /  __ // // /  \ / /_ /  _// / // /  / / / /
 \____//_/  /_//_/  |_|\____/   /____/ \___//_/    |___/ \___//_/    /_/    /_/ \__/\//___//_/   \__//_/  /_/ /_/

After (local build)

jonesn:egeria/ (issue3740) $ docker run docker.io/library/image                                                       [16:45:41]
/usr/local/s2i/run: line 15: /opt/jboss/container/maven/default//scl-enable-maven: No such file or directory
Starting the Java application using /opt/jboss/container/java/run/run-java.sh ...
INFO exec  java -XX:+UseParallelOldGC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:MaxMetaspaceSize=100m -XX:+ExitOnOutOfMemoryError -XX:MaxMetaspaceSize=1g -cp "." -jar /deployments/server/server-chassis-spring-2.3.jar
 ODPi Egeria
    ____   __  ___ ___    ______   _____                                 ____   _         _     ___
   / __ \ /  |/  //   |  / ____/  / ___/ ___   ____ _   __ ___   ____   / _  \ / / __    / /  / _ /__   ____ _  _
  / / / // /|_/ // /| | / / __    \__ \ / _ \ / __/| | / // _ \ / __/  / /_/ // //   |  / _\ / /_ /  | /  _// || |
 / /_/ // /  / // ___ |/ /_/ /   ___/ //  __// /   | |/ //  __// /    /  __ // // /  \ / /_ /  _// / // /  / / / /
 \____//_/  /_//_/  |_|\____/   /____/ \___//_/    |___/ \___//_/    /_/    /_/ \__/\//___//_/   \__//_/  /_/ /_/

 :: Powered by Spring Boot (v2.3.3.RELEASE) ::

^C%

Unfortunately I could not easily set the value to unlimited without changing many other settings that the Red Hat image sets up, and I was keen to make the minimum change. 0, -1 or 'unlimited' are invalid values.

Recommendations online tend to go either with leaving it unset or with a high value as a production guard. Red Hat defaults to 100m. Many sites suggest several hundred MB, but a complex app could need more. Ours is complex, so I am setting it to 1g. Note this is a safety limit only.
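
The two `INFO exec` lines above can be compared mechanically. The following is only a rough sketch: it assumes that HotSpot applies the last occurrence of a repeated `-XX` flag, which is consistent with the "After" run but is not something the image docs state. The function name `effective_max_metaspace` is made up for illustration, and the command lines are abbreviated copies of the ones logged above.

```shell
#!/bin/sh
# Print the value of the last -XX:MaxMetaspaceSize=<value> flag found in a
# JVM command line. Assumption: the JVM honours the last occurrence of a
# repeated -XX flag, so the last value is the effective one.
effective_max_metaspace() {
    printf '%s\n' "$1" \
        | tr ' ' '\n' \
        | sed -n 's/^-XX:MaxMetaspaceSize=//p' \
        | tail -n 1
}

# Abbreviated versions of the "Before" and "After" start-up lines.
before='java -XX:+UseParallelOldGC -XX:MaxMetaspaceSize=100m -XX:+ExitOnOutOfMemoryError -cp "." -jar /deployments/server/server-chassis-spring-2.3.jar'
after='java -XX:+UseParallelOldGC -XX:MaxMetaspaceSize=100m -XX:+ExitOnOutOfMemoryError -XX:MaxMetaspaceSize=1g -cp "." -jar /deployments/server/server-chassis-spring-2.3.jar'

effective_max_metaspace "$before"   # prints 100m
effective_max_metaspace "$after"    # prints 1g
```

This only inspects the text of the command line. To confirm what a running JVM actually uses, adding `-XX:+PrintFlagsFinal` to the start-up options should list the `MaxMetaspaceSize` value in effect.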