ibmruntimes / Semeru-Runtimes

Issue repo for all things IBM Semeru Runtimes

JVMSHRC840E Failed to start up the shared cache (ARM64) #85

Open leochr opened 1 month ago

leochr commented 1 month ago

Liberty images can't be built due to the following error. So far it's only seen with ARM64. Publishing images for new Liberty releases and refreshing existing images are blocked due to this (the automated pipeline builds all architectures and this failure kills the build).

Slack thread: https://ibm-cloud.slack.com/archives/C59HR9D5X/p1722264208889429

15:42:31.440 0x16600           j9shr.1186     < OSCachemmap::releaseHeaderWriteLock: Exiting as no-op due to read-only
15:42:31.440 0x16600           j9shr.849      < SH_OSCachemmap::attach: data address returned is 0000FFFF7A3000F0
15:42:31.440 0x16600           j9shr.1785     - Mismatch in composite cache osPageSize value. CompositeCache = 0000FFFF7A3000F0, _theca->osPageSize = 65536, _osPageSize = 4096
15:42:31.440 0x16600           j9shr.356      > CC exitWriteMutex PRE: Thread 0x0000000000016600 exiting writeMutex from CC startup
15:42:31.440 0x16600           j9shr.1192     < CC exitWriteMutex: Exiting as no-op due to read-only enabled
15:42:31.440 0x16600           j9shr.1039     < CC startup: Exiting with rc=-3
JVMSHRC840E Failed to start up the shared cache.
15:42:31.441 0x16600           j9shr.2307   * < CM startup: Failed to start up the shared cache
JVMJ9VM015W Initialization error for library j9shr29(11): JVMJ9VM009E J9VMDllMain failed
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

Full build log: container_image_build_verbose.log
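For anyone triaging similar build logs, here is a minimal sketch that pulls the two page-size values out of the j9shr.1785 trace line above. The function name and the labels "cache" and "runtime" are my own; I am assuming `_theca->osPageSize` is the value recorded in the cache and `_osPageSize` is the value seen by the running JVM, which the trace itself does not state.

```python
import re

# Matches the two page-size fields in the j9shr.1785 "Mismatch" trace line.
# The group names are illustrative labels, not OpenJ9 terminology.
MISMATCH_RE = re.compile(
    r"_theca->osPageSize = (?P<cache>\d+), _osPageSize = (?P<runtime>\d+)"
)


def find_page_size_mismatch(log_text):
    """Return (cache_page_size, runtime_page_size) from the first
    mismatch line in log_text, or None if no such line is present."""
    match = MISMATCH_RE.search(log_text)
    if match is None:
        return None
    return int(match.group("cache")), int(match.group("runtime"))


sample = (
    "15:42:31.440 0x16600           j9shr.1785     - Mismatch in composite "
    "cache osPageSize value. CompositeCache = 0000FFFF7A3000F0, "
    "_theca->osPageSize = 65536, _osPageSize = 4096"
)
print(find_page_size_mismatch(sample))  # (65536, 4096)
```

Here the two values are 65536 (64 KB) and 4096 (4 KB), which is the mismatch that leads to rc=-3.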

pshipton commented 1 month ago

@hangshao0 pls take a look.

hangshao0 commented 1 month ago

More discussion is in the original Slack thread, where I have already replied. There might have been a recent change to the kernel memory page size used to build the Semeru image, but @jayasg12 can confirm.

hangshao0 commented 1 month ago

Providing an image built on a kernel with 4K memory pages should unblock Liberty.

jayasg12 commented 1 month ago

Hi @hangshao0, yes, the kernel page size on the servers where the Semeru container images were built is 64 KB.

grep -ir pagesize /proc/self/smaps
KernelPageSize: 64 kB
MMUPageSize: 64 kB

getconf PAGESIZE
65536
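The same check can be scripted on a build server before an image is built. This is only a sketch: the 4096 default reflects the 4K page size requested in this thread, and the helper names are hypothetical. `os.sysconf("SC_PAGE_SIZE")` is available on Unix-like systems and reports the same value as `getconf PAGESIZE`.

```python
import os

# 4 KB, the page size requested in this thread; adjust for other targets.
EXPECTED_PAGE_SIZE = 4096


def kernel_page_size():
    """Return the kernel page size in bytes (Unix-like systems only)."""
    return os.sysconf("SC_PAGE_SIZE")


def page_size_matches(expected=EXPECTED_PAGE_SIZE, actual=None):
    """Return True if the actual page size equals the expected one.

    When actual is None, the page size of the current host is used.
    """
    if actual is None:
        actual = kernel_page_size()
    return actual == expected


print("kernel page size:", kernel_page_size())
print("matches 4K:", page_size_matches())
```

On the 64 KB servers described above, `page_size_matches()` would return False.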

I have raised an infra ticket to reset the page size: https://github.ibm.com/runtimes/infrastructure/issues/9662

leochr commented 1 month ago

@jayasg12 @hangshao0 I appreciate the focus and your help to resolve this. Checking to see if there is an update and timeline on when the changes to unblock Liberty builds will be available in Semeru images. Both our production images and the pre-release images (needed for SVT) are blocked due to this.

In case you need to prioritize which ones to update first, here are the UBI-based Semeru images that Liberty uses for all the architectures (amd64, ppc64le, s390x and arm64). Thank you.

icr.io/appcafe/ibm-semeru-runtimes:open-8-jdk-ubi
icr.io/appcafe/ibm-semeru-runtimes:open-11-jdk-ubi
icr.io/appcafe/ibm-semeru-runtimes:open-17-jdk-ubi
icr.io/appcafe/ibm-semeru-runtimes:open-21-jre-ubi9-minimal

hangshao0 commented 1 month ago

I will be looking into the JVM to see if we can change the OpenJ9 code to handle the change of memory page size. But the next OpenJ9 release on Java 8, 11, 17 and 21 will be in Oct.

Before that, providing a Semeru image built on a kernel with 4K memory pages should unblock Liberty, which @jayasg12 is looking at.

Not sure if setting "$OPENJ9_SCC" to false on ARM64 in https://github.com/OpenLiberty/ci.docker/blob/72bda285669d3b1d2cdad87b229ec37250a96094/releases/24.0.0.6/kernel-slim/Dockerfile.ubi.openjdk8#L137 would help you work around this for now.
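As a rough illustration of what an ARM64-only override could look like in a build script: this sketch assumes the Dockerfile exposes OPENJ9_SCC as a build argument, and the helper name `scc_build_args` is made up for the example. It is not taken from the Liberty pipeline.

```python
import platform


def scc_build_args(arch=None):
    """Return extra `docker build` arguments that turn the shared class
    cache step off on ARM64 and leave other architectures untouched.

    Assumes OPENJ9_SCC is a build argument of the Dockerfile.
    """
    if arch is None:
        # platform.machine() typically reports "aarch64" on ARM64 Linux.
        arch = platform.machine()
    if arch.lower() in ("aarch64", "arm64"):
        return ["--build-arg", "OPENJ9_SCC=false"]
    return []


print(scc_build_args("aarch64"))  # ['--build-arg', 'OPENJ9_SCC=false']
print(scc_build_args("x86_64"))   # []
```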

leochr commented 1 month ago

Thank you @hangshao0.

Setting OPENJ9_SCC to false will hurt Liberty startup time, so it's not an option.

wraschke commented 1 month ago

Java team, we are blocked from creating our Liberty 24.0.0.8 release images and our publishing date of next week is at risk. Can you please inform us what your plan is to make progress?

jayasg12 commented 1 month ago

@hangshao0 On PPCLE the kernel page size has always been 64K. Any idea why the issue is seen only on ARM64?

KernelPageSize: 64 kB MMUPageSize: 64 kB

@leochr @wraschke Can you please re-confirm whether this issue exists on PPCLE?

wraschke commented 1 month ago

@jayasg12 the issue is not in PPC.

jayasg12 commented 1 month ago

@leochr @wraschke We got a new arm64 server for building the container images, and the images below have been rebuilt on it with a 4K kernel page size. Please pick up these images for testing.

icr.io/appcafe/ibm-semeru-runtimes:open-8-jdk-ubi
icr.io/appcafe/ibm-semeru-runtimes:open-11-jdk-ubi
icr.io/appcafe/ibm-semeru-runtimes:open-17-jdk-ubi

We are seeing issues while building icr.io/appcafe/ibm-semeru-runtimes:open-21-jre-ubi9-minimal on the new server and are looking into it. I will update this ticket once the issue is fixed. Thanks!

wraschke commented 1 month ago

Hi @jayasg12. It looks like we are making progress. Our container images based on the Java 8, 11, and 17 images have built successfully, without the JVMSHRC840E problem. However, our image on Java 21 still fails with that error code. I'm sure that's due to the continuing problems you alluded to.

Thank you for your continuing attention to this. Please let us know as soon as the open-21-jre-ubi9-minimal image build problem has been resolved so that I can retry my test.

jayasg12 commented 1 month ago

@wraschke All images are refreshed. Please let us know if there are any issues with the newly published images. Thanks!

wraschke commented 1 month ago

I've run four pipelines now and have not seen the JVMSHRC840E failure, or any other JRE failure, while building the Liberty images.

I still need to run other pipelines that will create our Liberty refreshed images that we publish on a weekly basis (we haven't been able to do that for two weeks), so I hope you can hold off on closing the issue until I've published those.

wraschke commented 2 weeks ago

You can close this issue as we're no longer seeing the error code.