Open pedrolamarao opened 2 weeks ago
@dmitripivkine @tajila fyi, pls take a look.
Is this test failing with -Xint
too?
This is unlikely GC issue. Sounds as VM (or even JIT potentially). GC does not control set of the roots.
-Xint
did not change the result with semeru-17.0.9 for Windows 64 bits.
Let me give you code a little less "redacted" to consider.
The Cleaner is defined here.
public interface KeysProvider
{
Cleaner cleaner = Cleaner.create();
// etc.
}
The channel is registered with the Cleaner in this method:
public final class GrpcProvider implements KeysProvider
{
// other stuff
@Override
public Keys open (URI locator)
{
try
{
// lots of peculiar configuration parsing
final var builder = Grpc.newChannelBuilder(target,credentials);
builder.defaultLoadBalancingPolicy("round_robin");
builder.defaultServiceConfig(serviceConfig);
final var channel = builder.build();
final var service = new GrpcKeys( getLogger(GrpcKeys.class), newBlockingStub(channel) );
cleaner.register(service,channel::shutdownNow);
return service;
}
catch (Exception e)
{
throw new KeysException("not expected",e);
}
}
}
and the JUnit tests as above.
@babsingh Please take a look at this
@pedrolamarao Can you provide a micro-test that I can run locally to reproduce the failure? It's needed to generate artifacts for introspecting JVM behavior.
Below is the testcase that I derived from the above information. I am unable to reproduce the issue.
// Filename: Test2.java
import java.lang.ref.Cleaner;
import java.lang.ref.Reference;
public class Test2 {
public Object getObject() {
Cleaner cleaner = Cleaner.create();
var obj = new String("a new string");
Cleaner.Cleanable c = cleaner.register(obj, () -> { throw new RuntimeException("object cleaned"); } );
return obj;
}
public static void main(String[] args) throws Throwable {
Test2 test = new Test2();
final var obj = test.getObject();
System.out.println(obj);
/* Invoke GC. */
for (int i = 0; i < 100; i++) {
System.gc();
Thread.sleep(100);
}
System.out.println(obj);
}
}
Assuming that KeysException
is thrown when the test fails.
JVM option: -Xdump:java+system+heap:events=throw,filter=*/KeysException
will generate the required artifacts.
Here is the doc for -Xdump for troubleshooting.
@pedrolamarao
-Xdump
in case it is not feasible to develop a micro-test?java -version
output of the JDK being employed in the test environment?The test is new; the Semeru JDK was not recently updated.
openjdk version "17.0.9" 2023-10-17
IBM Semeru Runtime Open Edition 17.0.9.0 (build 17.0.9+9)
Eclipse OpenJ9 VM 17.0.9.0 (build openj9-0.41.0, JRE 17 Windows 11 amd64-64-Bit Compressed References 20231017_568 (JIT enabled, AOT enabled)
OpenJ9 - 461bf3c70
OMR - 5eee6ad9d
JCL - 3699725139c based on jdk-17.0.9+9)
I have attached the -Xdump
for gRPC's exception StatusRuntimeException
which seems to me the interesting time point. I have attached only the small text files. If you require the large binary files, please advise how to share them.
javacore.20241001.151523.11480.0003.txt javacore.20241001.151527.11480.0006.txt javacore.20241001.151528.11480.0009.txt
@pedrolamarao
Yes, it reproduces with version 17.0.12+7.
I have uploaded -Xdump
files to the upload link provided.
Is -Xdump
capable of breaking on arbitrary program points?
Could I set it to dump on a specific file/line location?
I may find time to try a minimal reproducer, but not soon.
Thanks for the core files. I will see if they pin-point the root cause.
-Xdump will need to be used in conjunction with jdb to produce core files at specific file/line locations. The below steps highlight this process.
Could I set it to dump on a specific file/line location?
You can set breakpoints through jdb.
Is -Xdump capable of breaking on arbitrary program points?
jdb
will need to be run with the following JVM option: export OPENJ9_JAVA_OPTIONS="-Xdump:java+system+heap:events=user"
.jdb
won't accept the above JVM option through the cmdline, and passing the JVM option through the OPENJ9_JAVA_OPTIONS
env var should act as a workaround.jdb
, you can send a user signal to the Java process (e.g. kill -3 <PID>
, SIGQUIT = 3
). @pedrolamarao Does the failure go away if we change the GC policy to -Xgcpolicy:nogc and -Xgcpolicy:optthruput? The default GC policy is gencon
. More GC policies are here: https://eclipse.dev/openj9/docs/xgcpolicy.
Test results with various gcpolicy
:
gencon
: failsbalanced
: failsoptavgpause
: failsoptthruput
: failsnogc
: passes@pedrolamarao Would it be possible for you to invoke System.gc()
both before and after ClientCalls.getUnchecked
(ClientCalls.java:249) and run the tests using Adoptium and Liberica JDKs? OpenJ9's garbage collection is known to be more aggressive compared to OpenJDK/Hotspot, particularly when it comes to object collection.
Since there might not be any active references to the object while the instance method is still executing, the object could potentially be eligible for collection by the GC. If manually invoking the garbage collector (via System.gc()
) with Adoptium and Liberica JDKs results in the same issue, it could indicate that the object is indeed unreachable.
However, if this issue does not occur with these JDKs, we may need to delve deeper into OpenJ9. The core files only show the final state and don't provide insight into how the JVM reached that state. Further investigation through more active debugging (e.g., logging) may be required, though this approach is typically more tedious and time-consuming. That said, having a micro test that consistently reproduces the issue would significantly ease the process and help narrow down the root cause more quickly.
4XESTACKTRACE at io/grpc/stub/ClientCalls.getUnchecked(ClientCalls.java:249) <--- Objects are UNREACHABLE
4XESTACKTRACE at io/grpc/stub/ClientCalls.blockingUnaryCall(ClientCalls.java:167)
4XESTACKTRACE at dev/prodist/keys/services/KeysGrpc$KeysBlockingStub.exists(KeysGrpc.java:879)
4XESTACKTRACE at dev/prodist/keys/grpc/GrpcKeys.exists(GrpcKeys.java:73)
4XESTACKTRACE at dev/prodist/keys/grpc/GrpcConfigurationTest.smoke(GrpcConfigurationTest.java:32)
We have a reproducer here: https://github.com/pedrolamarao/semeru-issue-93
With JAVA_HOME
set to this:
PS P:\dev\semeru-issue-93> P:\jdk-17.0.12+7\bin\java -version
openjdk version "17.0.12" 2024-07-16
IBM Semeru Runtime Open Edition 17.0.12.1 (build 17.0.12+7)
Eclipse OpenJ9 VM 17.0.12.1 (build openj9-0.46.1, JRE 17 Windows 11 amd64-64-Bit Compressed References 20240716_793 (JIT enabled, AOT enabled)
OpenJ9 - 4760d5d320
OMR - 840a9adba
JCL - e70fd1fd64a based on jdk-17.0.12+7)
running ./gradlew test
on this project fails.
@pedrolamarao I am getting the below errors. Am I missing steps or dependencies on my end?
Starting a Gradle Daemon, 1 incompatible and 1 stopped Daemons could not be reused, use --status for details
> Task :grpc:extractIncludeProto FAILED
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':grpc:extractIncludeProto'.
> Could not resolve all files for configuration ':grpc:compileProtoPath'.
> Cannot resolve external dependency javax.annotation:javax.annotation-api:1.3.2 because no repositories are defined.
Required by:
project :grpc
> Cannot resolve external dependency com.google.protobuf:protobuf-java:4.26.1 because no repositories are defined.
Required by:
project :grpc
> Cannot resolve external dependency io.grpc:grpc-core:1.63.0 because no repositories are defined.
Required by:
project :grpc
> Cannot resolve external dependency io.grpc:grpc-protobuf:1.63.0 because no repositories are defined.
Required by:
project :grpc
> Cannot resolve external dependency io.grpc:grpc-services:1.63.0 because no repositories are defined.
Required by:
project :grpc
> Cannot resolve external dependency io.grpc:grpc-stub:1.63.0 because no repositories are defined.
Required by:
project :grpc
* Try:
> Run with --stacktrace option to get the stack trace.
> Run with --info or --debug option to get more log output.
> Run with --scan to get full insights.
> Get more help at https://help.gradle.org.
BUILD FAILED in 12s
3 actionable tasks: 3 executed
Sorry, repositories must be coming from environment configuration. I have added the necessary repository configuration to Gradle scripts, and validated in a separate environment.
We have a modular system with a service provider scheme.
One of our providers is gRPC-based. We use a Cleaner to shutdown the managed channel when the provided service is collected. This is how the provider builds the service:
In our unit testing, there is a test like this:
This test passes in Adoptium and Liberica 11 and 17 JDKs, but fails in Semeru 11 and 17 JDKs with a
Channel shutdownNow invoked
exception. Debugging confirms that this is happening because the Cleaner is calling the Runnable while the test is still running.If we refactor the test to something like this:
the test passes.
It seems that the object is being collected too soon.