omniosorg / omnios-build

Build system for OmniOS
https://omnios.org
Other
89 stars 94 forks source link

OpenJDK 21 has problems with child processes #3575

Open jclulow opened 5 months ago

jclulow commented 5 months ago

Minecraft hangs very early in startup on OpenJDK 21 (running under r151050). I had a look at the process with jstack and this stood out as being unusual:

"ServerMain" #19 [20] prio=5 os_prio=64 cpu=2202.43ms elapsed=522.39s tid=0x0000000000e5a4a0 nid=20 waiting on condition  [0xfffffc7fe3bfe000]
   java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park(java.base@21.0.2/Native Method)
        - parking to wait for  <0x0000000765cebe78> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(java.base@21.0.2/LockSupport.java:371)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@21.0.2/AbstractQueuedSynchronizer.java:519)
        at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@21.0.2/ForkJoinPool.java:3780)
        at java.util.concurrent.ForkJoinPool.managedBlock(java.base@21.0.2/ForkJoinPool.java:3725)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@21.0.2/AbstractQueuedSynchronizer.java:1707)
        at java.lang.ProcessImpl.waitFor(java.base@21.0.2/ProcessImpl.java:465)
        at oshi.util.ExecutingCommand.getProcessOutput(ExecutingCommand.java:126)
        at oshi.util.ExecutingCommand.runNative(ExecutingCommand.java:87)
        at oshi.util.ExecutingCommand.runNative(ExecutingCommand.java:68)
        at oshi.util.ExecutingCommand.runNative(ExecutingCommand.java:55)
        at oshi.hardware.platform.unix.solaris.SolarisCentralProcessor.mapNumaNodes(SolarisCentralProcessor.java:159)
        at oshi.hardware.platform.unix.solaris.SolarisCentralProcessor.initProcessorCounts(SolarisCentralProcessor.java:95)
        at oshi.hardware.common.AbstractCentralProcessor.<init>(AbstractCentralProcessor.java:67)
        at oshi.hardware.platform.unix.solaris.SolarisCentralProcessor.<init>(SolarisCentralProcessor.java:33)
        at oshi.hardware.platform.unix.solaris.SolarisHardwareAbstractionLayer.createProcessor(SolarisHardwareAbstractionLayer.java:42)
        at oshi.hardware.common.AbstractHardwareAbstractionLayer$$Lambda/0xfffffc7f41204610.get(Unknown Source)
        at oshi.util.Memoizer$1.get(Memoizer.java:61)
        - locked <0x0000000765c70850> (a oshi.util.Memoizer$1)
        at oshi.hardware.common.AbstractHardwareAbstractionLayer.getProcessor(AbstractHardwareAbstractionLayer.java:48)
        at ab.c(SourceFile:75)
        at ab$$Lambda/0xfffffc7f41204c70.run(Unknown Source)
        at ab.a(SourceFile:82)
        at ab.a(SourceFile:75)
        at ab.c(SourceFile:52)
        at ab$$Lambda/0xfffffc7f411f7938.run(Unknown Source)
        at ab.a(SourceFile:82)
        at ab.<init>(SourceFile:52)
        at o.<init>(SourceFile:36)
        at o.h(SourceFile:277)
        at net.minecraft.server.Main.main(SourceFile:101)
        at java.lang.invoke.LambdaForm$DMH/0xfffffc7f41159c00.invokeStatic(java.base@21.0.2/LambdaForm$DMH)
        at java.lang.invoke.LambdaForm$MH/0xfffffc7f41184400.invoke(java.base@21.0.2/LambdaForm$MH)
        at java.lang.invoke.LambdaForm$MH/0xfffffc7f41184000.invoke_MT(java.base@21.0.2/LambdaForm$MH)
        at net.minecraft.bundler.Main.lambda$run$0(Main.java:54)
        at net.minecraft.bundler.Main$$Lambda/0xfffffc7f41003000.run(Unknown Source)
        at java.lang.Thread.runWith(java.base@21.0.2/Thread.java:1596)
        at java.lang.Thread.run(java.base@21.0.2/Thread.java:1583)

The code is stuck here, ostensibly, blocked forever waiting for Process.waitFor() to return.

There are a lot of moving parts in Minecraft so I sought to write a simpler reproduction of the issue:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ExecIt {
    public static void main(String[] args) {
        int count = 50;
        for (int i = 1; i <= count; i++) {
            System.out.println("running command " + i + "/" +
                count);
            ExecIt.runCommand();
            System.out.println();
        }
    }

    private static String[] DEFAULT_ENV = new String[] { "LC_ALL=C" };

    private static void runCommand() {
        String[] cmd = new String[] {
            "/usr/bin/id",
            "-a",
        };

        Process p = null;
        List<String> out = null;
        try {
            p = Runtime.getRuntime().exec(cmd, DEFAULT_ENV);
            out = getProcessOutput(p);
            System.out.println("process output: " + out);
        } catch (SecurityException | IOException e) {
            System.out.println("ERROR: " + e.getMessage());
            System.exit(1);
        } finally {
            if (p != null) {
                /*
                 * Allegedly we don't close "descriptors" on
                 * destroy, so handle that destruction here?
                 */
                try {
                    p.getOutputStream().close();
                } catch (IOException e) {
                    System.out.println(
                        "WARN: child stdin close: " +
                        e.getMessage());
                }
                try {
                    p.getInputStream().close();
                } catch (IOException e) {
                    System.out.println(
                        "WARN: child stdout close: " +
                        e.getMessage());
                }
                try {
                    p.getErrorStream().close();
                } catch (IOException e) {
                    System.out.println(
                        "WARN: child stderr close: " +
                        e.getMessage());
                }
            }
            p.destroy();
        }
    }

    private static List<String> getProcessOutput(Process p) {
        ArrayList<String> sa = new ArrayList<>();

        try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(p.getInputStream(),
            Charset.defaultCharset())))
        {
            String line;

            while ((line = reader.readLine()) != null) {
                System.out.println("reading line: \"" +
                    line + "\"");
                sa.add(line);
            }

            System.out.println("waiting for process to end...");
            p.waitFor();
            System.out.println("process ended!");
        } catch (IOException e) {
            System.out.println("ERROR: input read: " +
                e.getMessage());
        } catch (InterruptedException ie) {
            System.out.println("ERROR: interrupted read: " +
                ie.getMessage());
            Thread.currentThread().interrupt();
        }

        return sa;
    }
}

This program attempts to run id -a 50 times in a row. Usually the first try will succeed, and then one of the second or third tries will hang forever:

 $ javac ExecIt.java  && java -cp . ExecIt
running command 1/50
reading line: "uid=1000(jclulow) gid=15000(nfsusers) groups=15000(nfsusers)"
waiting for process to end...
process ended!
process output: [uid=1000(jclulow) gid=15000(nfsusers) groups=15000(nfsusers)]

running command 2/50
reading line: "uid=1000(jclulow) gid=15000(nfsusers) groups=15000(nfsusers)"
waiting for process to end...

The full jstack output in this case:

 $ jstack 1517
2024-05-18 21:54:29
Full thread dump OpenJDK 64-Bit Server VM (21.0.2+13-omnios-151050 mixed mode, sharing):

Threads class SMR info:
_java_thread_list=0x0000000000d6abc0, length=11, elements={
0x0000000000a9a530, 0x0000000000c81b60, 0x0000000000c82d30, 0x0000000000c841a0,
0x0000000000c85610, 0x0000000000c86940, 0x0000000000c89380, 0x0000000000c8a670,
0x0000000000d086f0, 0x0000000000d0d0b0, 0x0000000000d71780
}

"main" #1 [2] prio=5 os_prio=64 cpu=137.37ms elapsed=234.30s tid=0x0000000000a9a530 nid=2 waiting on condition  [0xfffffc7fef06d000]
   java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park(java.base@21.0.2/Native Method)
        - parking to wait for  <0x00000000c3ed1290> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(java.base@21.0.2/LockSupport.java:371)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@21.0.2/AbstractQueuedSynchronizer.java:519)
        at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@21.0.2/ForkJoinPool.java:3780)
        at java.util.concurrent.ForkJoinPool.managedBlock(java.base@21.0.2/ForkJoinPool.java:3725)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@21.0.2/AbstractQueuedSynchronizer.java:1707)
        at java.lang.ProcessImpl.waitFor(java.base@21.0.2/ProcessImpl.java:465)
        at ExecIt.getProcessOutput(ExecIt.java:86)
        at ExecIt.runCommand(ExecIt.java:33)
        at ExecIt.main(ExecIt.java:16)

"Reference Handler" #9 [10] daemon prio=10 os_prio=64 cpu=0.56ms elapsed=234.25s tid=0x0000000000c81b60 nid=10 waiting on condition  [0xfffffc7fe9bff000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.ref.Reference.waitForReferencePendingList(java.base@21.0.2/Native Method)
        at java.lang.ref.Reference.processPendingReferences(java.base@21.0.2/Reference.java:246)
        at java.lang.ref.Reference$ReferenceHandler.run(java.base@21.0.2/Reference.java:208)

"Finalizer" #10 [11] daemon prio=8 os_prio=64 cpu=0.96ms elapsed=234.25s tid=0x0000000000c82d30 nid=11 in Object.wait()  [0xfffffc7fe9fff000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait0(java.base@21.0.2/Native Method)
        - waiting on <0x00000000c3f01670> (a java.lang.ref.NativeReferenceQueue$Lock)
        at java.lang.Object.wait(java.base@21.0.2/Object.java:366)
        at java.lang.Object.wait(java.base@21.0.2/Object.java:339)
        at java.lang.ref.NativeReferenceQueue.await(java.base@21.0.2/NativeReferenceQueue.java:48)
        at java.lang.ref.ReferenceQueue.remove0(java.base@21.0.2/ReferenceQueue.java:158)
        at java.lang.ref.NativeReferenceQueue.remove(java.base@21.0.2/NativeReferenceQueue.java:89)
        - locked <0x00000000c3f01670> (a java.lang.ref.NativeReferenceQueue$Lock)
        at java.lang.ref.Finalizer$FinalizerThread.run(java.base@21.0.2/Finalizer.java:173)

"Signal Dispatcher" #11 [12] daemon prio=9 os_prio=64 cpu=0.91ms elapsed=234.25s tid=0x0000000000c841a0 nid=12 waiting on condition  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Service Thread" #12 [13] daemon prio=9 os_prio=64 cpu=0.16ms elapsed=234.25s tid=0x0000000000c85610 nid=13 runnable  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Monitor Deflation Thread" #13 [14] daemon prio=9 os_prio=64 cpu=56.48ms elapsed=234.25s tid=0x0000000000c86940 nid=14 runnable  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #14 [15] daemon prio=9 os_prio=64 cpu=24.39ms elapsed=234.25s tid=0x0000000000c89380 nid=15 waiting on condition  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
   No compile task

"C1 CompilerThread0" #16 [16] daemon prio=9 os_prio=64 cpu=72.73ms elapsed=234.25s tid=0x0000000000c8a670 nid=16 waiting on condition  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
   No compile task

"Notification Thread" #17 [17] daemon prio=9 os_prio=64 cpu=0.16ms elapsed=234.24s tid=0x0000000000d086f0 nid=17 runnable  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Common-Cleaner" #18 [18] daemon prio=8 os_prio=64 cpu=1.32ms elapsed=234.23s tid=0x0000000000d0d0b0 nid=18 waiting on condition  [0xfffffc7fe49ff000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at jdk.internal.misc.Unsafe.park(java.base@21.0.2/Native Method)
        - parking to wait for  <0x00000000c3f10158> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(java.base@21.0.2/LockSupport.java:269)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@21.0.2/AbstractQueuedSynchronizer.java:1847)
        at java.lang.ref.ReferenceQueue.await(java.base@21.0.2/ReferenceQueue.java:71)
        at java.lang.ref.ReferenceQueue.remove0(java.base@21.0.2/ReferenceQueue.java:143)
        at java.lang.ref.ReferenceQueue.remove(java.base@21.0.2/ReferenceQueue.java:218)
        at jdk.internal.ref.CleanerImpl.run(java.base@21.0.2/CleanerImpl.java:140)
        at java.lang.Thread.runWith(java.base@21.0.2/Thread.java:1596)
        at java.lang.Thread.run(java.base@21.0.2/Thread.java:1583)
        at jdk.internal.misc.InnocuousThread.run(java.base@21.0.2/InnocuousThread.java:186)

"Attach Listener" #20 [20] daemon prio=9 os_prio=64 cpu=0.33ms elapsed=0.10s tid=0x0000000000d71780 nid=20 waiting on condition  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"VM Thread" os_prio=64 cpu=18.64ms elapsed=234.26s tid=0x0000000000c74d70 nid=9 runnable

"GC Thread#0" os_prio=64 cpu=0.18ms elapsed=234.29s tid=0x0000000000ad5c60 nid=3 runnable

"G1 Main Marker" os_prio=64 cpu=0.28ms elapsed=234.29s tid=0x0000000000aded60 nid=4 runnable

"G1 Conc#0" os_prio=64 cpu=0.09ms elapsed=234.29s tid=0x0000000000adf960 nid=5 runnable

"G1 Refine#0" os_prio=64 cpu=0.39ms elapsed=234.29s tid=0x0000000000b25c10 nid=6 runnable

"G1 Service" os_prio=64 cpu=17.54ms elapsed=234.29s tid=0x0000000000b267d0 nid=7 runnable

"VM Periodic Task Thread" os_prio=64 cpu=327.90ms elapsed=234.26s tid=0x0000000000c5b090 nid=8 waiting on condition

JNI global refs: 8, weak refs: 0

Note that in this program the main thread is ostensibly blocked at the same place, waiting for some condition that is never triggered.

I have seen as many as three successful executions before the thing gets corked, but it appears to be basically invitable.

jclulow commented 5 months ago

I have built jdk18u-jdk-18.0.2.1-ga using Peter's jdk-sunos-builder and jdk-sunos-patches and confirmed that it does not appear to exhibit this problem.

jclulow commented 5 months ago

I have been able to build jdk19u-jdk-19.0.2-ga and it does exhibit the bug, so it must have been introduced before that final version of 19...

jclulow commented 5 months ago

Ok, I have been able to narrow down that the problem was introduced between jdk-jdk-19-21, which appears to work, and jdk-jdk-19-22, which does not.

geopoliticus commented 5 months ago

Not sure if this is related to what I am experiencing, but I tried to move my kids Minecraft world from Debian to SmartOS and have ostensibly the same issue. When I run the server.jar it just hangs. When I run jstack on the Minecraft pid I see:

"ServerMain" #28 [23] prio=5 os_prio=64 cpu=1378.85ms elapsed=12.87s tid=0x0000000000dd4810 nid=23 waiting on condition  [0xfffffaffc6dfe000]
   java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park(java.base@21.0.1-internal/Native Method)
        - parking to wait for  <0x00000000869691b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(java.base@21.0.1-internal/LockSupport.java:371)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@21.0.1-internal/AbstractQueuedSynchronizer.java:519)
        at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@21.0.1-internal/ForkJoinPool.java:3780)
        at java.util.concurrent.ForkJoinPool.managedBlock(java.base@21.0.1-internal/ForkJoinPool.java:3725)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@21.0.1-internal/AbstractQueuedSynchronizer.java:1707)
        at java.lang.ProcessImpl.waitFor(java.base@21.0.1-internal/ProcessImpl.java:465)
        at oshi.util.ExecutingCommand.getProcessOutput(ExecutingCommand.java:126)
        at oshi.util.ExecutingCommand.runNative(ExecutingCommand.java:87)
        at oshi.util.ExecutingCommand.runNative(ExecutingCommand.java:68)
        at oshi.util.ExecutingCommand.runNative(ExecutingCommand.java:55)
        at oshi.hardware.platform.unix.solaris.SolarisCentralProcessor.mapNumaNodes(SolarisCentralProcessor.java:159)
        at oshi.hardware.platform.unix.solaris.SolarisCentralProcessor.initProcessorCounts(SolarisCentralProcessor.java:95)
        at oshi.hardware.common.AbstractCentralProcessor.<init>(AbstractCentralProcessor.java:67)
        at oshi.hardware.platform.unix.solaris.SolarisCentralProcessor.<init>(SolarisCentralProcessor.java:33)
        at oshi.hardware.platform.unix.solaris.SolarisHardwareAbstractionLayer.createProcessor(SolarisHardwareAbstractionLayer.java:42)
        at oshi.hardware.common.AbstractHardwareAbstractionLayer$$Lambda/0xfffffaff811ef810.get(Unknown Source)
        at oshi.util.Memoizer$1.get(Memoizer.java:61)
        - locked <0x0000000086aeb810> (a oshi.util.Memoizer$1)
        at oshi.hardware.common.AbstractHardwareAbstractionLayer.getProcessor(AbstractHardwareAbstractionLayer.java:48)
        at ab.c(SourceFile:75)
        at ab$$Lambda/0xfffffaff811f7000.run(Unknown Source)
        at ab.a(SourceFile:82)
        at ab.a(SourceFile:75)
        at ab.c(SourceFile:52)
        at ab$$Lambda/0xfffffaff811ecd30.run(Unknown Source)
        at ab.a(SourceFile:82)
        at ab.<init>(SourceFile:52)
        at o.<init>(SourceFile:36)
        at o.h(SourceFile:277)
        at net.minecraft.server.Main.main(SourceFile:101)
        at java.lang.invoke.LambdaForm$DMH/0xfffffaff81121c00.invokeStatic(java.base@21.0.1-internal/LambdaForm$DMH)
        at java.lang.invoke.LambdaForm$MH/0xfffffaff8112c400.invoke(java.base@21.0.1-internal/LambdaForm$MH)
        at java.lang.invoke.LambdaForm$MH/0xfffffaff8112c000.invoke_MT(java.base@21.0.1-internal/LambdaForm$MH)
        at net.minecraft.bundler.Main.lambda$run$0(Main.java:54)
        at net.minecraft.bundler.Main$$Lambda/0xfffffaff81009800.run(Unknown Source)
        at java.lang.Thread.runWith(java.base@21.0.1-internal/Thread.java:1596)
        at java.lang.Thread.run(java.base@21.0.1-internal/Thread.java:1583)

I did compile and run your program, but it succeeded all 50 times in less than a ms. So perhaps this isn't the same issue. I just thought I'd provide my experience in case this is any help to determining the issue.

I'm using SmartOS image base-64-lts 23.4.0 and OpenJDK Runtime Environment (build 21.0.1-internal-adhoc.pkgsrc.jdk-il-jdk-21.0.1-ga)

jclulow commented 5 months ago

@geopoliticus FWIW, I believe it's some kind of race condition. It very likely depends on how many CPUs you have, whether there is any CPU capping on that zone, how fast the CPUs are, how heavily loaded the system is, etc.

If you compile the sample program once and then run it in a loop; e.g.,

javac ExecIt.java
while :; do
    java -cp . ExecIt
done

Hopefully it hangs eventually? You could also try changing the number of times it starts processes before exiting to a much larger number than 50.

I have been reproducing the issue on a (virtual) OmniOS system with only 4 VCPUs, on a CPU that is not that quick by 2024 standards: an Intel(r) Xeon(r) CPU D-1541 @ 2.10GHz. What SmartOS platform version/datestamp are you running?

At any rate, the formatting on the stack trace you pasted is a bit odd, but in the middle I can see it is blocked in java.lang.ProcessImpl.waitFor(java.base@21.0.1-internal/ProcessImpl.java:465) which appears to be the centre of the hang for the other cases as well. I expect it's the same issue.

geopoliticus commented 5 months ago

I fixed the formatting on the previous message for better readability. I ran ExecIt but changed

int count = 100

This time I had the same issue you are reporting. Then I changed count back to 50 - recompiled - ran it again. Again I saw the problem you were describing above. I thought it strange because it failed before, so I rebooted the vm and burned sage. Tried it again and it did the same thing you are describing above. I must have done SOMETHING different, but I am at a loss as to what.

Here is my output:

[minecraft@minecraft ~/test_world]$ java ExecIt
running command 1/50
reading line: "uid=900(minecraft) gid=900(minecraft) groups=900(minecraft)"
waiting for process to end...
process ended!
process output: [uid=900(minecraft) gid=900(minecraft) groups=900(minecraft)]

running command 2/50
reading line: "uid=900(minecraft) gid=900(minecraft) groups=900(minecraft)"
waiting for process to end...
process ended!
process output: [uid=900(minecraft) gid=900(minecraft) groups=900(minecraft)]

running command 3/50
reading line: "uid=900(minecraft) gid=900(minecraft) groups=900(minecraft)"
waiting for process to end...
process ended!
process output: [uid=900(minecraft) gid=900(minecraft) groups=900(minecraft)]

running command 4/50
reading line: "uid=900(minecraft) gid=900(minecraft) groups=900(minecraft)"
waiting for process to end...
^Z
[1]+  Stopped                 java ExecIt
[minecraft@minecraft ~/test_world]$ bg
[1]+ java ExecIt &
[minecraft@minecraft ~/test_world]$ ls -al
total 50303
drwxr-xr-x 2 minecraft minecraft        6 May 23 23:55 .
drwxr-xr-x 5 minecraft minecraft       11 May 22 06:30 ..
-rw-r--r-- 1 minecraft minecraft        4 May 23 21:06 eula.txt
-rw-r--r-- 1 minecraft minecraft     4187 May 23 23:59 ExecIt.class
-rw-r--r-- 1 minecraft minecraft     2414 May 23 23:59 ExecIt.java
-rwxr-xr-x 1 minecraft minecraft 51424012 May 21 07:22 minecraft_server.1.20.5.jar
[minecraft@minecraft ~/test_world]$ ps
   PID TTY         TIME CMD
 25563 pts/2       0:00 bash
 25574 pts/2       0:00 openjdk2
 25583 pts/2       0:00 java
 25592 pts/2       0:00 ps
[minecraft@minecraft ~/test_world]$ jstack 25583
2024-05-24 00:02:17
Full thread dump OpenJDK 64-Bit Server VM (21.0.1-internal-adhoc.pkgsrc.jdk-il-jdk-21.0.1-ga mixed mode, sharing):

Threads class SMR info:
_java_thread_list=0x0000000000a835b0, length=11, elements={
0x00000000004f6810, 0x00000000007ae010, 0x00000000007cb810, 0x00000000007cb010,
0x00000000007d2810, 0x00000000007d2010, 0x00000000007d3810, 0x00000000007d3010,
0x000000000085f810, 0x000000000085f010, 0x0000000000b36810
}

"main" #1 [2] prio=5 os_prio=64 cpu=109.94ms elapsed=83.91s tid=0x00000000004f6810 nid=2 waiting on condition  [0xfffffaffeed3e000]
   java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park(java.base@21.0.1-internal/Native Method)
        - parking to wait for  <0x0000000087e66b38> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(java.base@21.0.1-internal/LockSupport.java:371)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@21.0.1-internal/AbstractQueuedSynchronizer.java:519)
        at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@21.0.1-internal/ForkJoinPool.java:3780)
        at java.util.concurrent.ForkJoinPool.managedBlock(java.base@21.0.1-internal/ForkJoinPool.java:3725)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@21.0.1-internal/AbstractQueuedSynchronizer.java:1707)
        at java.lang.ProcessImpl.waitFor(java.base@21.0.1-internal/ProcessImpl.java:465)
        at ExecIt.getProcessOutput(ExecIt.java:86)
        at ExecIt.runCommand(ExecIt.java:33)
        at ExecIt.main(ExecIt.java:16)

"Reference Handler" #9 [10] daemon prio=10 os_prio=64 cpu=1.26ms elapsed=83.87s tid=0x00000000007ae010 nid=10 waiting on condition  [0xfffffaffea7ff000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.ref.Reference.waitForReferencePendingList(java.base@21.0.1-internal/Native Method)
        at java.lang.ref.Reference.processPendingReferences(java.base@21.0.1-internal/Reference.java:246)
        at java.lang.ref.Reference$ReferenceHandler.run(java.base@21.0.1-internal/Reference.java:208)

"Finalizer" #10 [11] daemon prio=8 os_prio=64 cpu=0.84ms elapsed=83.87s tid=0x00000000007cb810 nid=11 in Object.wait()  [0xfffffaffcc7ff000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait0(java.base@21.0.1-internal/Native Method)
        - waiting on <0x0000000087f01670> (a java.lang.ref.NativeReferenceQueue$Lock)
        at java.lang.Object.wait(java.base@21.0.1-internal/Object.java:366)
        at java.lang.Object.wait(java.base@21.0.1-internal/Object.java:339)
        at java.lang.ref.NativeReferenceQueue.await(java.base@21.0.1-internal/NativeReferenceQueue.java:48)
        at java.lang.ref.ReferenceQueue.remove0(java.base@21.0.1-internal/ReferenceQueue.java:158)
        at java.lang.ref.NativeReferenceQueue.remove(java.base@21.0.1-internal/NativeReferenceQueue.java:89)
        - locked <0x0000000087f01670> (a java.lang.ref.NativeReferenceQueue$Lock)
        at java.lang.ref.Finalizer$FinalizerThread.run(java.base@21.0.1-internal/Finalizer.java:173)

"Signal Dispatcher" #11 [12] daemon prio=9 os_prio=64 cpu=0.86ms elapsed=83.87s tid=0x00000000007cb010 nid=12 waiting on condition  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Service Thread" #12 [13] daemon prio=9 os_prio=64 cpu=0.67ms elapsed=83.87s tid=0x00000000007d2810 nid=13 runnable  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Monitor Deflation Thread" #13 [14] daemon prio=9 os_prio=64 cpu=5.78ms elapsed=83.87s tid=0x00000000007d2010 nid=14 runnable  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #14 [15] daemon prio=9 os_prio=64 cpu=25.35ms elapsed=83.87s tid=0x00000000007d3810 nid=15 waiting on condition  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
   No compile task

"C1 CompilerThread0" #22 [16] daemon prio=9 os_prio=64 cpu=33.44ms elapsed=83.87s tid=0x00000000007d3010 nid=16 waiting on condition  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
   No compile task

"Notification Thread" #26 [17] daemon prio=9 os_prio=64 cpu=0.40ms elapsed=83.87s tid=0x000000000085f810 nid=17 runnable  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Common-Cleaner" #27 [18] daemon prio=8 os_prio=64 cpu=1.01ms elapsed=83.87s tid=0x000000000085f010 nid=18 waiting on condition  [0xfffffaffc71ff000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at jdk.internal.misc.Unsafe.park(java.base@21.0.1-internal/Native Method)
        - parking to wait for  <0x0000000087f10d38> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(java.base@21.0.1-internal/LockSupport.java:269)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@21.0.1-internal/AbstractQueuedSynchronizer.java:1847)
        at java.lang.ref.ReferenceQueue.await(java.base@21.0.1-internal/ReferenceQueue.java:71)
        at java.lang.ref.ReferenceQueue.remove0(java.base@21.0.1-internal/ReferenceQueue.java:143)
        at java.lang.ref.ReferenceQueue.remove(java.base@21.0.1-internal/ReferenceQueue.java:218)
        at jdk.internal.ref.CleanerImpl.run(java.base@21.0.1-internal/CleanerImpl.java:140)
        at java.lang.Thread.runWith(java.base@21.0.1-internal/Thread.java:1596)
        at java.lang.Thread.run(java.base@21.0.1-internal/Thread.java:1583)
        at jdk.internal.misc.InnocuousThread.run(java.base@21.0.1-internal/InnocuousThread.java:186)

"Attach Listener" #29 [22] daemon prio=9 os_prio=64 cpu=0.24ms elapsed=0.10s tid=0x0000000000b36810 nid=22 waiting on condition  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"VM Thread" os_prio=64 cpu=2.87ms elapsed=83.88s tid=0x0000000000666010 nid=9 runnable  

"GC Thread#0" os_prio=64 cpu=0.74ms elapsed=83.91s tid=0x00000000004a1950 nid=3 runnable  

"G1 Main Marker" os_prio=64 cpu=0.79ms elapsed=83.91s tid=0x00000000004a14d0 nid=4 runnable  

"G1 Conc#0" os_prio=64 cpu=0.63ms elapsed=83.91s tid=0x0000000000575b90 nid=5 runnable  

"G1 Refine#0" os_prio=64 cpu=0.61ms elapsed=83.91s tid=0x0000000000575290 nid=6 runnable  

"G1 Service" os_prio=64 cpu=2.16ms elapsed=83.91s tid=0x0000000000574990 nid=7 runnable  

"VM Periodic Task Thread" os_prio=64 cpu=26.82ms elapsed=83.89s tid=0x0000000000667690 nid=8 waiting on condition  

JNI global refs: 17, weak refs: 0

This is running inside of a SmartOS non-global zone - image - base-64-lts 23.4.0 On the metal (SmartOS global zone) I'm running SmartOS (build: 20240516T000925Z) A pair of Intel(R) Xeon(R) CPU X5675 @ 3.07GHz are inside of an old Cisco C250 M2 server.

I think I am going to try compiling the latest OpenJDK21 instead of using the pkgsrc version and see if this might fix the issue.

Must not be that many folks running Minecraft on Illumos OS's? Yours is the only post I've run across with the issue.

BTW - The reason I'm running the minecraft_server.1.20.5.jar version is because that is what is currently working on my Debian VM running on ESXi. I tried it with 1.20.6, but had the same issue and didn't want to change that variable, so I'm troubleshooting with 1.20.5.

jclulow commented 5 months ago

I think at least a handful of people were running the server, but sadly Microsoft recently decided to start building the server JAR with the latest LTS release, rather than the earliest supported LTS release, which is why I am here!

siepkes commented 5 months ago

@jclulow I may be wrong but I think there might be an issue with your reproducer.

I think you must also read getErrorStream() either from a separate thread or you can also redirect the error stream into the standard stream with something like: Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();. Otherwise the thing can deadlock (which might be what you are seeing).

The semantics of Process aren't really intuitive.

jclulow commented 5 months ago

Yeah, obviously I'm not deeply versed in this area! I made the reproducer by looking at the code that was getting stuck, which I think was here: https://github.com/oshi/oshi/blob/40be04935ad7ce83250cabcad7aafb2f02973a24/oshi-core/src/main/java/oshi/util/ExecutingCommand.java#L71-L116

It gets called here: https://github.com/oshi/oshi/blob/40be04935ad7ce83250cabcad7aafb2f02973a24/oshi-core/src/main/java/oshi/hardware/platform/unix/solaris/SolarisCentralProcessor.java#L163

I don't really expect that lgrpinfo -c leaves would produce stderr output, generally. At least it does not on the system where I run the Minecraft stuff. I think it's even less likely that id -a, in the reproducer, would produce any stderr output. So, I agree it's possible for things like pipe deadlock to occur, it doesn't feel like that's what is happening here -- indeed, it did not used to occur in the older versions, and it does not occur every time now, which implies to me it's at least some kind of race in the implementation. Even if I'm holding the API somewhat wrong it should really either work, or not work, each time I call it with the same arguments.

ptribble commented 5 months ago

There were changes to src/java.base/unix/classes/java/lang/ProcessImpl.java as part of Loom that I missed. See https://github.com/ptribble/jdk-sunos-patches/commit/775d38e78114817db3407d447d7f084c0782c295, or just pull the latest java-solaris-sparc.patch for openjdk21 from tribblix/build, and this particular problem ought to be fixed.

Apologies for not catching this sooner.

jclulow commented 5 months ago

Thank you so much for looking! It's been a struggle to find time myself to drill down further.

I'll see how that goes for Minecraft soon and report back.

geopoliticus commented 4 months ago

Thanks so much @ptribble. I pulled your updates in a week or so ago on SmartOS and it's been working great. That is my kids and their friends have been using the latest Minecraft server release with no complaints. Really appreciate the work you've been doing with the Java builds that work for Illumos et al.