Open jclulow opened 5 months ago
I have built jdk18u-jdk-18.0.2.1-ga
using Peter's jdk-sunos-builder and jdk-sunos-patches and confirmed that it does not appear to exhibit this problem.
I have been able to build jdk19u-jdk-19.0.2-ga
and it does exhibit the bug, so it must have been introduced before that final version of 19...
Ok, I have been able to narrow down that the problem was introduced between jdk-jdk-19-21
, which appears to work, and jdk-jdk-19-22
, which does not.
Not sure if this is related to what I am experiencing, but I tried to move my kids Minecraft world from Debian to SmartOS and have ostensibly the same issue. When I run the server.jar it just hangs. When I run jstack on the Minecraft pid I see:
"ServerMain" #28 [23] prio=5 os_prio=64 cpu=1378.85ms elapsed=12.87s tid=0x0000000000dd4810 nid=23 waiting on condition [0xfffffaffc6dfe000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@21.0.1-internal/Native Method)
- parking to wait for <0x00000000869691b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(java.base@21.0.1-internal/LockSupport.java:371)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@21.0.1-internal/AbstractQueuedSynchronizer.java:519)
at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@21.0.1-internal/ForkJoinPool.java:3780)
at java.util.concurrent.ForkJoinPool.managedBlock(java.base@21.0.1-internal/ForkJoinPool.java:3725)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@21.0.1-internal/AbstractQueuedSynchronizer.java:1707)
at java.lang.ProcessImpl.waitFor(java.base@21.0.1-internal/ProcessImpl.java:465)
at oshi.util.ExecutingCommand.getProcessOutput(ExecutingCommand.java:126)
at oshi.util.ExecutingCommand.runNative(ExecutingCommand.java:87)
at oshi.util.ExecutingCommand.runNative(ExecutingCommand.java:68)
at oshi.util.ExecutingCommand.runNative(ExecutingCommand.java:55)
at oshi.hardware.platform.unix.solaris.SolarisCentralProcessor.mapNumaNodes(SolarisCentralProcessor.java:159)
at oshi.hardware.platform.unix.solaris.SolarisCentralProcessor.initProcessorCounts(SolarisCentralProcessor.java:95)
at oshi.hardware.common.AbstractCentralProcessor.<init>(AbstractCentralProcessor.java:67)
at oshi.hardware.platform.unix.solaris.SolarisCentralProcessor.<init>(SolarisCentralProcessor.java:33)
at oshi.hardware.platform.unix.solaris.SolarisHardwareAbstractionLayer.createProcessor(SolarisHardwareAbstractionLayer.java:42)
at oshi.hardware.common.AbstractHardwareAbstractionLayer$$Lambda/0xfffffaff811ef810.get(Unknown Source)
at oshi.util.Memoizer$1.get(Memoizer.java:61)
- locked <0x0000000086aeb810> (a oshi.util.Memoizer$1)
at oshi.hardware.common.AbstractHardwareAbstractionLayer.getProcessor(AbstractHardwareAbstractionLayer.java:48)
at ab.c(SourceFile:75)
at ab$$Lambda/0xfffffaff811f7000.run(Unknown Source)
at ab.a(SourceFile:82)
at ab.a(SourceFile:75)
at ab.c(SourceFile:52)
at ab$$Lambda/0xfffffaff811ecd30.run(Unknown Source)
at ab.a(SourceFile:82)
at ab.<init>(SourceFile:52)
at o.<init>(SourceFile:36)
at o.h(SourceFile:277)
at net.minecraft.server.Main.main(SourceFile:101)
at java.lang.invoke.LambdaForm$DMH/0xfffffaff81121c00.invokeStatic(java.base@21.0.1-internal/LambdaForm$DMH)
at java.lang.invoke.LambdaForm$MH/0xfffffaff8112c400.invoke(java.base@21.0.1-internal/LambdaForm$MH)
at java.lang.invoke.LambdaForm$MH/0xfffffaff8112c000.invoke_MT(java.base@21.0.1-internal/LambdaForm$MH)
at net.minecraft.bundler.Main.lambda$run$0(Main.java:54)
at net.minecraft.bundler.Main$$Lambda/0xfffffaff81009800.run(Unknown Source)
at java.lang.Thread.runWith(java.base@21.0.1-internal/Thread.java:1596)
at java.lang.Thread.run(java.base@21.0.1-internal/Thread.java:1583)
I did compile and run your program, but it succeeded all 50 times in less than a ms. So perhaps this isn't the same issue. I just thought I'd provide my experience in case this is any help to determining the issue.
I'm using SmartOS image base-64-lts 23.4.0 and OpenJDK Runtime Environment (build 21.0.1-internal-adhoc.pkgsrc.jdk-il-jdk-21.0.1-ga)
@geopoliticus FWIW, I believe it's some kind of race condition. It very likely depends on how many CPUs you have, whether there is any CPU capping on that zone, how fast the CPUs are, how heavily loaded the system is, etc.
If you compile the sample program once and then run it in a loop; e.g.,
javac ExecIt.java
while :; do
java -cp . ExecIt
done
Hopefully it hangs eventually? You could also try changing the number of times it starts processes before exiting to a much larger number than 50.
I have been reproducing the issue on a (virtual) OmniOS system with only 4 VCPUs, on a CPU that is not that quick by 2024 standards: an Intel(r) Xeon(r) CPU D-1541 @ 2.10GHz. What SmartOS platform version/datestamp are you running?
At any rate, the formatting on the stack trace you pasted is a bit odd, but in the middle I can see it is blocked in java.lang.ProcessImpl.waitFor(java.base@21.0.1-internal/ProcessImpl.java:465)
which appears to be the centre of the hang for the other cases as well. I expect it's the same issue.
I fixed the formatting on the previous message for better readability. I ran ExecIt but changed
int count = 100
This time I had the same issue you are reporting. Then I changed count back to 50 - recompiled - ran it again. Again I saw the problem you were describing above. I thought it strange because it failed before, so I rebooted the vm and burned sage. Tried it again and it did the same thing you are describing above. I must have done SOMETHING different, but I am at a loss as to what.
Here is my output:
[minecraft@minecraft ~/test_world]$ java ExecIt
running command 1/50
reading line: "uid=900(minecraft) gid=900(minecraft) groups=900(minecraft)"
waiting for process to end...
process ended!
process output: [uid=900(minecraft) gid=900(minecraft) groups=900(minecraft)]
running command 2/50
reading line: "uid=900(minecraft) gid=900(minecraft) groups=900(minecraft)"
waiting for process to end...
process ended!
process output: [uid=900(minecraft) gid=900(minecraft) groups=900(minecraft)]
running command 3/50
reading line: "uid=900(minecraft) gid=900(minecraft) groups=900(minecraft)"
waiting for process to end...
process ended!
process output: [uid=900(minecraft) gid=900(minecraft) groups=900(minecraft)]
running command 4/50
reading line: "uid=900(minecraft) gid=900(minecraft) groups=900(minecraft)"
waiting for process to end...
^Z
[1]+ Stopped java ExecIt
[minecraft@minecraft ~/test_world]$ bg
[1]+ java ExecIt &
[minecraft@minecraft ~/test_world]$ ls -al
total 50303
drwxr-xr-x 2 minecraft minecraft 6 May 23 23:55 .
drwxr-xr-x 5 minecraft minecraft 11 May 22 06:30 ..
-rw-r--r-- 1 minecraft minecraft 4 May 23 21:06 eula.txt
-rw-r--r-- 1 minecraft minecraft 4187 May 23 23:59 ExecIt.class
-rw-r--r-- 1 minecraft minecraft 2414 May 23 23:59 ExecIt.java
-rwxr-xr-x 1 minecraft minecraft 51424012 May 21 07:22 minecraft_server.1.20.5.jar
[minecraft@minecraft ~/test_world]$ ps
PID TTY TIME CMD
25563 pts/2 0:00 bash
25574 pts/2 0:00 openjdk2
25583 pts/2 0:00 java
25592 pts/2 0:00 ps
[minecraft@minecraft ~/test_world]$ jstack 25583
2024-05-24 00:02:17
Full thread dump OpenJDK 64-Bit Server VM (21.0.1-internal-adhoc.pkgsrc.jdk-il-jdk-21.0.1-ga mixed mode, sharing):
Threads class SMR info:
_java_thread_list=0x0000000000a835b0, length=11, elements={
0x00000000004f6810, 0x00000000007ae010, 0x00000000007cb810, 0x00000000007cb010,
0x00000000007d2810, 0x00000000007d2010, 0x00000000007d3810, 0x00000000007d3010,
0x000000000085f810, 0x000000000085f010, 0x0000000000b36810
}
"main" #1 [2] prio=5 os_prio=64 cpu=109.94ms elapsed=83.91s tid=0x00000000004f6810 nid=2 waiting on condition [0xfffffaffeed3e000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@21.0.1-internal/Native Method)
- parking to wait for <0x0000000087e66b38> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(java.base@21.0.1-internal/LockSupport.java:371)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@21.0.1-internal/AbstractQueuedSynchronizer.java:519)
at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@21.0.1-internal/ForkJoinPool.java:3780)
at java.util.concurrent.ForkJoinPool.managedBlock(java.base@21.0.1-internal/ForkJoinPool.java:3725)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@21.0.1-internal/AbstractQueuedSynchronizer.java:1707)
at java.lang.ProcessImpl.waitFor(java.base@21.0.1-internal/ProcessImpl.java:465)
at ExecIt.getProcessOutput(ExecIt.java:86)
at ExecIt.runCommand(ExecIt.java:33)
at ExecIt.main(ExecIt.java:16)
"Reference Handler" #9 [10] daemon prio=10 os_prio=64 cpu=1.26ms elapsed=83.87s tid=0x00000000007ae010 nid=10 waiting on condition [0xfffffaffea7ff000]
java.lang.Thread.State: RUNNABLE
at java.lang.ref.Reference.waitForReferencePendingList(java.base@21.0.1-internal/Native Method)
at java.lang.ref.Reference.processPendingReferences(java.base@21.0.1-internal/Reference.java:246)
at java.lang.ref.Reference$ReferenceHandler.run(java.base@21.0.1-internal/Reference.java:208)
"Finalizer" #10 [11] daemon prio=8 os_prio=64 cpu=0.84ms elapsed=83.87s tid=0x00000000007cb810 nid=11 in Object.wait() [0xfffffaffcc7ff000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait0(java.base@21.0.1-internal/Native Method)
- waiting on <0x0000000087f01670> (a java.lang.ref.NativeReferenceQueue$Lock)
at java.lang.Object.wait(java.base@21.0.1-internal/Object.java:366)
at java.lang.Object.wait(java.base@21.0.1-internal/Object.java:339)
at java.lang.ref.NativeReferenceQueue.await(java.base@21.0.1-internal/NativeReferenceQueue.java:48)
at java.lang.ref.ReferenceQueue.remove0(java.base@21.0.1-internal/ReferenceQueue.java:158)
at java.lang.ref.NativeReferenceQueue.remove(java.base@21.0.1-internal/NativeReferenceQueue.java:89)
- locked <0x0000000087f01670> (a java.lang.ref.NativeReferenceQueue$Lock)
at java.lang.ref.Finalizer$FinalizerThread.run(java.base@21.0.1-internal/Finalizer.java:173)
"Signal Dispatcher" #11 [12] daemon prio=9 os_prio=64 cpu=0.86ms elapsed=83.87s tid=0x00000000007cb010 nid=12 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Service Thread" #12 [13] daemon prio=9 os_prio=64 cpu=0.67ms elapsed=83.87s tid=0x00000000007d2810 nid=13 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Monitor Deflation Thread" #13 [14] daemon prio=9 os_prio=64 cpu=5.78ms elapsed=83.87s tid=0x00000000007d2010 nid=14 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" #14 [15] daemon prio=9 os_prio=64 cpu=25.35ms elapsed=83.87s tid=0x00000000007d3810 nid=15 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
No compile task
"C1 CompilerThread0" #22 [16] daemon prio=9 os_prio=64 cpu=33.44ms elapsed=83.87s tid=0x00000000007d3010 nid=16 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
No compile task
"Notification Thread" #26 [17] daemon prio=9 os_prio=64 cpu=0.40ms elapsed=83.87s tid=0x000000000085f810 nid=17 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Common-Cleaner" #27 [18] daemon prio=8 os_prio=64 cpu=1.01ms elapsed=83.87s tid=0x000000000085f010 nid=18 waiting on condition [0xfffffaffc71ff000]
java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@21.0.1-internal/Native Method)
- parking to wait for <0x0000000087f10d38> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(java.base@21.0.1-internal/LockSupport.java:269)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@21.0.1-internal/AbstractQueuedSynchronizer.java:1847)
at java.lang.ref.ReferenceQueue.await(java.base@21.0.1-internal/ReferenceQueue.java:71)
at java.lang.ref.ReferenceQueue.remove0(java.base@21.0.1-internal/ReferenceQueue.java:143)
at java.lang.ref.ReferenceQueue.remove(java.base@21.0.1-internal/ReferenceQueue.java:218)
at jdk.internal.ref.CleanerImpl.run(java.base@21.0.1-internal/CleanerImpl.java:140)
at java.lang.Thread.runWith(java.base@21.0.1-internal/Thread.java:1596)
at java.lang.Thread.run(java.base@21.0.1-internal/Thread.java:1583)
at jdk.internal.misc.InnocuousThread.run(java.base@21.0.1-internal/InnocuousThread.java:186)
"Attach Listener" #29 [22] daemon prio=9 os_prio=64 cpu=0.24ms elapsed=0.10s tid=0x0000000000b36810 nid=22 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"VM Thread" os_prio=64 cpu=2.87ms elapsed=83.88s tid=0x0000000000666010 nid=9 runnable
"GC Thread#0" os_prio=64 cpu=0.74ms elapsed=83.91s tid=0x00000000004a1950 nid=3 runnable
"G1 Main Marker" os_prio=64 cpu=0.79ms elapsed=83.91s tid=0x00000000004a14d0 nid=4 runnable
"G1 Conc#0" os_prio=64 cpu=0.63ms elapsed=83.91s tid=0x0000000000575b90 nid=5 runnable
"G1 Refine#0" os_prio=64 cpu=0.61ms elapsed=83.91s tid=0x0000000000575290 nid=6 runnable
"G1 Service" os_prio=64 cpu=2.16ms elapsed=83.91s tid=0x0000000000574990 nid=7 runnable
"VM Periodic Task Thread" os_prio=64 cpu=26.82ms elapsed=83.89s tid=0x0000000000667690 nid=8 waiting on condition
JNI global refs: 17, weak refs: 0
This is running inside of a SmartOS non-global zone - image - base-64-lts 23.4.0 On the metal (SmartOS global zone) I'm running SmartOS (build: 20240516T000925Z) A pair of Intel(R) Xeon(R) CPU X5675 @ 3.07GHz are inside of an old Cisco C250 M2 server.
I think I am going to try compiling the latest OpenJDK21 instead of using the pkgsrc version and see if this might fix the issue.
Must not be that many folks running Minecraft on Illumos OS's? Yours is the only post I've run across with the issue.
BTW - The reason I'm running the minecraft_server.1.20.5.jar version is because that is what is currently working on my Debian VM running on ESXi. I tried it with 1.20.6, but had the same issue and didn't want to change that variable, so I'm troubleshooting with 1.20.5.
I think at least a handful of people were running the server, but sadly Microsoft recently decided to start building the server JAR with the latest LTS release, rather than the earliest supported LTS release, which is why I am here!
@jclulow I may be wrong but I think there might be an issue with your reproducer.
I think you must also read getErrorStream()
either from a separate thread or you can also redirect the error stream into the standard stream with something like: Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
. Otherwise the thing can deadlock (which might be what you are seeing).
The semantics of Process aren't really intuitive.
Yeah, obviously I'm not deeply versed in this area! I made the reproducer by looking at the code that was getting stuck, which I think was here: https://github.com/oshi/oshi/blob/40be04935ad7ce83250cabcad7aafb2f02973a24/oshi-core/src/main/java/oshi/util/ExecutingCommand.java#L71-L116
It gets called here: https://github.com/oshi/oshi/blob/40be04935ad7ce83250cabcad7aafb2f02973a24/oshi-core/src/main/java/oshi/hardware/platform/unix/solaris/SolarisCentralProcessor.java#L163
I don't really expect that lgrpinfo -c leaves
would produce stderr output, generally. At least it does not on the system where I run the Minecraft stuff. I think it's even less likely that id -a
, in the reproducer, would produce any stderr output. So, I agree it's possible for things like pipe deadlock to occur, it doesn't feel like that's what is happening here -- indeed, it did not used to occur in the older versions, and it does not occur every time now, which implies to me it's at least some kind of race in the implementation. Even if I'm holding the API somewhat wrong it should really either work, or not work, each time I call it with the same arguments.
There were changes to src/java.base/unix/classes/java/lang/ProcessImpl.java as part of Loom that I missed. See https://github.com/ptribble/jdk-sunos-patches/commit/775d38e78114817db3407d447d7f084c0782c295, or just pull the latest java-solaris-sparc.patch for openjdk21 from tribblix/build, and this particular problem ought to be fixed.
Apologies for not catching this sooner.
Thank you so much for looking! It's been a struggle to find time myself to drill down further.
I'll see how that goes for Minecraft soon and report back.
Thanks so much @ptribble. I pulled your updates in a week or so ago on SmartOS and it's been working great. That is my kids and their friends have been using the latest Minecraft server release with no complaints. Really appreciate the work you've been doing with the Java builds that work for Illumos et al.
Minecraft hangs very early in startup on OpenJDK 21 (running under r151050). I had a look at the process with jstack and this stood out as being unusual:
The code is stuck here, ostensibly, blocked forever waiting for Process.waitFor() to return.
There are a lot of moving parts in Minecraft so I sought to write a simpler reproduction of the issue:
This program attempts to run
id -a
50 times in a row. Usually the first try will succeed, and then one of the second or third tries will hang forever:The full jstack output in this case:
Note that in this program the main thread is ostensibly blocked at the same place, waiting for some condition that is never triggered.
I have seen as many as three successful executions before the thing gets corked, but it appears to be basically invitable.