hexops / mach

zig game engine & graphics toolkit
https://machengine.org

gpu: gpu-dawn crashes JVM on Windows 11 #1213

Closed SuperIceCN closed 11 hours ago

SuperIceCN commented 1 month ago

Consider the following scenario: to use WebGPU via Zig in a Java program, it would be natural to use JNI to call Zig functions declared with callconv(.C). However, I have tested mach-gpu on my two PCs, and the following code crashes the JVM. (By the way, the Dawn binding by zig-gamedev works well.)

Native.java:

public final class Native {
    public static void main(String[] args) {
        load(); // This method calls the System#load method to load the shared library.
        init();
    }

    private static native boolean init();
}

Native.zig:

const jni = @import("jni");
const webgpu = @import("webgpu");
const webgpu_util = @import("../util/webgpu_util.zig");
const std = @import("std");

pub var instance: ?*webgpu.Instance = null;

pub fn init(cEnv: *jni.cEnv, _: jni.jclass) callconv(.C) jni.jboolean {
    std.debug.print("Initializing Dawn...\n", .{});
    if (instance == null) {
        webgpu.Impl.init(std.heap.c_allocator, .{}) catch return jni.bool2jboolean(false);
        instance = webgpu.createInstance(null);
        std.debug.print("WGPU instance {?p}\n", .{instance});
        // Check for a null instance before dereferencing it.
        if (instance == null) {
            return jni.bool2jboolean(false);
        }
        instance.?.reference();
        std.debug.print("WGPU instance referenced\n", .{});
        const adapter_options = webgpu.RequestAdapterOptions{ .power_preference = .high_performance };
        var resp: webgpu_util.RequestAdapterResponse = undefined;
        instance.?.requestAdapter(&adapter_options, &resp, webgpu_util.requestAdapterCallback);
        if (resp.adapter == null) {
            std.debug.print("Failed to get adapter\n", .{});
        } else {
            std.debug.print("Adapter: {?p}\n", .{resp.adapter});
        }
    }
    return jni.bool2jboolean(true);
}

root.zig:

const jni = @import("jni");

comptime {
    jni.exportJNI("Native", @import("Native.zig"));
}

Here the jni library is SuperIceCN/zig-jni.

Building and running Native.java crashes the JVM.

Console log:

Initializing Dawn...
WGPU instance instance.Instance@254999f6400
WGPU instance referenced

JVM crash report:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ffc24fea635, pid=89496, tid=86204
#
# JRE version: Java(TM) SE Runtime Environment Oracle GraalVM 17.0.7+8.1 (17.0.7+8) (build 17.0.7+8-LTS-jvmci-23.0-b12)
# Java VM: Java HotSpot(TM) 64-Bit Server VM Oracle GraalVM 17.0.7+8.1 (17.0.7+8-LTS-jvmci-23.0-b12, mixed mode, sharing, tiered, jvmci, jvmci compiler, compressed oops, compressed class ptrs, g1 gc, windows-amd64)
# Problematic frame:
# C  [ntdll.dll+0x3a635]
#
---------------  S U M M A R Y ------------

Command Line: -XX:ThreadPriorityPolicy=1 -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCIProduct -XX:-UnlockExperimentalVMOptions ......

Host: AMD Ryzen 7 5800H with Radeon Graphics         , 16 cores, 59G,  Windows 11 , 64 bit Build 22621 (10.0.22621.3527)
Time: Thu Jun  6 19:40:43 2024  Windows 11 , 64 bit Build 22621 (10.0.22621.3527) elapsed time: 0.830858 seconds (0d 0h 0m 0s)

---------------  T H R E A D  ---------------

Current thread (0x00000274bf0d21f0):  JavaThread "Test worker" [_thread_in_native, id=86204, stack(0x000000bdb6b00000,0x000000bdb6c00000)]

Stack: [0x000000bdb6b00000,0x000000bdb6c00000],  sp=0x000000bdb6bfa650,  free space=1001k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [ntdll.dll+0x3a635]
C  [ucrtbase.dll+0x10db1]
C  [ucrtbase.dll+0x538cf]
C  [ucrtbase.dll+0x244c6]
C  [ucrtbase.dll+0x24478]
C  [my_shared_library.dll+0x42cf0]

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  Native.init()Z+0
...

I have spent a long time on this issue but still cannot find a clue. I would be grateful for any help.

slimsag commented 1 month ago

By default, Dawn bounces all C API calls through a vtable, which acts as a swappable interface implementation (e.g. for testing).

To initialize the interface, there must be a call to dawnProcSetProcs, passing it the return value of dawn::native::GetProcs(). Alternatively, you can directly call the function pointers that dawn::native::GetProcs() returns and skip the vtable entirely, which is what Mach does: https://sourcegraph.com/search?q=context:global+repo:%5Egithub%5C.com/hexops/mach%24+machDawnGetProcTable&patternType=keyword&sm=0

There is some more info on it here: https://machengine.org/pkg/mach-gpu-dawn/#important-building-webgpu-api-symbols
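To make the two options above concrete, here is a minimal Java model of a swappable proc table. It is only an illustration of the pattern: `ProcTableSketch`, `ProcTable`, `nativeProcs`, and `setProcs` are hypothetical names that stand in for the table, dawn::native::GetProcs(), and dawnProcSetProcs, and none of them are part of Dawn or Mach. The exception thrown for an unset table is an assumption of the model, not a statement about how Dawn itself fails.

```java
import java.util.function.Supplier;

// Hypothetical model of a swappable proc table; none of these names are Dawn's API.
final class ProcTableSketch {

    // Stand-in for a table of API entry points (one entry shown for brevity).
    static final class ProcTable {
        final Supplier<String> createInstance;

        ProcTable(Supplier<String> createInstance) {
            this.createInstance = createInstance;
        }
    }

    // Plays the role of dawn::native::GetProcs(): returns the real implementation.
    static ProcTable nativeProcs() {
        return new ProcTable(() -> "native-instance");
    }

    // The installed table; stays null until setProcs is called.
    private static ProcTable installed = null;

    // Plays the role of dawnProcSetProcs: installs (or swaps) the implementation.
    static void setProcs(ProcTable procs) {
        installed = procs;
    }

    // Public entry point that bounces through the installed table.
    // Assumption of this model: an unset table is reported with an exception.
    static String createInstance() {
        if (installed == null) {
            throw new IllegalStateException("proc table not installed");
        }
        return installed.createInstance.get();
    }

    public static void main(String[] args) {
        // Direct route: skip the table and call the implementation's entry point.
        System.out.println(nativeProcs().createInstance.get());

        // Calling the public entry point before installing a table fails.
        try {
            createInstance();
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }

        // Table route: install the table first, after which the entry point works.
        setProcs(nativeProcs());
        System.out.println(createInstance());
    }
}
```

In this model, the direct route never touches the installed table, so it works regardless of whether `setProcs` has run, which mirrors the reason given above for skipping the vtable.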

However, I must warn you that the Mach project is moving away from Dawn/WebGPU in favor of our own graphics abstraction, sysgpu. As a result, the mach-gpu-dawn project is likely to be removed soon.

My advice would be one of two options:

  1. You could work with me on how to make e.g. Mach core more generally accessible to Java applications through JNI. This probably requires some planning and will take more effort, but it would be a better long-term solution, depending on your use cases.
  2. You can find a different way to get WebGPU through Java.

Hope that helps!

SuperIceCN commented 1 month ago

First of all, thank you for your help!

I just tried calling the functions directly, and it did not crash the JVM, though the status of the response is not success, whereas in an executable it works. Anyway, I am considering switching to sysgpu, which I guess will be much easier to debug. My only concern is whether sysgpu can reach a relatively stable state in the coming months, at least in the parts related to compute shaders. (According to the commit history, there has been no progress for four months.) If so, I am willing to make at least sysgpu accessible to Java and Kotlin applications, since the JVM and Android communities have long lacked a GPU interface that is modern and easy to deploy.

Thank you for taking the time out of your busy schedule to answer my question.

slimsag commented 4 weeks ago

sysgpu is definitely not stable. It is extremely experimental and likely to change a lot in the coming months; I expect its API and shading language to change dramatically.