jnr / jnr-ffi

Java Abstracted Foreign Function Layer
Other
1.26k stars 157 forks source link

NoClassDefFoundError when callback method invoked #274

Open charleskorn opened 3 years ago

charleskorn commented 3 years ago

I have defined a callback interface like this (all in Kotlin):

internal interface PullImageProgressCallback {
    @Delegate
    fun invoke(userData: Pointer?, progress: PullImageProgressUpdate)
}

internal class PullImageProgressUpdate(runtime: Runtime) : Struct(runtime) {
    constructor(pointer: jnr.ffi.Pointer) : this(pointer.runtime) {
        this.useMemory(pointer)
    }

    val message = UTF8StringRef()
    private val detailPointer = Pointer()
    val detail: PullImageProgressDetail? by lazy { if (detailPointer.intValue() == 0) null else PullImageProgressDetail(detailPointer.get()) }
    val id = UTF8StringRef()
}

internal class PullImageProgressDetail(runtime: Runtime) : Struct(runtime) {
    constructor(pointer: jnr.ffi.Pointer) : this(pointer.runtime) {
        this.useMemory(pointer)
    }

    val current = int64_t()
    val total = int64_t()
}

The native function is defined like this:

internal interface API {
    fun PullImage(@In clientHandle: Long, @In ref: String, @In onProgressUpdate: PullImageProgressCallback, @In callbackUserData: Pointer?): PullImageReturn?
}

However, when I run invoke PullImage and the native code invokes the callback function, I get exceptions like this:

    java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
        at jnr.ffi.provider.jffi.NativeClosureProxy$Factory.newClosureProxy(NativeClosureProxy.java:89)
        at jnr.ffi.provider.jffi.NativeClosureFactory.allocateClosurePointer(NativeClosureFactory.java:156)
        at jnr.ffi.provider.jffi.NativeClosureFactory.newClosureReference(NativeClosureFactory.java:177)
        at jnr.ffi.provider.jffi.NativeClosureFactory.getClosureReference(NativeClosureFactory.java:218)
        at jnr.ffi.provider.jffi.NativeClosureManager$ClosureSite.toNative(NativeClosureManager.java:117)
        at batect.dockerclient.native.API$jnr$ffi$0.PullImage(Unknown Source)
        at batect.dockerclient.DockerClient.pullImage(DockerClient.kt:148)
        at batect.dockerclient.DockerClient.pullImage$default(DockerClient.kt:135)
        at batect.dockerclient.DockerClientImagePullSpec$1$2$1.invokeSuspend(DockerClientImagePullSpec.kt:63)
        at batect.dockerclient.DockerClientImagePullSpec$1$2$1.invoke(DockerClientImagePullSpec.kt)
        at batect.dockerclient.DockerClientImagePullSpec$1$2$1.invoke(DockerClientImagePullSpec.kt)

        Caused by:
        java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
            at jnr.ffi.provider.jffi.NativeClosureProxy$Factory.newClosureProxy(NativeClosureProxy.java:89)
            ... 10 more

            Caused by:
            java.lang.reflect.InvocationTargetException
                at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
                at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
                at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
                at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
                at jnr.ffi.provider.jffi.NativeClosureProxy$Factory.newClosureProxy(NativeClosureProxy.java:87)
                ... 10 more

                Caused by:
                java.lang.NoClassDefFoundError: batect/dockerclient/native/PullImageProgressUpdate$$jnr$$StructByReferenceFromNativeConverter$$10
                    at jnr.ffi.provider.jffi.NativeClosureProxy$$impl$$0.<init>(Unknown Source)
                    ... 15 more

I'm not quite sure how to start diagnosing this or what the issue might be - any suggestions?

basshelal commented 3 years ago

Hmm, this looks awfully familiar to something I've experienced before.

What does the C API look like? We don't support passing or returning structs by value, see #262 , this took me a long time to figure out, though if that were the case you'd often get more cryptic errors than these. Still, for completeness, it would be good to add the C API too, at the very least just the PullImage function.

Try doing this:

internal interface PullImageProgressCallback {
    @Delegate
    fun invoke(userData: Pointer?, progress: Pointer? /* was  PullImageProgressUpdate */)
}

You can convert between Structs and Pointers with ease:

// Pointer to Struct
val pointer: Pointer? = getPointerFromSomewhere()
val progressUpdateStruct: PullImageProgressUpdate = PullImageProgressUpdate(Runtime.getSystemRuntime()).also { struct ->
    pointer?.also{ ptr -> struct.useMemory(ptr) }
}
// you already did the above in your secondary constructor in PullImageProgressUpdate šŸ˜

// Struct to Pointer
val newPointer: Pointer = Struct.getMemory(progressUpdateStruct)

// you can even change the values of an existing struct by simply re-assigning its memory
progressUpdateStruct.useMemory(getPointerFromSomewhere())
// above is possibly unsafe, but if safe, will change the backing values of the struct

Since you're using Kotlin, extension functions/properties are a good idea to make your code look nicer and reusable:

/** Represents the pointer backing the Struct */
inline var Struct.memory: Pointer
    set(value) = this.useMemory(value)
    get() = Struct.getMemory(this)

Now, when you use the callback, you can easily convert between Pointer and Structs.

Let me know if that at least works (even if its a little ugly), because if it does I think I know what's happening.

Otherwise, I would put a breakpoint on the line where you call your LibraryLoader.load() and follow the long and convoluted trail, everything happens at load time more or less, just make sure you use LibraryOption.LoadNow to ensure no lazy loading is happening.

You can read JNR-FFI's generated JVM bytecode by doing:

System.setProperty("jnr.ffi.compile.dump", "true")

This will show you the generated bytecode for every loaded function in your console. Be aware that many functions will be Java native functions so the usefulness of this bytecode is not great, but it can help to figure out what JNR-FFI is doing for you at load time. JNR-FFI can only work based on your mappings (hence my suggestion to change your mapping and see what happens), so your choice of mapping will have an impact on what gets generated and possibly the performance too.

Anyway I'm rambling a little, just trying to give as much detail as possible. Let me know if my suggestion works and we'll see where to go from there.

Hope this helps šŸ˜Š

charleskorn commented 3 years ago

Thanks for the quick and detailed response @basshelal!

Try doing this:

internal interface PullImageProgressCallback {
    @Delegate
    fun invoke(userData: Pointer?, progress: Pointer? /* was  PullImageProgressUpdate */)
}

Changing the parameter type to Pointer? like you suggested worked perfectly. What does that mean?

For reference, PullImage is defined like this:

extern PullImageReturn* PullImage(DockerClientHandle clientHandle, char* ref, PullImageProgressCallback onProgressUpdate, void* callbackUserData);

And PullImageProgressCallback looks like this:

typedef void (*PullImageProgressCallback) (void*, PullImageProgressUpdate*);
basshelal commented 3 years ago

@charleskorn Excellent!

What does that mean? Good question!

Most likely, JNR-FFI's closure/callback converter doesn't support Structs yet, instead it requires you (the caller) to do the conversion when needed.

This is not great but not awful, something works but it's a little inconvenient. The option for a convenient and semantically useful mapping is not supported. For reference we allow this C function:

struct MyStruct *my_function(struct MyStruct *mystruct);

To be mapped as either:

MyStruct my_function(MyStruct mystruct);

or:

Pointer my_function(Pointer mystruct);

The Struct Pointer conversion is done by JNR-FFI internally in the generated bytecode, but if you use the Pointer version you skip that and go straight to the Java native function if possible leading to some performance gain at the cost of you having to do the conversion.

Having the option is good, but the more semantic C-like mapping is better because Pointer could mean and contain anything and are inherently unsafe but Structs less so.

But the same doesn't seem to apply to closures/callbacks, that is Single Abstract Method interfaces with a @Delegate function like your PullImageProgressCallback interface.

This is good that you found and mentioned this because I ran into this many months ago but thought it was something wrong with my own library or mappings and never cared much about it but it seems this is a repeatable bug.

I'll see what I can do about this. In the meantime, just use the Pointer? variant and do the Pointer Struct conversion yourself where needed, you've technically gotten a free performance improvement by doing so šŸ˜„. It should work in every scenario, you can even probably wrap the conversion by doing something like this:

internal open class PullImageProgressCallbackImpl : PullImageProgressCallback {
    private val update: PullImageProgressUpdate = PullImageProgressUpdate(Runtime.getSystemRuntime())
    @Delegate
    override fun invoke(userData: Pointer?, progress: Pointer?) {
        progress?.also { update.useMemory(it) }
        // update is safe to use here, updated only if progress wasn't null
    }
}

Or something similar to that, this is only for convenience and readability and reuse, but do try and keep the Pointer Struct conversions to using the same single Struct, otherwise you're wasting JVM memory on each callback with creation for a whole new Struct.

Let me know if this helps or if you have any more questions about this šŸ˜Š

charleskorn commented 3 years ago

That works for me, thanks for all your help @basshelal!

headius commented 3 years ago

Is there something we can do to improve jnr-ffi here, or is this just a known complication of JNI and classloaders?

basshelal commented 3 years ago

@headius yeah I think this is an issue with the generated bytecode that is done for Closures to use Structs if provided by the mapping, not sure about classloaders or JNI, I'm quite confident this is a load-time mapping issue that we can fix.

I haven't yet found the exact cause of the issue (been busy with other stuff) but this theoretically shouldn't be a super difficult fix on our part, it's easy to replicate and I generally understand how it happens.