jnr / jnr-ffi

Java Abstracted Foreign Function Layer

String parameter of a callback function gets messed when passed from DLL (Rust) to Java #335

Open Revxrsal opened 1 year ago

Revxrsal commented 1 year ago

o/ I'm migrating from JNA to JNR, and almost everything works fine. However, I've run into a really odd bug when using callback functions. I've built a minimal project that reproduces it. The JNA equivalent works fine.

Note: My native library is written in Rust

How to reproduce

  1. Install the native library (see platform artifacts)
  2. Load the library
  3. Try to call it from Java.

Java:

import jnr.ffi.LibraryLoader;
import jnr.ffi.annotations.Delegate;

public class Main {

    public interface Natives {

        void simple_callback(SimpleCallback callback);

        interface SimpleCallback {
            @Delegate
            void invoke(String value);
        }

        static Natives load() {
            return LibraryLoader
                    .create(Natives.class)
                    .load("<path to the library>");
        }
    }

    public static void main(String[] args) {
        Natives natives = Natives.load();
        for (int i = 0; i < 10; i++) {
            natives.simple_callback(System.out::println);
        }
    }
}

Rust:

use std::ffi::{c_char, CString};
use std::mem;

/// Converts a Rust string to a NUL-terminated C string pointer that Java can read
pub fn to_java_string(string: &str) -> *const c_char {
    let cs = CString::new(string.as_bytes()).unwrap();
    let ptr = cs.as_ptr();
    // Tell Rust not to clean up the string while we still have a pointer to it.
    // Otherwise, we'll get a segfault.
    mem::forget(cs);
    ptr
}

#[no_mangle]
extern fn simple_callback(callback: extern fn(*const c_char)) {
    let value = "Any string value";
    callback(to_java_string(&value));
}

The output:

Any string value
Any string value
Any string value lG�|  ��� $?      dRTypeCache    |  ��� %� �|
Any string value
Any string value
�hNG�|  �,0�|
Any string value
Any string value |��|  A��|�
Any string value
Any string value

(The corruption is different every time.) Any idea what could be causing this?

Hyperkopite commented 5 months ago

Same issue here. JNA returns the normal result, but JNR returns a result with a small amount of corrupted data when invoking the same C function.

JNR code:

import jnr.ffi.LibraryLoader;

public interface JNRUtils {
    JNRUtils INSTANCE = LibraryLoader.create(JNRUtils.class).load("QGram");

    public double calc_similarity(String str1, String str2, int q);

    public String purge_duplicated_spaces(String s);
}

C code:

char *purge_duplicated_spaces(char *str)
{
    re_length_t re_match_start;
    struct re_context *re_ctx = (struct re_context *)calloc(1, sizeof(struct re_context));

    while (true)
    {
        re_ctx->match_length = 0;
        re_match(re_ctx, "\\s+", text_args(str), &re_match_start);
        if (re_ctx->match_length == 0)
        {
            free(re_ctx);
            break;
        }

        delete_sub_str(str, re_match_start, re_ctx->match_length);  // Another function to delete some substring from a string
    }

    int p = 0;
    while(str[p] != '\0')
    {
        if (str[p] == '\1') {
            str[p] = ' ';
        }
        p++;
    }

    return str;
}

rorueda commented 3 months ago

I assume the conversion is done by StringResultConverter. Looking at it, it seems the Java default charset is used to determine the width of the string terminator.

It is just a guess, but maybe your default charset results in a width > 1.
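
One way to check this guess on a given machine is to print the JVM default charset and count how many bytes it uses to encode the NUL character. The sketch below is only a diagnostic and is not JNR code: the class name TerminatorWidthCheck and the helper nulWidth are hypothetical, and it assumes the terminator width is the encoded length of "\0" in the charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class TerminatorWidthCheck {

    // Hypothetical helper (not part of JNR): the number of bytes the charset
    // needs to encode the NUL character, taken here as the terminator width.
    static int nulWidth(Charset charset) {
        return "\0".getBytes(charset).length;
    }

    public static void main(String[] args) {
        // The charset the JVM falls back to when none is specified.
        Charset defaultCharset = Charset.defaultCharset();
        System.out.println("Default charset: " + defaultCharset.name());
        System.out.println("NUL width: " + nulWidth(defaultCharset));

        // A few fixed charsets for comparison.
        Charset[] samples = {
            StandardCharsets.UTF_8,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_16LE
        };
        for (Charset cs : samples) {
            System.out.println(cs.name() + " -> " + nulWidth(cs));
        }
    }
}
```

If the reported width is 1, the charset itself is not wide; the problem would then have to lie in how the width is looked up rather than in the charset.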

Trivaxy commented 1 month ago

I got bit by this bug as well, but fortunately there's a workaround.

> I assume the conversion is done by StringResultConverter. Looking at it, it seems the Java default charset is used to determine the width of the string terminator.
>
> It is just a guess, but maybe your default charset results in a width > 1.

Yeah, this seems to be the issue. On my machine, the default charset is windows-1252, which has a terminator width of 1, but StringUtil#terminatorWidth will say it's 4 because that's its fallback when it doesn't recognize the charset, which in turn leads to nasty bugs like this. JNR should probably add windows-1252 to the cases it checks (or, better yet, throw an exception when it doesn't know the terminator width of the charset).
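
To illustrate why a wrong terminator width produces trailing garbage, here is a minimal simulation. It is not JNR's implementation: the class TerminatorScanDemo and its helpers are hypothetical, the scan is simplified to "first run of N consecutive zero bytes", and ISO-8859-1 stands in for a single-byte charset such as windows-1252. The byte array models native memory in which unrelated bytes follow the string's single NUL.

```java
import java.nio.charset.StandardCharsets;

final class TerminatorScanDemo {

    // Models native memory: the string, its 1-byte NUL terminator, some
    // unrelated bytes, and then four zero bytes further along.
    static byte[] sampleMemory() {
        return "Any string value\0GARBAGE\0\0\0\0"
                .getBytes(StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper: index of the first run of `width` consecutive
    // zero bytes, or memory.length if no such run exists.
    static int findTerminator(byte[] memory, int width) {
        for (int i = 0; i + width <= memory.length; i++) {
            boolean allZero = true;
            for (int j = 0; j < width; j++) {
                if (memory[i + j] != 0) {
                    allZero = false;
                    break;
                }
            }
            if (allZero) {
                return i;
            }
        }
        return memory.length;
    }

    // Decodes everything before the terminator found with the given width.
    static String read(byte[] memory, int width) {
        int end = findTerminator(memory, width);
        return new String(memory, 0, end, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] memory = sampleMemory();
        // Width 1 stops at the real terminator.
        System.out.println(read(memory, 1));
        // Width 4 skips the single NUL and keeps reading into the next bytes.
        System.out.println(read(memory, 4));
    }
}
```

With width 1 the read stops right after "Any string value". With width 4 the single NUL is not treated as the end, so the result also contains whatever bytes happen to follow, which would match corruption that differs on every run.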

You can work around this problem by telling JNR to use UTF-8 encoding for the String parameter in the callback, e.g.

public interface WrenWriteFn {
    @Delegate
    void invoke(Pointer vm, @Encoding("utf8") String text);
}

This fixed the issue for me, at least.