konsoletyper / teavm

Compiles Java bytecode to JavaScript, WebAssembly and C
https://teavm.org
Apache License 2.0
2.55k stars, 260 forks

[WASM] I fail to pass an array by reference #907

Closed: TrOllOchamO closed this issue 2 months ago

TrOllOchamO commented 2 months ago

Hello! I'm currently trying to pass an array back and forth between JS and WASM in the browser. After spending quite some time on it, I can't figure out why the processing on the WASM side doesn't seem to affect the data on the JS side.

Here is a simplification of what I'm trying to do:

JS side

    const baseArray = new Uint8Array([42, 69, 420]);
    const ptr = teavm.instance.exports.malloc(3);
    teavm.instance.exports.process(ptr);
    const resArray = new Uint8Array(teavm.memory.buffer, ptr + 12, 3);
    console.log(resArray); // Expect [43, 70, 421] but I get [42, 69, 420] =(

JAVA side

  @Export(name = "process")
  public static void process(@JSByRef byte[] imageData) {
    for (int i = 0; i < imageData.length; ++i) {
      imageData[i] += 1;
    }
  }

  @Export(name = "malloc")
  public static @JSByRef byte[] malloc(int nbBytesToAllocate) {
    return new byte[nbBytesToAllocate];
  }
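A side note on the expected values: Java's `byte` is signed while a JS `Uint8Array` is unsigned, so the same 8 bits read differently on each side, and `imageData[i] += 1` silently wraps around. Also, 420 cannot be stored in a `Uint8Array` at all: `new Uint8Array([42, 69, 420])` stores it modulo 256, as 164. The plain-Java sketch below (no TeaVM involved; the class name `ByteWrapDemo` is hypothetical) mimics the increment loop and prints each byte the way a `Uint8Array` would see it.

```java
import java.util.StringJoiner;

public class ByteWrapDemo {
    // Adds 1 to every element in place, like the exported process method above.
    static void increment(byte[] data) {
        for (int i = 0; i < data.length; ++i) {
            data[i] += 1; // compound assignment narrows back to byte, so it wraps
        }
    }

    // Interprets a Java byte as the unsigned value a JS Uint8Array would show.
    static int asUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        // (byte) 420 keeps only the low 8 bits, i.e. 164, as a Uint8Array would
        byte[] data = { 42, 69, (byte) 420, (byte) 254, (byte) 255 };
        increment(data);
        StringJoiner out = new StringJoiner(" ");
        for (byte b : data) {
            out.add(Integer.toString(asUnsigned(b)));
        }
        System.out.println(out); // prints "43 70 165 255 0"
    }
}
```

So the best the JS side can ever observe for the third element is 165, not 421, and 255 + 1 comes back as 0.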

What am I doing wrong here? Am I misusing @JSByRef, so that the imageData argument is still copied? Or is the problem caused by how I'm accessing the WASM memory? (The malloc function is exactly the same as the one in my code; it's not a deliberate choice but a bodge, because I couldn't find how to allocate a Java array in WASM memory from the JS side.)

Thanks in advance, --Barnabé

konsoletyper commented 2 months ago

First of all, @JSByRef does not work in the WebAssembly backend.

Secondly, you can't just return an array from exported memory. You'll end up with a pointer to the array object, not to its data. In any case, the GC will eventually move this array, so the returned pointer will become invalid. If you attempt to write through this pointer from the JS side, you'll likely corrupt the Java heap. Without knowing exactly what you are trying to achieve, I can't advise you on how to avoid this malloc method.

konsoletyper commented 2 months ago

One suggestion is the following: assign an integer identifier to each allocated array. Your malloc method would return the identifier; a free method takes the identifier and removes the array from the mapping; and an access method takes the identifier and returns the array's address at the moment of the call. This address is only valid until you call any other Java method from JavaScript.
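A minimal plain-Java sketch of that identifier-based scheme, assuming a `HashMap` is acceptable for the mapping. The class and method names (`ArrayRegistry`, `malloc`, `free`, `get`) are hypothetical, and the TeaVM-specific parts (`@Export`, returning `Address.ofData(...)` to JS) are only described in comments so the sketch runs on a regular JVM.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical id-based registry. In a TeaVM build, malloc/free would be
// exported, and an exported access method would return Address.ofData(get(id)).
public class ArrayRegistry {
    // The static map keeps every registered array reachable, so the GC
    // never reclaims it (it may still move it).
    private static final Map<Integer, byte[]> arrays = new HashMap<>();
    private static int nextId = 1;

    // "malloc": allocate an array and hand JS an identifier instead of a pointer.
    public static int malloc(int size) {
        int id = nextId++;
        arrays.put(id, new byte[size]);
        return id;
    }

    // "free": drop the mapping so the GC can reclaim the array.
    public static void free(int id) {
        arrays.remove(id);
    }

    // "access": look up the array for an id. The address derived from it
    // is only valid until the next call from JS into Java.
    public static byte[] get(int id) {
        byte[] array = arrays.get(id);
        if (array == null) {
            throw new IllegalArgumentException("unknown array id: " + id);
        }
        return array;
    }
}
```

JS then holds on to the integer only and asks for a fresh address right before each read or write.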

TrOllOchamO commented 2 months ago

OK, thanks. Let me rephrase to make sure I understand correctly: what you call exported memory is an array dynamically allocated on the Java side with the new keyword and handed back to the JS side with a return, am I right? And if so, you are telling me that keeping this kind of exported memory around on the JS side is a bad idea, because the GC could deallocate it at any time while I'm still using it.

Actually, I don't really need any kind of complex allocator. What I really want is to pass some image data as a Uint8Array to the WASM side, process it, and then return the processed image data to the JS side.

So I want to do something like this:

    const imageData = new Uint8Array([42, 69, 420]);
    const processedImageData = teavm.instance.exports.process(imageData);
    console.log(processedImageData) // [43, 70, 421]

Side note: yup, I had noticed that it was not a pointer to the data itself but a pointer to the array object. That is why I used this hacky + 12 offset here:

  const OFFSET = 12;
  const resArray = new Uint8Array(teavm.memory.buffer, ptr + OFFSET, 3);

I thought the contents of the array object were always 12 bytes after the array object address; are you telling me that this is not always the case? Thanks again for your help, --Barnabé

konsoletyper commented 2 months ago

What about something like this?

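import org.teavm.interop.Address;
import org.teavm.interop.Export;

public class ImageProcessor {
    private static byte[] buffer = new byte[4096];

    @Export(name = "ensureCapacity")
    public static void ensureCapacity(int capacity) {
        // more sophisticated allocation method can be used
        if (buffer.length < capacity) {
            buffer = new byte[capacity];
        }
    }

    @Export(name = "getBufferPointer")
    public static Address getBufferPointer() {
        // address of the array's first element, not of the array object header
        return Address.ofData(buffer);
    }

    @Export(name = "processImage")
    public static void processImage(int size) {
        // process image of size bytes in the buffer
    }
}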
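The `ensureCapacity` comment leaves room for a smarter allocation policy. One common option, sketched here in plain Java under the assumption that the old buffer contents don't need to survive a resize, is to grow geometrically so that gradually increasing image sizes don't trigger a reallocation on every call. `GrowableBuffer`, its helper accessors, and the increment used as placeholder processing are hypothetical; `@Export` and `Address` are omitted so the sketch runs on a regular JVM.

```java
// Hypothetical plain-JVM sketch of a growth policy for ensureCapacity.
public class GrowableBuffer {
    private static byte[] buffer = new byte[4096];

    // Grow to at least `capacity`, doubling to avoid reallocating on every small increase.
    public static void ensureCapacity(int capacity) {
        if (capacity < 0) {
            throw new IllegalArgumentException("negative capacity: " + capacity);
        }
        if (buffer.length >= capacity) {
            return; // already big enough, the buffer is not reallocated
        }
        int newLength = buffer.length;
        while (newLength < capacity) {
            // fall back to the exact request instead of overflowing int when doubling
            newLength = newLength > Integer.MAX_VALUE / 2 ? capacity : newLength * 2;
        }
        // contents are not preserved: JS refills the buffer before each processing call
        buffer = new byte[newLength];
    }

    // Process only the first `size` bytes, never past the end of the buffer.
    public static void processImage(int size) {
        int n = Math.min(size, buffer.length);
        for (int i = 0; i < n; ++i) {
            buffer[i] += 1; // placeholder processing: increment each byte
        }
    }

    // Test/debug helpers, not part of the exported API.
    static int capacity() {
        return buffer.length;
    }

    static byte[] data() {
        return buffer;
    }
}
```

Doubling trades some unused memory for fewer reallocations; since any reallocation (or GC) can move the buffer, the JS side still has to re-read `getBufferPointer()` after each call.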

Then from JS you need something like:

let exports = teavm.instance.exports;
exports.ensureCapacity(3);
let imageData = new Uint8Array(teavm.memory.buffer, exports.getBufferPointer(), 3);
imageData.set([42, 69, 255]);
exports.processImage(3);
// processImage may have triggered GC, so the buffer address may have changed
// (and if memory grew, the old ArrayBuffer is detached): re-read both.
imageData = new Uint8Array(teavm.memory.buffer, exports.getBufferPointer(), 3);
console.log(imageData);

> I thought the contents of the array object were always 12 bytes after the array object address; are you telling me that this is not always the case?

Currently, it's not the case for double[] and long[]. In any case, you can't rely on such implementation details: although the offset has been quite consistent over the years, there's no guarantee that it will never change. So you should use the Address class to pass addresses to and from Java.

TrOllOchamO commented 2 months ago

Great! That is exactly what I was looking for! So to ensure that the GC doesn't deallocate the shared buffer, we give it a static lifetime; that's a neat trick I didn't think of x)

One last question: I was wondering whether, instead of writing to this intermediate buffer, it would be possible to read from and write to files located in the OPFS directly?

With something that would look like this:

public class ImageProcessor {
     @Export
     public static void processImage(File srcFile, FileSystemWritableFileStream destFile) {
         byte[] imageData = srcFile.readAsArrayBuffer();
         byte[] processedImageData = // the result of processing imageData
         destFile.write(processedImageData);
     }
}

With a File and a FileSystemWritableFileStream that would be passed from the JS side like this:

        const resFileHandle = await root.getFileHandle(processedImageName, {create: true});
        const writableStream = await resFileHandle.createWritable();
        teavm.instance.exports.process(image, writableStream); // image being of type File
        await writableStream.close();
        resFileHandle // now contains the processed image

Does an API like this exist? If so, would it avoid unnecessary copies, or would it be the same as using the static array method above and then doing the write on the JS side, like this:

// process the image data by copying arrays
imageData = new Uint8Array(teavm.memory.buffer, exports.getBufferPointer(), 3);
await writableStream.write(imageData);
await writableStream.close();
resFileHandle // now contains the processed image

Thanks for all your valuable help, --Barnabé

konsoletyper commented 2 months ago

> One last question: I was wondering whether, instead of writing to this intermediate buffer, it would be possible to read from and write to files located in the OPFS directly?

Perhaps you can find a way to communicate with this API from WebAssembly, but please note that TeaVM JS interop only works with the JS backend, not with the WebAssembly backend. Strictly speaking, WebAssembly itself can't work with JS APIs directly unless you write glue code on the JS side. I know that things might have changed after the GC spec was added to WebAssembly, but TeaVM only compiles to the older WebAssembly with plain linear memory.

TrOllOchamO commented 2 months ago

OK, I see.

Turns out it wasn't my last question ^^' I tried to follow your suggestion, but I still run into two issues. Here is a simplified version of the code:

public class Client {
  private static int imageSize = 4096;
  private static byte[] buffer = new byte[4096];

  public static void main(String[] args) { // is this really important?
  }

  @Export(name = "ensureCapacity")
  public static void ensureCapacity(int newImageSize) {
    // runtime error ("unreachable executed") when this function body is not commented out
    Client.imageSize = newImageSize;
    if (Client.buffer.length < newImageSize) {
      Client.buffer = new byte[newImageSize];
    }
  }

  @Export(name = "getBufferPointer")
  public static Address getBufferPointer() {
    return Address.ofData(Client.buffer);
  }

  @Export(name = "process")
  public static void process() {
    // this function seems to have no effect on the outcome
    for (int i = 0; i < Client.imageSize; ++i) {
      Client.buffer[i] += 1;
    }
  }
}

const exports = teavm.instance.exports;
const buffer = teavm.memory.buffer;

exports.ensureCapacity(3);  // causes an exception when the Java body is not commented out
let imageData = new Uint8Array(buffer, exports.getBufferPointer(), 3);
imageData.set([42, 69, 254]);
exports.process();  // seems not to modify the memory buffer
const processedImageData = new Uint8Array(buffer, exports.getBufferPointer(), 3);  // Expect [43, 70, 255] but get [42, 69, 254]

The first problem, as mentioned in the comments, is that calling ensureCapacity throws a runtime exception and I have no clue why. This is not a big deal for now, since I can just delete the body of the function (I'm working with small test files for the moment), but it will be annoying later.

The second problem is that, when I comment out the failing part, the process function still doesn't seem to affect the memory buffer, and I don't know why either :')

I made a GitHub repo with the code needed to reproduce both problems here; it might make the issue easier to spot.

Btw, on a side note, is it possible to compile the Java class without a main function? Not that it's a real problem, but since I will only use the produced WASM as a library, it would make more sense not to have a main at all (but if I remove it, it won't compile).

Sorry for bothering you with all these questions, and thanks again for your help, --Barnabé

konsoletyper commented 2 months ago

I think the problem here is that you don't call the main method. Currently, it's mandatory to call it, even if it's empty and does nothing: the main method initializes the JVM, so calling anything else without calling main first is invalid. I know this can sound strange, but the reason is that TeaVM was designed to compile whole applications, not modules. Recently, I added the ability to build modules without a main method to the JS backend, but the WebAssembly backend is not a priority for me, so I haven't had a chance to do a similar refactoring there.

TrOllOchamO commented 2 months ago

Amazing, it works! 🥳 Thank you so much for all your help! --Barnabé