jameslan / libxml2-wasm

WebAssembly-based libxml2 javascript binding
https://jameslan.github.io/libxml2-wasm/
MIT License

Allow client customized virtual IO, required by xinclude and XSD inclusion #28

Closed jameslan closed 1 day ago

jameslan commented 2 months ago

PR #21 demonstrates the XSD inclusion.

Libxml uses callbacks for virtual IO: when libxml needs a particular file, the callbacks provide the content of that XML file.

See

Stack Overflow question: https://stackoverflow.com/questions/13470166/add-additional-xsd-schemas-with-libxml2
libxml example: http://www.xmlsoft.org/examples/#InputOutput

jameslan commented 1 month ago

From the beginning of the libxml2-wasm design, we have tried to avoid direct file access, although WebAssembly provides a way to bridge standard C file IO to its JavaScript counterpart. The reason is simple: JS engines don't share a file system API across platforms. Web browsers have the File System API while Node.js doesn't; Node.js has the fs module while web browsers don't. This makes local file system support complicated, and we might have to release multiple editions to support different environments. So we decided to keep libxml2-wasm IO-free and leave all of that to the client.

For the same reason, libxml's virtual IO seems to be the right solution: libxml2-wasm registers callbacks with libxml, and these libxml2-wasm-defined callbacks forward the calls to client-defined callbacks (most likely organized in an interface/class), which do the actual IO (with files, an HTTP server, etc.).

It would look like this:

API

Client registers the InputProvider with

function xmlRegisterInputProvider(provider: XmlInputProvider) 

Where XmlInputProvider is defined as,

interface XmlInputProvider {
    match(...): boolean;
    open(...): any;
    read(...): number;
    close(...): boolean;
}
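
To make the proposed interface more concrete, here is a minimal sketch of a provider that serves documents from memory. The parameter lists are assumptions, since the proposal leaves them as `...`; `MemoryInputProvider` and its numeric handles are hypothetical and not part of libxml2-wasm.

```typescript
// Assumed parameter lists; the proposal above leaves them as `...`.
interface XmlInputProvider {
    match(filename: string): boolean;
    open(filename: string): any;
    read(handle: any, buffer: Uint8Array): number;
    close(handle: any): boolean;
}

// Hypothetical provider that serves documents from an in-memory map.
class MemoryInputProvider implements XmlInputProvider {
    private readonly files: Map<string, Uint8Array>;
    private readonly openFiles = new Map<number, { data: Uint8Array; pos: number }>();
    private nextHandle = 1;

    constructor(files: Map<string, Uint8Array>) {
        this.files = files;
    }

    // True when this provider can serve the requested name.
    match(filename: string): boolean {
        return this.files.has(filename);
    }

    // Returns a numeric handle, or undefined when the name is unknown.
    open(filename: string): number | undefined {
        const data = this.files.get(filename);
        if (data === undefined) return undefined;
        const handle = this.nextHandle++;
        this.openFiles.set(handle, { data, pos: 0 });
        return handle;
    }

    // Copies up to buffer.length bytes into buffer.
    // Returns the number of bytes copied, 0 at end of data, -1 for an unknown handle.
    read(handle: number, buffer: Uint8Array): number {
        const file = this.openFiles.get(handle);
        if (file === undefined) return -1;
        const chunk = file.data.subarray(file.pos, file.pos + buffer.length);
        buffer.set(chunk);
        file.pos += chunk.length;
        return chunk.length;
    }

    // Returns false when the handle was not open.
    close(handle: number): boolean {
        return this.openFiles.delete(handle);
    }
}
```

Keeping the read position inside the provider means the caller only has to pass back the opaque handle, which matches how libxml passes a context pointer to its read and close callbacks.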

Implementation

Expose libxml functions

Expose libxml struct

Define callback

Known issue

There are still some issues on the libxml side:

In contrast, the IO callback registration function is

int xmlRegisterInputCallbacks(xmlInputMatchCallback matchFunc,
                              xmlInputOpenCallback openFunc,
                              xmlInputReadCallback readFunc,
                              xmlInputCloseCallback closeFunc);

Without userData, supporting multiple client callbacks means the logic for managing them has to live inside the libxml2-wasm callback functions.
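
One way that management logic could look is a dispatcher sitting behind a single set of registered callbacks. This is only a sketch: `ProviderDispatcher` is a hypothetical name, the provider signatures are assumed, and trying providers in registration order is an arbitrary choice.

```typescript
// Assumed parameter lists; the proposal leaves them as `...`.
interface XmlInputProvider {
    match(filename: string): boolean;
    open(filename: string): number | undefined; // provider-local handle
    read(handle: number, buffer: Uint8Array): number;
    close(handle: number): boolean;
}

// Hypothetical dispatcher: the one callback set registered with libxml
// would forward to these four methods.
class ProviderDispatcher {
    private readonly providers: XmlInputProvider[] = [];
    private readonly streams = new Map<number, { provider: XmlInputProvider; handle: number }>();
    private nextId = 1; // 0 is reserved so it can signal failure (NULL)

    register(provider: XmlInputProvider): void {
        this.providers.push(provider);
    }

    // True when at least one registered provider claims the name.
    match(filename: string): boolean {
        return this.providers.some((p) => p.match(filename));
    }

    // Returns a dispatcher-wide id, or 0 when no provider can open the file.
    open(filename: string): number {
        for (const provider of this.providers) {
            if (!provider.match(filename)) continue;
            const handle = provider.open(filename);
            if (handle === undefined) continue;
            const id = this.nextId++;
            this.streams.set(id, { provider, handle });
            return id;
        }
        return 0;
    }

    // Forwards to the provider that opened the stream; -1 for an unknown id.
    read(id: number, buffer: Uint8Array): number {
        const stream = this.streams.get(id);
        return stream === undefined ? -1 : stream.provider.read(stream.handle, buffer);
    }

    // Forgets the id and closes the underlying provider handle.
    close(id: number): boolean {
        const stream = this.streams.get(id);
        if (stream === undefined) return false;
        this.streams.delete(id);
        return stream.provider.close(stream.handle);
    }
}
```

The id returned by `open` plays the role of the missing userData: it is the only value libxml hands back on read and close, so the dispatcher uses it to recover which provider owns the stream.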

jameslan commented 1 month ago

We could start by supporting only one synchronous provider.

fennibay commented 1 month ago

@jameslan I gave this a try in the context of #21.

Changes attached: fshooks.patch

I realize now that we should not add a dependency on Node in libxml2.mts, but delegate it to the client. Also, the provided implementations are probably not of the best quality. These things we can change easily.

OTOH, the real issue I'm having is that I cannot get libxml2 to call these callbacks. It just reports "No such file or directory", without calling my callbacks.

Maybe you can figure out what's going wrong.

jameslan commented 1 month ago

The C code won't directly call javascript code. It needs a wrapper to convert the javascript function into a C function pointer.

Emscripten provides addFunction to do that: https://emscripten.org/docs/porting/connecting_cpp_and_javascript/Interacting-with-code.html#calling-javascript-functions-as-function-pointers-from-c

You can refer to the use of xmlCtxtSetErrorHandler and the creation of errorCollector.
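
As a rough sketch of that wrapping step for the four IO callbacks: the code below assumes a wasm32 build in which `addFunction`, `UTF8ToString` and `HEAPU8` are exported (Emscripten needs `addFunction` listed in `EXPORTED_RUNTIME_METHODS`, and table growth allowed, for this to work). `EmscriptenLike`, `createInputCallbackPointers` and the provider signatures are hypothetical, not libxml2-wasm's actual code.

```typescript
// Minimal slice of an Emscripten module; which runtime methods a given
// build exports is an assumption.
interface EmscriptenLike {
    addFunction(fn: (...args: number[]) => number, signature: string): number;
    UTF8ToString(ptr: number): string;
    HEAPU8: Uint8Array;
}

// Assumed provider shape, using non-zero numeric handles.
interface XmlInputProvider {
    match(filename: string): boolean;
    open(filename: string): number | undefined;
    read(handle: number, buffer: Uint8Array): number;
    close(handle: number): boolean;
}

interface InputCallbackPointers {
    match: number;
    open: number;
    read: number;
    close: number;
}

// Wraps each provider method as a C function pointer.
// Signature strings: first letter is the return type, the rest are parameters.
function createInputCallbackPointers(
    mod: EmscriptenLike,
    provider: XmlInputProvider,
): InputCallbackPointers {
    return {
        // int match(const char *filename)
        match: mod.addFunction(
            (namePtr: number) => (provider.match(mod.UTF8ToString(namePtr)) ? 1 : 0),
            'ii',
        ),
        // void *open(const char *filename); 0 (NULL) reports failure
        open: mod.addFunction(
            (namePtr: number) => provider.open(mod.UTF8ToString(namePtr)) ?? 0,
            'ii',
        ),
        // int read(void *context, char *buffer, int len)
        read: mod.addFunction(
            (handle: number, bufPtr: number, len: number) =>
                // HEAPU8 is looked up on every call because memory growth replaces the view.
                provider.read(handle, mod.HEAPU8.subarray(bufPtr, bufPtr + len)),
            'iiii',
        ),
        // int close(void *context); 0 on success, -1 on error
        close: mod.addFunction(
            (handle: number) => (provider.close(handle) ? 0 : -1),
            'ii',
        ),
    };
}
```

The four resulting numbers are what would be passed to xmlRegisterInputCallbacks in place of C function pointers; how that C function is exported from the module is not covered by this sketch.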

fennibay commented 1 month ago

I gave this a try in #21

Still I don't see libxml2 calling my callbacks. The error message remains the same: "Error: failed to load "./test/crossplatform/testfiles/book.xsd": No such file or directory"

fennibay commented 1 month ago

Sorry, silly mistake. I put the before function in the wrong describe block. 🤦‍♂️

Now I see that the callbacks are called. Can continue.