jameslan / libxml2-wasm

WebAssembly-based libxml2 javascript wrapper
https://jameslan.github.io/libxml2-wasm/
MIT License

fix: include statements in XSDs are not handled properly #21

Closed fennibay closed 1 month ago

fennibay commented 4 months ago

Problem

xsd:include statements lead to the following error:

Element '{http://www.w3.org/2001/XMLSchema}include': Failed to load the document 'author.xsd' for inclusion.

Desired behavior:

  1. Absolute paths should work
  2. Relative paths should work based on XSD's current location (not the XML's location or the current working directory)
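
The desired resolution rule can be sketched in a few lines. This is only an illustration of the expected behavior, not code from libxml2 or libxml2-wasm: `resolveSchemaLocation` is a hypothetical helper, and the paths in the usage note are made up.

```typescript
// Hypothetical helper (not part of libxml2-wasm): resolve the schemaLocation of
// an xsd:include against the URL or path of the XSD that contains it.
function resolveSchemaLocation(baseUrl: string, location: string): string {
    // Locations with a scheme (file:, http:, c:) or a rooted path are absolute.
    if (/^[a-zA-Z][a-zA-Z0-9+.-]*:/.test(location) || location.startsWith('/')) {
        return location;
    }
    // Split the base into "scheme://authority" (may be empty) and the path.
    const match = /^([a-zA-Z][a-zA-Z0-9+.-]*:\/\/[^/]*)?(.*)$/.exec(baseUrl);
    const prefix = match?.[1] ?? '';
    const basePath = match?.[2] ?? '';
    // Drop the file name of the including XSD to get its directory.
    const dir = basePath.slice(0, basePath.lastIndexOf('/') + 1);
    const rooted = dir.startsWith('/');
    const segments: string[] = [];
    for (const part of (dir + location).split('/')) {
        if (part === '.') {
            continue; // "./" stays in the same directory
        }
        if (part === '..') {
            // Step up one directory, but never above the root.
            if (segments.length > (rooted ? 1 : 0)) {
                segments.pop();
            }
            continue;
        }
        segments.push(part);
    }
    return prefix + segments.join('/');
}
```

With this rule, `resolveSchemaLocation('file:///schemas/book.xsd', 'author.xsd')` yields `file:///schemas/author.xsd`, regardless of the current working directory or of where the XML instance lives.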

Tried workarounds

  1. Prefixing ./ to the path.
  2. Giving an absolute path, e.g. file:///c:/path-to-included-xsd/author.xsd (I also tried other syntaxes).
  3. Switching the working directory to where all the XSDs are before loading the XSD.

Solution ideas

It looks like libxml2 itself can handle this, because I didn't observe the problem with libxmljs2. So the root cause must be somewhere in libxml2-wasm's integration with libxml2.

I propose to start with the unit tests to reproduce the issue and debug from there.

I had issues setting up my environment (I got stuck at the Emscripten step), so I couldn't test this PR locally. Apologies.

fennibay commented 4 months ago

@jameslan could you give an initial assessment of whether this would be a small or a big problem to fix? I would also appreciate some pointers on how to tackle it.

jameslan commented 4 months ago

libxml has a "registry" that maps an XML document's name/URL to its content, so that it knows what content to use when that document is included.

libxml2-wasm doesn't support it yet.

The libxml API for parsing from memory takes a name/URL parameter. libxml2-wasm uses that API, but for now the parameter is hard-coded as null.

jameslan commented 3 months ago

Correction: the URL parameter of the memory-parsing API is for the namespace of the XML, not for document inclusion.

Libxml uses callbacks for virtual IO; they provide the content of an XML file whenever libxml needs a particular file.
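
To make the idea concrete, here is a minimal sketch of what such a callback set could look like on the JavaScript side. It follows the match/open/read/close shape of libxml2's input callbacks, but every name in it (`InputCallbacks`, `InMemoryInput`, `asciiBytes`) is an assumption made for illustration, not libxml2-wasm's actual API, and the glue that would hand these functions to the WebAssembly module is left out.

```typescript
// Hypothetical callback set, modelled on the match/open/read/close shape of
// libxml2's input callbacks. Illustrative only; not libxml2-wasm's API.
interface InputCallbacks {
    match(filename: string): boolean;                  // can this provider serve the file?
    open(filename: string): number | undefined;        // returns a handle, or undefined on failure
    read(handle: number, buffer: Uint8Array): number;  // bytes copied, 0 at end of file, -1 on error
    close(handle: number): boolean;
}

// Encode an ASCII-only string as bytes (enough for this sketch).
function asciiBytes(text: string): Uint8Array {
    const bytes = new Uint8Array(text.length);
    for (let i = 0; i < text.length; i++) {
        bytes[i] = text.charCodeAt(i);
    }
    return bytes;
}

// Serves documents from memory, so no real filesystem access is needed.
class InMemoryInput implements InputCallbacks {
    private readonly files = new Map<string, Uint8Array>();
    private readonly cursors = new Map<number, { data: Uint8Array; pos: number }>();
    private nextHandle = 1;

    add(filename: string, content: string): void {
        this.files.set(filename, asciiBytes(content));
    }

    match(filename: string): boolean {
        return this.files.has(filename);
    }

    open(filename: string): number | undefined {
        const data = this.files.get(filename);
        if (data === undefined) {
            return undefined;
        }
        const handle = this.nextHandle++;
        this.cursors.set(handle, { data, pos: 0 });
        return handle;
    }

    read(handle: number, buffer: Uint8Array): number {
        const cursor = this.cursors.get(handle);
        if (cursor === undefined) {
            return -1; // unknown or already closed handle
        }
        // Copy as much as fits into the caller's buffer and advance the cursor.
        const count = Math.min(buffer.length, cursor.data.length - cursor.pos);
        buffer.set(cursor.data.subarray(cursor.pos, cursor.pos + count));
        cursor.pos += count;
        return count;
    }

    close(handle: number): boolean {
        return this.cursors.delete(handle);
    }
}
```

A provider like this would answer the request for 'author.xsd' from the error message above, provided the included schema was registered under that name beforehand.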

See

fennibay commented 3 months ago

@jameslan maybe if you could tell me where I should start, I could have a go :-)

I think the issue starts from the fact that I can only instantiate the XsdValidator via an XmlDocument, and an XmlDocument only via fromBuffer or fromString. As a result, libxml2 has no context about this XSD's directory and so has no way of finding the included XSD.

If that's the case, I would try to expose the path-related methods of libxml2 and use them directly from a new XmlDocument method such as fromFile. That way libxml2 would be able to look for files relative to that path.

That is, unless WASM restricts libxml2's access to the filesystem. In that case we need a solution for that, perhaps the virtual IO callbacks you describe, so that libxml2 can still reach the filesystem through JavaScript.
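
A rough sketch of why the path context matters: once the loader knows where each XSD lives, nested includes can be followed from the JavaScript side through a read callback. Everything here is hypothetical (`collectIncludes`, the `ReadFile` type), the regular expression is a stand-in for real XML parsing, and only plain and rooted paths are handled (no URL schemes, no `..` normalization).

```typescript
// Hypothetical read callback: JavaScript supplies file content on request,
// returning undefined when the file does not exist.
type ReadFile = (path: string) => string | undefined;

// Walk xsd:include statements transitively, resolving each schemaLocation
// against the directory of the XSD that contains it.
function collectIncludes(entry: string, readFile: ReadFile): string[] {
    const seen: string[] = [];
    const visit = (path: string): void => {
        if (seen.indexOf(path) !== -1) {
            return; // already loaded; also guards against include cycles
        }
        const text = readFile(path);
        if (text === undefined) {
            throw new Error(`Failed to load the document '${path}' for inclusion.`);
        }
        seen.push(path);
        // Directory of the current XSD, not of the XML or the working directory.
        const dir = path.slice(0, path.lastIndexOf('/') + 1);
        // Simplified scan for <xsd:include schemaLocation="...">; not a real parser.
        const pattern = /<(?:\w+:)?include\b[^>]*\bschemaLocation="([^"]*)"/g;
        let m: RegExpExecArray | null = pattern.exec(text);
        while (m !== null) {
            const location = m[1] ?? '';
            visit(location.startsWith('/') ? location : dir + location);
            m = pattern.exec(text);
        }
    };
    visit(entry);
    return seen;
}
```

For example, if /schemas/book.xsd includes types/author.xsd, and that file in turn includes name.xsd, the second include resolves to /schemas/types/name.xsd, which only works because each document carries its own location.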

fennibay commented 3 months ago

I attempted to fix the issue by providing a fromFile loader, so the XSD Document will have the path context, and can hopefully resolve further includes.

But this requires IO to be handled via callbacks, so #28 should be fixed first.

Also, this is based on #29, so it's better to merge that one first.

fennibay commented 3 months ago

@jameslan this is now finally working, and also covers #28. Thank you very much for the hints.

Please review and let me know what we can improve.