jameslan / libxml2-wasm

WebAssembly-based libxml2 javascript wrapper
https://jameslan.github.io/libxml2-wasm/
MIT License

feature request: convenient way to map many URL prefixes to different local directories #55

Open SKalt opened 1 week ago

SKalt commented 1 week ago

I'm trying to use this library to validate an XML schema that has many xs:include references to different URLs. I've saved local copies of each of the referenced *.xsds in a local directory:

xsds/
  schema-a/**/*.xsd
  schema-b/**/*.xsd

I'd like a convenient way to construct an object that implements the XmlInputProvider interface and handles different hosts.

Here's what I ended up implementing:

import path from 'node:path'
import { fsInputProviders } from 'libxml2-wasm/lib/nodejs.mjs';

class UrlMapper {
  /**
   * @param {Record<string, string>} mapping url prefixes to file paths
   */
  constructor(mapping) {
    /** @type Map<URL, URL> */
    this.mapping = new Map();
    for (const key in mapping) {
      // Keep the original key for the lookup below; only the URL form is rewritten.
      const prefix = key.startsWith('.') ? 'file://' + key : key;
      const prefixUrl = new URL(prefix);
      let mapped = mapping[key];
      if (mapped.startsWith('.')) {
        mapped = path.resolve(mapped);
      }
      if (!mapped.startsWith('file://')) mapped = 'file://' + mapped;
      const u = new URL(mapped);
      this.mapping.set(prefixUrl, u);
    }
  }
  /**
   * @param {string} _url
   */
  mapUrl(_url) {
    if (_url.startsWith('.')) _url = 'file://' + _url;
    const url = new URL(_url);
    for (const [prefix, replacement] of this.mapping) {
      if (prefix.origin !== url.origin) continue;
      if (!url.pathname.startsWith(prefix.pathname)) continue;
      let _rest = url.pathname.slice(prefix.pathname.length);
      if (_rest.startsWith('/')) _rest = _rest.slice(1);
      const resolved =
        'file://' +
        path.resolve(path.join(replacement.host + replacement.pathname, _rest));
      return resolved;
    }
  }
  /**
   * @param {string} url
   * @returns {boolean} whether the url can be handled by this mapper
   */
  match(url) {
    let _url = this.mapUrl(url);
    if (!_url) return false;
    const result = fsInputProviders.match(_url);
    return result;
  }
  /**
   * @param {string} url
   * @returns {number|undefined}
   */
  open(url) {
    const fileName = this.mapUrl(url);
    if (!fileName) return undefined;
    return fsInputProviders.open(fileName);
  }
  /**
   * @param {number} fd File descriptor
   * @param {Uint8Array} buf Buffer to read into, no more than its byteLength shall be read into.
   * @returns number of bytes actually read, -1 on error
   */
  read(fd, buf) {
    return fsInputProviders.read(fd, buf);
  }
  /**
   * @param {number} fd
   */
  close(fd) {
    return fsInputProviders.close(fd);
  }
}

const myXmlProvider = new UrlMapper({
  "http://foo.example.com/schema-a/": "./xsds/schema-a/",
  "http://bar.example.com/schema-b/": "./xsds/schema-b/",
})

import {
  XmlDocument,
  xmlRegisterInputProvider,
  XsdValidator,
} from 'libxml2-wasm';
import { readFileSync } from 'node:fs'

const doc = XmlDocument.fromBuffer(readFileSync("./xsds/schema-a/schema.xsd"))
xmlRegisterInputProvider(myXmlProvider)
const schema = XsdValidator.fromDoc(doc)

jameslan commented 6 days ago

Let me summarize your requirements:

There may be some options you can use to make the code simpler.

Modify the master XSD file to import/include with a relative path.

If you then pass the path of the master XSD as the options.url parameter of XmlDocument.fromBuffer, libxml2 will resolve the paths of the imported/included XSD files relative to it, and the provider in nodejs.mjs can handle them.

This requires minimal code, but you need to modify the XSD files.
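
To illustrate why the base URL matters for this option, here is a minimal sketch of how a relative schemaLocation resolves against the master document's URL. It uses Node's WHATWG URL class as a stand-in for libxml2's own URI resolution, which I'd expect to behave the same way for simple relative paths. The file names (schema.xsd, common/types.xsd) are hypothetical, and whether options.url should be a plain path or a file:// URL is not settled in this thread.

```javascript
import path from 'node:path';
import { pathToFileURL, fileURLToPath } from 'node:url';

// Base URL of the master XSD (hypothetical location under the current directory).
const masterUrl = pathToFileURL(path.resolve('xsds/schema-a/schema.xsd')).href;

// A relative schemaLocation as it might appear in an xs:include (hypothetical).
const schemaLocation = 'common/types.xsd';

// Relative resolution replaces the last path segment of the base URL.
const resolved = new URL(schemaLocation, masterUrl);

// Prints the absolute local path of xsds/schema-a/common/types.xsd.
console.log(fileURLToPath(resolved));
```

Without a base URL there is nothing to resolve the relative reference against, which is why the master document's location has to be supplied.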

Build a map from each full URL to a local file, not just from a prefix.

The code would look something like this:

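import path from 'node:path';
import { globSync } from 'glob'; // assumption: the `glob` npm package; Node 22+ also has fs.globSync

const prefixMapping = {
  "http://foo.example.com/schema-a": "./xsds/schema-a",
  "http://bar.example.com/schema-b": "./xsds/schema-b",
};

// Expand each prefix -> directory pair into exact URL -> local file entries.
const urlMapping = Object.fromEntries(
    Object.entries(prefixMapping)
        .flatMap(([url, dir]) =>
            globSync(dir + "/**/*.xsd") // find all xsd files
                .map(p => [`${url}/${path.relative(dir, p)}`, p]) // url -> local path
        )
);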

Then your provider could look like this:

match(url) {
    return Object.hasOwn(urlMapping, url)
}

open(url) {
    return fs.openSync(urlMapping[url]);
}
...
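
For completeness, here is a rough sketch of what the elided methods might look like when backed by node:fs. The method names (match, open, read, close) and the read contract (bytes read, -1 on error) follow the provider posted at the top of this issue. The boolean return from close, the temp-file fixture, and the name mappedProvider are assumptions for illustration only.

```javascript
import fs from 'node:fs';
import os from 'node:os';
import path from 'node:path';

// Hypothetical fixture: a single local XSD written to a temp directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'xsds-'));
const localPath = path.join(dir, 'types.xsd');
fs.writeFileSync(localPath, '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"/>');

// Exact URL -> local file path, the same shape as urlMapping.
const urlMapping = {
  'http://foo.example.com/schema-a/types.xsd': localPath,
};

const mappedProvider = {
  match(url) {
    return Object.hasOwn(urlMapping, url);
  },
  open(url) {
    // The OS file descriptor doubles as the numeric handle.
    try { return fs.openSync(urlMapping[url], 'r'); } catch { return undefined; }
  },
  read(fd, buf) {
    // Reads at most buf.byteLength bytes from the descriptor's current position.
    try { return fs.readSync(fd, buf); } catch { return -1; }
  },
  close(fd) {
    // Assumption: report success as a boolean.
    try { fs.closeSync(fd); return true; } catch { return false; }
  },
};
```

Since the operating system tracks the read position for each file descriptor, this variant needs no extra bookkeeping. It would be registered with xmlRegisterInputProvider(mappedProvider), the same way as in the original post.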

An alternative is to cache the XSD content in an in-memory Buffer so that it doesn't have to be read from disk every time. This may help if your app reads the same XSD multiple times.

SKalt commented 6 days ago

Modifying the source xsds isn't an option for me, since I'm trying to keep them byte-for-byte identical to some authoritative schema documents.

The latter example is great! I'd be interested in how the caching you proposed would work, too.

jameslan commented 6 days ago

I’ve come to realize that the task is more intricate than I initially anticipated.

Because libxml2 may read the file in multiple chunks, you need to maintain a context object that tracks the read position for each open file. Additionally, the WASM library cannot hold references to JavaScript objects, so you must use a number to represent the context object.

Essentially, you need a Map<number, Context> object to manage the file pointer.
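
A minimal sketch of that idea, assuming the same match/open/read/close shape as the provider at the top of this issue. The names cache, contexts, nextHandle and memoryProvider are hypothetical, the cached content is a placeholder, and the boolean return from close is an assumption.

```javascript
// Cached schema bytes keyed by URL (placeholder content for illustration).
const cache = new Map([
  [
    'http://foo.example.com/schema-a/types.xsd',
    new TextEncoder().encode('<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"/>'),
  ],
]);

// Numeric handle -> context holding the cached bytes and the read offset.
const contexts = new Map();
let nextHandle = 1;

const memoryProvider = {
  match(url) {
    return cache.has(url);
  },
  open(url) {
    const data = cache.get(url);
    if (!data) return undefined;
    const handle = nextHandle++;
    contexts.set(handle, { data, offset: 0 });
    return handle; // only this number crosses into WASM
  },
  read(handle, buf) {
    const ctx = contexts.get(handle);
    if (!ctx) return -1;
    // Copy the next chunk, never more than buf.byteLength bytes.
    const chunk = ctx.data.subarray(ctx.offset, ctx.offset + buf.byteLength);
    buf.set(chunk);
    ctx.offset += chunk.byteLength;
    return chunk.byteLength; // 0 once everything has been read
  },
  close(handle) {
    // Map.delete returns true only if the handle existed.
    return contexts.delete(handle);
  },
};
```

Each open gets its own context, so the same cached schema can be read several times (or concurrently) without the offsets interfering.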

It would be convenient if libxml2-wasm provided a helper function for handling such scenarios. 🤔