Closed johnnybubonic closed 3 years ago
This is a memo, mainly for myself, so I can remember when I work on this in bits and pieces.
libxml2 has parser context objects, which carry extra metadata that can be used as hints during parsing, along with other parse state. For a given target (XML doc, schema doc, etc.) there are specific parser context constructors, each of which does some magic underneath.
I tend to use xmlXXXNewMemParserCtxt, which is used to parse byte sequences in memory. A byte sequence in memory obviously carries no file system state; in this case, it does not know where we think we are, or where the document is, in the file system. This is where the problem arises.
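To make the missing piece concrete, here is a minimal Go sketch of relative reference resolution using only the standard library's net/url. It is not how libxml2 or this library resolves includes internally; the helper name resolveInclude and the reference include/types.xsd are hypothetical, and an empty base stands in for a document that was parsed from memory.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// resolveInclude (hypothetical helper) resolves a schema reference, such as
// the schemaLocation of an include, against the URI of the containing
// document. An empty base models a document parsed from an in-memory byte
// sequence, which has no location of its own.
func resolveInclude(base, ref string) (string, error) {
	r, err := url.Parse(ref)
	if err != nil {
		return "", err
	}
	if r.IsAbs() {
		// Absolute references do not need a base at all.
		return r.String(), nil
	}
	if base == "" {
		return "", errors.New("relative reference but no base URI")
	}
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	return b.ResolveReference(r).String(), nil
}

func main() {
	// The first base is a known document location; the second is the
	// in-memory case with no location.
	bases := []string{"http://schema.xml.r00t2.io/projects/go_libxml2.xsd", ""}
	for _, b := range bases {
		got, err := resolveInclude(b, "include/types.xsd")
		fmt.Printf("base=%q -> %q, err=%v\n", b, got, err)
	}
}
```

With a base the relative reference resolves next to the containing document; without one there is nothing to resolve against, which is the situation the memory-based parser context is in.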
We basically need to give the parser context a hint as to where the document came from, or give it a map of the things referenced in the document via a catalog. Here are our options:
Use xmlSchemaNewParserCtxt(filename), so the constructor does the magic.
Use xmlSchemaNewDocParserCtxt(document), and pass a document with its doc->URL set to a suitable value (TODO: since we don't know what the name of the file is, I don't yet know what the correct value is).
@johnnybubonic After some research, I found that using xsd.ParseFromFile would work for your case right now.
Allowing relative path resolution in XML documents given as a byte sequence probably requires a subtle API change, and I don't yet know how hard it will be. I will be researching it in my spare time. So if you need to fix something for your project right now, please punt by using xsd.ParseFromFile.
@lestrrat Got it, thanks for your research! I'm more eager about the remote inclusions, as in my experience that's how XSDs are typically referenced in the "wild", and it'd remove the need to distribute local versioned copies in builds, but all things in due time of course. Thank you again for your work on this; it's by far the most mature and complete Schema-aware implementation out there for Golang that I've come across, and I certainly appreciate the effort that goes into it.
As per request in #67, this PR offers test data that fully validates with xmllint but currently fails:
test/go_libxml2_local.xml validates against test/schema/go_libxml2_local.xsd using a relative path to the XML document, with nested relative include directives (all includes should be present in the PR).
test/go_libxml2_remote.xml validates against a remotely sourced schema (http://schema.xml.r00t2.io/projects/go_libxml2.xsd) as referenced by its xsi:schemaLocation attribute (obviously feel free to host elsewhere and change this if desired; it uses the same exact schema set as included in this PR):