lestrrat-go / libxml2

Interface to libxml2, with DOM interface
MIT License

adding test data for #67 #68

Closed: johnnybubonic closed this 3 years ago

johnnybubonic commented 4 years ago

As requested in #67, this PR adds test data that validates fully with xmllint but currently fails here:

$ xmllint -noout -schema schema/projects/go_libxml2_local.xsd go_libxml2_local.xml 
go_libxml2_local.xml validates
$ xmllint -noout -schema http://schema.xml.r00t2.io/projects/go_libxml2.xsd go_libxml2_remote.xml 
go_libxml2_remote.xml validates

lestrrat commented 4 years ago

This is a memo, mainly for myself, so I can remember when I work on this in bits and pieces.

libxml2 has parser context objects, which carry extra metadata that can be used as hints during parsing, along with other parse state. For each target (XML document, schema document, etc.) there are specific parser context constructors, which do some magic underneath.

I tend to use xmlXXXNewMemParserCtxt, which parses a byte sequence held in memory. A byte sequence in memory has no knowledge of file system state; in this case, it does not know where we are, or where the document lives, in the file system, so relative references in it cannot be resolved correctly. That is where the problem comes from.

So our options are to give the parser context a hint as to where the document came from, or to give it a map, via a catalog, of the things referenced in the document.
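To make the two options concrete, here is a minimal, stdlib-only Go sketch of how a relative schemaLocation could be resolved. It is not libxml2's or go-libxml2's implementation: `resolveSchemaLocation`, the `common.xsd` reference, the `file:///home/user/...` and `file:///opt/...` paths, and the plain map standing in for an XML catalog are all hypothetical. It only illustrates why bytes parsed from memory, with no base URI, leave a relative reference unanchored.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveSchemaLocation is a HYPOTHETICAL helper, not part of libxml2 or
// go-libxml2. catalog (a simplified stand-in for an XML catalog) is
// consulted first. Otherwise ref is resolved against baseURI, the "hint"
// about where the referencing document came from. With no baseURI (as for
// a document parsed from an in-memory byte slice) ref is returned as-is,
// so a relative path would end up relative to the current working directory.
func resolveSchemaLocation(ref, baseURI string, catalog map[string]string) (string, error) {
	if mapped, ok := catalog[ref]; ok {
		return mapped, nil
	}
	if baseURI == "" {
		return ref, nil
	}
	base, err := url.Parse(baseURI)
	if err != nil {
		return "", err
	}
	r, err := url.Parse(ref)
	if err != nil {
		return "", err
	}
	// RFC 3986 reference resolution: ref is placed next to the base document.
	return base.ResolveReference(r).String(), nil
}

func main() {
	ref := "common.xsd" // hypothetical relative include

	// No base hint: the reference stays relative.
	noBase, _ := resolveSchemaLocation(ref, "", nil)
	fmt.Println(noBase) // common.xsd

	// Base hint from a local file: resolved in the schema's directory.
	local, _ := resolveSchemaLocation(ref, "file:///home/user/schema/projects/go_libxml2_local.xsd", nil)
	fmt.Println(local) // file:///home/user/schema/projects/common.xsd

	// Base hint from a remote URL: resolved on the same server.
	remote, _ := resolveSchemaLocation(ref, "http://schema.xml.r00t2.io/projects/go_libxml2.xsd", nil)
	fmt.Println(remote) // http://schema.xml.r00t2.io/projects/common.xsd

	// Catalog: a remote location is redirected to a local copy.
	catalog := map[string]string{
		"http://schema.xml.r00t2.io/projects/go_libxml2.xsd": "file:///opt/schemas/go_libxml2.xsd",
	}
	mapped, _ := resolveSchemaLocation("http://schema.xml.r00t2.io/projects/go_libxml2.xsd", "", catalog)
	fmt.Println(mapped) // file:///opt/schemas/go_libxml2.xsd
}
```

The base-URI route needs no extra configuration but depends on the caller knowing where the bytes came from; the catalog route works even without that knowledge, at the cost of maintaining the mapping.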

lestrrat commented 4 years ago

@johnnybubonic After some research, I found that using xsd.ParseFromFile would work for your case right now.

Allowing relative path resolution in XML documents given as a byte sequence probably requires a subtle API change, and I don't yet know how hard it will be; I will research it in my spare time. So if you need a fix for your project right now, please work around it by using xsd.ParseFromFile.

johnnybubonic commented 4 years ago

@lestrrat Got it, thanks for your research! I'm more eager for the remote inclusions, since in my experience that's how XSDs are typically referenced in the "wild", and it'd remove the need to distribute local versioned copies in builds, but all things in due time, of course. Thank you again for your work on this; it's by far the most mature and complete schema-aware implementation for Go that I've come across, and I certainly appreciate the effort that goes into it.