ubsicap / usx

Unified Scripture XML

Support XSD schema #45

Open Rolf-Smit opened 3 years ago

Rolf-Smit commented 3 years ago

Currently the schemas provided for USX 3.0 are in the Relax NG format. To me it makes a lot of sense to also make them available in the common XSD schema format.

I don't really mind how this is done, but it would make sense to write the Relax NG schemas in such a way that converting them with a tool like Trang yields a valid and workable XSD schema.

Currently when you try to convert the Relax NG schema using Trang you don't get a workable XSD schema.

jonbitgood commented 3 years ago

Here's an XSD schema for USX 3 that I've been using for testing.

Rolf-Smit commented 3 years ago

@jonBitgood thanks for sharing!

But how do you, when generating code from the xsd, avoid errors such as:

org.xml.sax.SAXParseException: cos-element-consistent: Error for type '#AnonType_periph'. Multiple elements with name 'para', with different types, appear in the model group.

Or

org.xml.sax.SAXParseException: cos-nonambig: para and para (or elements from their substitution group) violate "Unique Particle Attribution". During validation against this schema, ambiguity would be created for those two particles.

jonbitgood commented 3 years ago

Hmmm... I've not worked with SAX, but for these two you might be able to create a custom error handler that just returns instead of raising the error. Alternatively, you could petition the good folks at UBS to split up para so that a different element name is used whenever the set of allowed attributes changes.
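
A minimal sketch of the error-handler idea, assuming the schema is compiled through the JDK's javax.xml.validation API. The class name SchemaErrorCollector and the tiny DEMO_XSD are hypothetical stand-ins, not the real USX schema, and a code generator may still treat these errors as fatal no matter what the handler does.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class SchemaErrorCollector {
    // Hypothetical stand-in for the USX case: two 'para' declarations
    // with different types inside the same model group.
    static final String DEMO_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='periph'><xs:complexType><xs:sequence>"
        + "<xs:element name='para' type='xs:string'/>"
        + "<xs:element name='para' type='xs:int'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Compiles the schema and records warnings and errors instead of throwing on them.
    static List<String> collect(String xsd) {
        List<String> messages = new ArrayList<>();
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        factory.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { messages.add(e.getMessage()); }
            @Override public void error(SAXParseException e) { messages.add(e.getMessage()); }
            // Fatal errors (e.g. malformed XML) are still propagated.
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema could not be read", e);
        }
        return messages;
    }

    public static void main(String[] args) {
        collect(DEMO_XSD).forEach(System.out::println);
    }
}
```

Collecting the messages only hides the problem from the caller. The schema itself is still invalid, so anything built from it should be treated with suspicion.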

Rolf-Smit commented 3 years ago

It would indeed be nice if elements had different names whenever they have different requirements. The current situation makes it really hard to write an XSD for USX. It can be done, but the result will be very loose and non-strict, since you have to merge the requirements of all elements that share a name into one declaration.
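
To illustrate the trade-off, here is a small sketch using the JDK's built-in schema compiler. The two schemas are hypothetical, heavily simplified stand-ins and not the actual USX definitions: STRICT declares para twice with different types in one model group, LOOSE merges them into a single, more permissive declaration.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SameNameDemo {
    static final String OPEN =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='periph'><xs:complexType><xs:sequence>";
    static final String CLOSE =
        "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Strict variant: same element name, two different types (rejected by XSD).
    static final String STRICT = OPEN
        + "<xs:element name='para' type='xs:string'/>"
        + "<xs:element name='para' type='xs:int'/>"
        + CLOSE;

    // Loose variant: one declaration whose type accepts both kinds of content.
    static final String LOOSE = OPEN
        + "<xs:element name='para' type='xs:string' maxOccurs='unbounded'/>"
        + CLOSE;

    // Returns "ok" if the schema compiles, otherwise the compiler's error message.
    static String compile(String xsd) {
        try {
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
            return "ok";
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("strict: " + compile(STRICT));
        System.out.println("loose:  " + compile(LOOSE));
    }
}
```

The loose variant compiles, but it can no longer tell an integer-only para from a free-text one, which is exactly the loss of strictness described above.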

If anyone could point me towards a Relax NG code generator that spits out Java POJOs, that would be nice. But within the Java community there doesn't seem to be a Relax NG codegen that is not from 20 years ago ;)

Rolf-Smit commented 3 years ago

@jonBitgood I created my own XSD file for now:

https://github.com/Rolf-Smit/BibleMultiConverter/blob/feature/usx-3.0/biblemulticonverter-schemas/src/main/resources/usx3.xsd

But it is nowhere near as strict as the Relax NG file, because the same element names are used for many different things. It works for unmarshalling and marshalling, but is far from perfect when it comes to very strict verification of contents. In the end, XSD seems better suited to validating structure than to verifying content/values.

I would love to see a few more distinct element names in the next version of USX, to make it easier to write tooling for this format.