relaxng / jing-trang

Schema validation and conversion based on RELAX NG
http://www.thaiopensource.com/relaxng/

Document SAX usage #263

Open oliviercailloux opened 3 years ago

oliviercailloux commented 3 years ago

May I suggest including in the README (or elsewhere) an example of using Jing through the standard SAX API? (Issue #21 provides a partial example, but unfortunately the link provided there is dead.)

Here is an attempt of mine, so far unsuccessful.

public static void validate(String documentId, InputStream relaxSchema) throws SAXException, ParserConfigurationException, IOException {
  ErrorHandler errorHandler = new DraconianErrorHandler();

  System.setProperty(SchemaFactory.class.getName() + ":" + XMLConstants.RELAXNG_NS_URI, "com.thaiopensource.relaxng.jaxp.XMLSyntaxSchemaFactory");
  SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.RELAXNG_NS_URI);
  schemaFactory.setErrorHandler(errorHandler);
  Schema schema = schemaFactory.newSchema(new StreamSource(relaxSchema));

  SAXParserFactory factory = SAXParserFactory.newInstance();
  factory.setNamespaceAware(true);
  factory.setSchema(schema);

  SAXParser parser = factory.newSAXParser();
  XMLReader reader = parser.getXMLReader();
  reader.setErrorHandler(errorHandler);
  reader.parse(documentId);
}
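
The DraconianErrorHandler class is not shown in this issue. A minimal sketch, assuming it simply rethrows every warning, error, and fatal error so that any validation problem aborts the parse, might look like this (the behavior is an assumption based on the name, not the author's actual class):

```java
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

/**
 * Hypothetical stand-in for the DraconianErrorHandler referenced above.
 * Assumption: it treats every reported problem, warnings included, as fatal.
 */
final class DraconianErrorHandler implements ErrorHandler {
  @Override
  public void warning(SAXParseException exception) throws SAXParseException {
    // Rethrow so that even a warning stops processing.
    throw exception;
  }

  @Override
  public void error(SAXParseException exception) throws SAXParseException {
    // Validation errors are recoverable by default; rethrowing makes them abort the parse.
    throw exception;
  }

  @Override
  public void fatalError(SAXParseException exception) throws SAXParseException {
    throw exception;
  }
}
```

With a handler like this, the first schema violation surfaces as a SAXParseException from reader.parse(...), which is what the test's assertDoesNotThrow relies on.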

Then I test it as follows, using docbook howto.xml and docbook.rng.

@Test
void testValidSax() throws Exception {
  try (InputStream rng = DocBookUtils.class.getResource("docbook.rng").openStream()) {
    assertDoesNotThrow(() -> DocBookUtils.validate(DocBookUtilsTests.class.getResource("docbook howto.xml").toString(), rng));
  }
}

The above test yields:

org.xml.sax.SAXParseException; systemId: file:/home/…/docbook%20howto.xml; lineNumber: 13; columnNumber: 31; attribute "xmlns" not allowed here; expected attribute "annotations", "arch", "audience", "class", "condition", "conformance", "dir", "label", "linkend", "os", "outputformat", "prefix", "property", "remap", "resource", "revision", "revisionflag", "role", "security", "status", "typeof", "userlevel", "vendor", "version", "vocab", "wordsize", "xl:actuate", "xl:arcrole", "xl:from", "xl:href", "xl:label", "xl:role", "xl:show", "xl:title", "xl:to", "xl:type", "xml:base", "xml:id", "xml:lang" or "xreflabel"
    at com.thaiopensource.relaxng.jaxp.ValidatorHandlerImpl.check(ValidatorHandlerImpl.java:148)
    at com.thaiopensource.relaxng.jaxp.ValidatorHandlerImpl.startElement(ValidatorHandlerImpl.java:68)
    at java.xml/com.sun.org.apache.xerces.internal.jaxp.JAXPValidatorComponent$XNI2SAX.startElement(JAXPValidatorComponent.java:419)
    at java.xml/com.sun.org.apache.xerces.internal.jaxp.JAXPValidatorComponent.startElement(JAXPValidatorComponent.java:182)
    at java.xml/com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.startElement(XMLDTDValidator.java:731)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:374)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl$NSContentDriver.scanRootElementHook(XMLNSDocumentScannerImpl.java:613)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:3063)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:836)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:605)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:534)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:888)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1141)
    at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:647)
    at io.github.oliviercailloux.xml_utils.DocBookUtils.validate(DocBookUtils.java:128)
    at io.github.oliviercailloux.xml_utils.DocBookUtilsTests.lambda$0(DocBookUtilsTests.java:30)
    at org.junit.jupiter.api.AssertDoesNotThrow.assertDoesNotThrow(AssertDoesNotThrow.java:50)
    ... 70 more

But that file does validate against that schema when not going through SAX, using Jing's own API:

com.thaiopensource.validate.Schema schema = new AutoSchemaReader().createSchema(relaxSchema, countingErrorProperties);
Validator validator = schema.createValidator(countingErrorProperties);
contentHandler = validator.getContentHandler();
xmlReader = ResolverFactory.createResolver(PropertyMap.EMPTY).createXMLReader();

and so on…

Could you perhaps indicate the recommended way to use Jing through the SAX interface?

Thank you for this useful library.

opeongo commented 2 years ago

I am also interested in this issue. My approach is basically the same as in your code.

I have found that factory.setSchema(schema); doesn't seem to do anything. What does seem to apply the schema is this line:

         reader.setProperty("http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation",
                            schemaFilename);

When I add this line the error message that I get after the parse has begun is:

s4s-elt-schema-ns: The namespace of element 'grammar' must be from the schema namespace, 'http://www.w3.org/2001/XMLSchema'.

If I comment out the line

 reader.setFeature("http://apache.org/xml/features/validation/schema", true);

then the error becomes

Document is invalid: no grammar found.
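
The s4s-elt-schema-ns message is what a W3C XML Schema loader reports when the document it is given does not have a root element in the XML Schema namespace. The feature and property names used here are Xerces settings for XML Schema validation, so the RELAX NG grammar is presumably being handed to the XSD loader rather than to Jing. A minimal sketch that reproduces the same kind of rejection with only the JDK (the class name and the tiny inline grammar are made up for illustration):

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Illustrative only: feeds a RELAX NG grammar to the JDK's W3C XML Schema loader. */
final class XsdLoaderRejectsRelaxNg {
  // A tiny RELAX NG grammar (XML syntax), invented for this example.
  static final String RNG =
      "<grammar xmlns=\"http://relaxng.org/ns/structure/1.0\">"
          + "<start><element name=\"doc\"><text/></element></start>"
          + "</grammar>";

  /** Returns the loader's error message, or null if the text was accepted as an XSD. */
  static String loadAsXsd(String schemaText) {
    // This factory only understands W3C XML Schema documents.
    SchemaFactory xsdFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    try {
      xsdFactory.newSchema(new StreamSource(new StringReader(schemaText)));
      return null;
    } catch (SAXException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(loadAsXsd(RNG));
  }
}
```

If that reading is right, no combination of these Xerces-specific settings would route the grammar to a RELAX NG validator.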

My guess is that the SAXParserFactory.newInstance(); method is returning an XML Schema-based parser, but what is needed is a RelaxNG-based parser. I have looked through the jing-trang source code but I cannot find an implementation of SAXParserFactory, which is what appears to be required to work with SAX, so maybe this plumbing was never developed? I'm just guessing here.

Here is a more complete example:

System.setProperty(SchemaFactory.class.getName() + ":" + XMLConstants.RELAXNG_NS_URI, "com.thaiopensource.relaxng.jaxp.XMLSyntaxSchemaFactory");
SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.RELAXNG_NS_URI);
Schema schema = sf.newSchema(scf);
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setSchema(schema);

try {
  SAXParser parser = factory.newSAXParser();
  System.err.println("parser schema=" + parser.getSchema());
  XMLReader reader = parser.getXMLReader();
  reader.setFeature("http://xml.org/sax/features/validation", true);
  reader.setFeature("http://apache.org/xml/features/validation/schema", true);
  reader.setProperty("http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation",
                     schemaFilename);
  reader.setFeature("http://apache.org/xml/features/xinclude", true);
  reader.setFeature("http://xml.org/sax/features/namespaces", true);
  reader.setFeature("http://apache.org/xml/features/xinclude/fixup-base-uris", false);
  reader.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);

  reader.setErrorHandler(this);
  reader.setContentHandler(this);
  reader.setEntityResolver(this);

  reader.parse(new InputSource(input));

As a workaround I am using trang to convert rng to xsd, and then validation with the built-in XML Schema validator works just fine.

It would be nice to work only with RelaxNG, and not have to convert to xsd, but it's not a show stopper.
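
One avenue that does not depend on SAXParserFactory.setSchema is to ask the JAXP Schema for a ValidatorHandler and install it as the ContentHandler of a namespace-aware XMLReader. The sketch below shows that wiring. So that it runs with the JDK alone, it uses the built-in W3C XML Schema factory and a made-up inline schema. The assumption, not confirmed anywhere in this thread, is that a Schema obtained from Jing's XMLSyntaxSchemaFactory could be substituted in loadSchema(). The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

/** Illustrative wiring of a JAXP ValidatorHandler behind a plain SAX parser. */
final class ValidatorHandlerSketch {
  // Made-up schema: a single <doc> element containing a string.
  static final String XSD =
      "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
          + "<xs:element name=\"doc\" type=\"xs:string\"/>"
          + "</xs:schema>";

  static Schema loadSchema() throws SAXException {
    // Assumption: with Jing on the classpath, XMLConstants.RELAXNG_NS_URI (plus the
    // system property shown earlier in this thread) would be used here instead.
    SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    return factory.newSchema(new StreamSource(new StringReader(XSD)));
  }

  /** Returns true if the document parses and validates, false if it is invalid or malformed. */
  static boolean isValid(Schema schema, String xml)
      throws ParserConfigurationException, IOException, SAXException {
    ValidatorHandler validator = schema.newValidatorHandler();
    // Rethrow everything so the first problem aborts the parse.
    validator.setErrorHandler(new ErrorHandler() {
      @Override public void warning(SAXParseException e) throws SAXParseException { throw e; }
      @Override public void error(SAXParseException e) throws SAXParseException { throw e; }
      @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
    });

    SAXParserFactory parserFactory = SAXParserFactory.newInstance();
    parserFactory.setNamespaceAware(true);
    // Deliberately no parserFactory.setSchema(...): the parser only parses,
    // and the ValidatorHandler receives its SAX events.
    XMLReader reader = parserFactory.newSAXParser().getXMLReader();
    reader.setContentHandler(validator);
    try {
      reader.parse(new InputSource(new StringReader(xml)));
      return true;
    } catch (SAXParseException e) {
      return false;
    }
  }

  public static void main(String[] args) throws Exception {
    Schema schema = loadSchema();
    System.out.println(isValid(schema, "<doc>hello</doc>"));
    System.out.println(isValid(schema, "<other/>"));
  }
}
```

To keep receiving events in application code, ValidatorHandler.setContentHandler(...) can chain a downstream ContentHandler after validation.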