Open vincent-zurczak opened 9 years ago
Quasi-duplicate of https://github.com/validator/validator.github.io/issues/11
Indeed. :smiley: And I think others could be interested in having a more direct pointer about this in the readme.
Related: #452
Here's a basic validation code snippet:
    String html = ...
    EmbeddedValidator validator = new EmbeddedValidator();
    validator.setOutputFormat( EmbeddedValidator.OutputFormat.GNU );
    try {
        String output = validator.validate( new ByteArrayInputStream( html.getBytes( StandardCharsets.UTF_8 ) ) );
        if (!output.isEmpty())
            throw new Exception( output ); //validation failed
    } catch (SAXException e) {
        throw new Exception( "Cannot validate html", e );
    }
The Text output format includes some boilerplate text when the document is valid.
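For unit tests it can be handy to report each message separately instead of throwing one long string. Here is a minimal sketch of that idea. It assumes the validator output contains one message per line, which I have not verified against the GNU format's actual layout. `ValidationOutput`, `messages` and `assertValid` are hypothetical helper names, not part of vnu.jar, and the sample strings in `main` are made-up placeholders rather than real validator output.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of vnu.jar.
final class ValidationOutput {

    // Splits raw validator output into one entry per non-blank line.
    // Assumption: the chosen output format writes one message per line.
    static List<String> messages(String output) {
        return Arrays.stream(output.split("\\R"))
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }

    // Throws an AssertionError listing every message if the output is non-empty.
    static void assertValid(String output) {
        List<String> found = messages(output);
        if (!found.isEmpty()) {
            throw new AssertionError(
                    found.size() + " validation message(s):\n" + String.join("\n", found));
        }
    }

    public static void main(String[] args) {
        // Empty output is treated as "document is valid".
        assertValid("");

        // Placeholder text standing in for validator output.
        String sample = "first placeholder message\nsecond placeholder message\n";
        System.out.println(messages(sample).size() + " message(s) found");
    }
}
```

In a test, the string returned by `validator.validate(...)` in the snippet above would be passed to `assertValid`.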
I have looked in detail at the source code of SimpleDocumentValidator with a view to publishing a simple-to-use wrapper of Nu that can be used to validate HTML as a Java library, and I have some working unit tests (using very simple example documents).
Would you be willing to help me understand the code?
My first question is simple. At SO you gave a simple way of using the validator directly through jing (as embedded in vnu.jar), that boils down to a few simple lines of code. This seems to work. OTOH, “unwrapping” the code from SimpleDocumentValidator leads to some apparently much more complicated usage, involving setting system properties, using cascading schema instances and validators and XML readers interacting with a SourceCode instance, … So my question is: how do these two approaches differ? Can I effectively validate HTML documents using the first, much simpler approach, or do I risk missing errors?
Hi,
The readme is complete if one wants to use the project as an executable. It is harder to use when you want to use its validation capabilities as a library (e.g. in unit tests). Or maybe I missed something.
Otherwise, I created a Gist to document this case. It is largely inspired by the command-line validator, and it might save people time if it were added to the project's readme.
PS: I spent the entire afternoon looking at solutions to validate HTML 5 pages in Java, and your solution is the best I found.