validator / validator

Nu Html Checker – Helps you catch problems in your HTML/CSS/SVG
https://validator.github.io/validator/
MIT License

Add library usage in the readme #71

Open: vincent-zurczak opened this issue 9 years ago

vincent-zurczak commented 9 years ago

Hi,

The README is complete if you want to use the project as an executable, but it is harder to follow when you want to use its validation capabilities as a library (e.g., in unit tests). Or maybe I missed something.

In the meantime, I created a Gist documenting this case. It is largely inspired by the command-line validator, but adding it to the project's README might save people time.

PS: I spent the entire afternoon looking for ways to validate HTML5 pages in Java, and your solution is the best I found.

cvrebert commented 9 years ago

Quasi-duplicate of https://github.com/validator/validator.github.io/issues/11

vincent-zurczak commented 9 years ago

Indeed. :smiley: And I think others would be interested in a more direct pointer to this in the README.

xfq commented 7 years ago

Related: #452

cdalexndr commented 2 years ago

Here's a basic validation code snippet:

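```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import nu.validator.client.EmbeddedValidator;
import org.xml.sax.SAXException;

class HtmlCheck { // hypothetical wrapper class name
    // Throws if the checker reports any message for the given HTML string
    static void validate(String html) throws Exception {
        EmbeddedValidator validator = new EmbeddedValidator();
        validator.setOutputFormat(EmbeddedValidator.OutputFormat.GNU); // GNU output is empty when valid
        try {
            String output = validator.validate(
                    new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)));
            if (!output.isEmpty())
                throw new Exception(output); // validation failed: output lists the messages
        } catch (SAXException e) {
            throw new Exception("Cannot validate html", e);
        }
    }
}
```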

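The Text output format prints some boilerplate text even when the document is valid, which is why the snippet uses the GNU format and treats any non-empty output as a failure.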
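One consequence of the "non-empty output means failure" check above is that warnings also fail the test. If you only want errors to fail, a small post-processing step on the GNU output can help. The sketch below is a hypothetical helper (`GnuOutputFilter.failures` is not part of vnu), and it assumes each GNU-format message sits on one line with the severity right after the location, e.g. `"file:/tmp/x.html":1.1-1.6: error: ...` or `...: info warning: ...`. Check this format against the output of your vnu version before relying on it.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of vnu). ASSUMPTION: GNU-format checker output
 * has one message per line, with the severity ("error", "info warning", "info")
 * right after the "line.col-line.col:" location.
 */
final class GnuOutputFilter {

    // Returns every line that should fail a test: anything that is not
    // recognisably informational. Unrecognised lines are kept (fail-safe).
    static List<String> failures(String gnuOutput) {
        List<String> failures = new ArrayList<>();
        for (String line : gnuOutput.split("\\R")) {
            if (line.isBlank()) {
                continue; // empty output (valid document) yields no failures
            }
            boolean informational =
                    line.contains(": info warning: ") || line.contains(": info: ");
            if (!informational) {
                failures.add(line);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "\"file:/tmp/x.html\":1.1-1.6: error: Start tag seen without seeing a doctype first.",
                "\"file:/tmp/x.html\":1.1-1.6: info warning: Consider adding a lang attribute.");
        System.out.println(failures(sample).size()); // prints 1
    }
}
```

Keeping unrecognised lines as failures is deliberate: if the assumed format does not match, the test errs on the side of failing rather than silently passing.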

oliviercailloux commented 2 years ago

I have studied the source code of SimpleDocumentValidator in detail, with a view to publishing a simple-to-use wrapper around Nu for validating HTML from Java, and I have some working unit tests (using very simple example documents).

Would you be willing to help me understand the code?

My first question is simple. On Stack Overflow you described a way of using the validator directly through Jing (as embedded in vnu.jar) that boils down to a few lines of code, and this seems to work. On the other hand, “unwrapping” the code from SimpleDocumentValidator leads to an apparently much more complicated usage, involving setting system properties, cascading schema instances and validators, and XML readers interacting with a SourceCode instance, … So my question is: how do these two approaches differ? Can I reliably validate HTML documents with the first, much simpler approach, or do I risk missing errors?