Provide validator

nichtich commented 5 years ago

The documentation on HTML patterns states

> Ultimately, well-formed and valid HTML - along with accompanying RDFa, Turtle, JSON-LD, TriG etc - is the only requirement here

This is only true in theory. In practice, dokieli.js expects certain patterns, such as Polyglot Markup (?), an outline with a title, main > article, section, etc. I stumbled upon this with a document that uses a flat list of h1, h2, ... headings, so the table of contents could not be generated properly.

It would be helpful to have a validator script that checks for

well-formed and valid HTML
expected outline (e.g. section elements)
well-formed embedded RDF (if given)
meaningful embedded RDF (e.g. no unknown namespaces)
best-practice (e.g. aside should appear as the last element node in section)

The validation rules do not need to be very strict, but even "well-formed and valid HTML+RDFa" must be validated. Data should be assumed not to conform to any standard unless conformance is actually checked.
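To make two of the checks listed above concrete (expected outline and the aside best practice), here is a minimal sketch in Python using only the standard library. It is not dokieli code. The names `PatternLinter` and `lint`, and the rule that only h2 to h6 must sit inside a section, are assumptions made for illustration. It also assumes explicit end tags, so it is not a substitute for a real HTML validator.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they are not pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}
# Assumption: h1 is the document title; lower-level headings belong in a section.
SECTION_HEADINGS = {"h2", "h3", "h4", "h5", "h6"}


class PatternLinter(HTMLParser):
    """Hypothetical linter that collects warnings about document patterns."""

    def __init__(self):
        super().__init__()
        self.stack = []      # open elements as (tag, list of child tag names)
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        if self.stack:
            self.stack[-1][1].append(tag)  # record as child of the open parent
        in_section = any(name == "section" for name, _ in self.stack)
        if tag in SECTION_HEADINGS and not in_section:
            self.warnings.append(f"<{tag}> is not inside a <section>")
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, []))

    def handle_endtag(self, tag):
        if all(name != tag for name, _ in self.stack):
            return  # stray end tag: ignore rather than fail
        while self.stack:
            name, children = self.stack.pop()
            if name == "section" and "aside" in children and children[-1] != "aside":
                self.warnings.append(
                    "<aside> is not the last element in its <section>")
            if name == tag:
                break


def lint(html):
    """Return a list of warning strings for the given HTML source."""
    linter = PatternLinter()
    linter.feed(html)
    linter.close()
    return linter.warnings
```

Calling `lint()` on a document with a flat `<h1>`, `<h2>` sequence yields a warning for the `<h2>`, while a document that wraps its headings in `section` elements and keeps `aside` last yields none. Warnings rather than errors fit the request that the rules need not be very strict.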

csarven commented 5 years ago

I think it goes a bit without saying that dokieli will consume and try to accommodate certain patterns out there, but it can't turn any arbitrary pattern into something useful. We can of course improve recognition of commonly used patterns (e.g., the example with flat headings; IIRC, the HTML spec has an outline algorithm that could perhaps be implemented here). Nevertheless, it is not an all-or-nothing situation, so some of the functions can still work in dokieli, e.g., while the webpage may have garbage HTML, I think we can still annotate it.
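As a rough illustration of the flat-headings case, the sketch below nests headings purely by their level. It is deliberately simpler than the outline algorithm described in the HTML spec and is not what dokieli does. `HeadingCollector` and `build_outline` are hypothetical names.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects (level, text) pairs for h1..h6 in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None   # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def build_outline(html):
    """Turn a flat heading sequence into a nested list of dict nodes."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    root = {"title": None, "level": 0, "children": []}
    stack = [root]
    for level, title in collector.headings:
        # Close every open node at the same or a deeper level.
        while stack[-1]["level"] >= level:
            stack.pop()
        node = {"title": title, "level": level, "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]
```

With input such as `<h1>Doc</h1><h2>Intro</h2><h3>Scope</h3><h2>Methods</h2>`, "Intro" and "Methods" become children of "Doc", and "Scope" becomes a child of "Intro", which is enough to render a table of contents without `section` elements.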

As for what it generates, it has its own patterns with the intention of having some consistency and reuse.

I've been hesitant to bring a validator to the mix for two reasons:

there are going to be things outside of dokieli's knowledge that the author wants, so dokieli shouldn't interfere.
the vast majority of HTML pages are probably invalid.

In any case, perhaps I've misunderstood what you mean by a validator script. Is that for what dokieli consumes and/or generates?

I generally like the idea of running through a canonical function that tells us what should go where and how. I like the example with aside. It makes me think of things like DO.C.DocumentItems, which has an order for certain common blocks, and dokieli looks that up when it needs to know where to insert an item. I think this sort of thinking is what you're raising, right? I think the patterns in dokieli are generally consistent, but perhaps this is where the templating stuff can help.
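The ordered-lookup idea can be sketched as follows. The list contents and the function name `insert_position` are hypothetical and only stand in for whatever DO.C.DocumentItems actually holds. The sketch is in Python for brevity, not in dokieli's own JavaScript.

```python
# Hypothetical canonical order of common blocks (not the real DocumentItems).
DOCUMENT_ITEMS = ["abstract", "introduction", "references", "acknowledgements"]


def insert_position(existing_ids, new_id, order=DOCUMENT_ITEMS):
    """Return the index at which new_id should be inserted among existing_ids.

    Items unknown to the canonical order are left where the author put them,
    and an unknown new item is simply appended.
    """
    rank = {name: index for index, name in enumerate(order)}
    if new_id not in rank:
        return len(existing_ids)
    for position, item in enumerate(existing_ids):
        # Insert before the first known item that ranks later.
        if item in rank and rank[item] > rank[new_id]:
            return position
    return len(existing_ids)
```

For example, `insert_position(["abstract", "references"], "introduction")` places the new block between the two existing ones. A linter could reuse the same order to flag blocks that appear out of sequence.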

Perhaps the wording in "Ultimately, well-formed and valid HTML - along with accompanying RDFa, Turtle, JSON-LD, TriG etc - is the only requirement here" is not accurate. Instead of "requirement", I think I meant "goal" or "aim". That paragraph needs a rewrite.

nichtich commented 5 years ago

Well, every unexpected behaviour of dokieli.js could be either a bug or a requirement on the document that's being processed. Maybe don't call it a validator but a linter that checks for common pitfalls (such as not using section tags) and suspicious pieces of HTML+RDFa.

> DO.C.DocumentItems where it has an order for certain common blocks, and dokieli looks that up when it needs to know where to insert an item. I think this sort of thinking is what you're raising, right?

Yes, but I'd also catch invalid HTML and invalid or unrecognizable RDF (such as typos in namespace prefixes, up to undefined RDF properties). There are standard tools to do so (e.g. https://validator.github.io/validator/), but authors should not be required to figure out how to find, set up, and run these tools.
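A check for typos in namespace prefixes could look roughly like the sketch below. It is a simplification, not an RDFa processor: prefix declarations are collected document-wide instead of being scoped to an element and its descendants, only a small assumed subset of the RDFa 1.1 initial context is treated as predeclared, and `PrefixChecker` and `undeclared_prefixes` are hypothetical names.

```python
from html.parser import HTMLParser

# Assumption: a small subset of the prefixes predefined by the RDFa 1.1
# initial context; a real tool would load the full list.
INITIAL_CONTEXT = {"rdf", "rdfs", "xsd", "owl", "dc", "dcterms",
                   "foaf", "schema", "skos"}
# Attributes whose values are whitespace-separated terms, CURIEs or IRIs.
CURIE_ATTRS = {"property", "typeof", "rel", "rev", "datatype"}


class PrefixChecker(HTMLParser):
    """Collects declared prefixes and prefixed names used in RDFa attributes."""

    def __init__(self):
        super().__init__()
        self.declared = set()
        self.used = []   # (prefix, full token) pairs

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            if name == "prefix":
                # Format: "ex: http://example.org/ns# other: http://..."
                tokens = value.split()
                self.declared.update(
                    t[:-1] for t in tokens[0::2] if t.endswith(":"))
            elif name in CURIE_ATTRS:
                for token in value.split():
                    prefix, sep, rest = token.partition(":")
                    # Skip plain terms and absolute IRIs such as http://...
                    if sep and prefix and not rest.startswith("//"):
                        self.used.append((prefix, token))


def undeclared_prefixes(html):
    """Return the sorted prefixed names whose prefix is not known."""
    checker = PrefixChecker()
    checker.feed(html)
    checker.close()
    known = checker.declared | INITIAL_CONTEXT
    return sorted({token for prefix, token in checker.used
                   if prefix not in known})
```

A value such as `property="shema:name"` would be reported because `shema` is neither declared nor in the assumed initial context, whereas `schema:name` would pass. Checking whether a property is actually defined in its vocabulary would need the vocabulary itself and is out of scope for this sketch.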

csarven commented 5 years ago

> linter to check for common pitfalls

> catch invalid HTML and invalid or unrecognizable RDF

Anything that's created through the dokieli UI is intended to be valid.

The Save operation, for example, takes what's in the DOM and normalises the HTML before saving. Save As, on the other hand, uses the source HTML as is.

What should change? I'm not against a linter, but trying to understand if and where it can fit in. I'm missing the "why" I think.

csarven commented 4 months ago

Unit tests will cover the generated output of the editing / annotating features, i.e., making sure the data is based on some schema, validates, and so on. That aside, there won't be separate tooling to validate further.

linkeddata / dokieli

Provide validator #287