Closed nichtich closed 4 months ago
I think it mostly goes without saying that dokieli will consume and try to accommodate certain patterns out there, but it can't turn any arbitrary pattern into something useful. We can of course improve recognition of commonly used patterns (e.g. the example with flat headings? IIRC, the HTML spec has an algorithm for an outline that could perhaps be implemented here). Nevertheless, it is not an all-or-nothing situation, so some of dokieli's functions can still work, e.g. even if the webpage has garbage HTML, we can still annotate, I think.
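The outline idea above could be sketched roughly as follows: build a nested table of contents from a flat sequence of headings, in the spirit of the HTML outline algorithm. The heading objects here are a simplification for illustration; a real implementation would walk the DOM.

```javascript
// Build a nested outline from flat headings by keeping a stack of
// open sections; a heading closes every section at its level or deeper.
function buildOutline(headings) {
  const root = { level: 0, text: null, children: [] };
  const stack = [root];
  for (const h of headings) {
    // Pop until the top of the stack is a shallower heading.
    while (stack[stack.length - 1].level >= h.level) stack.pop();
    const node = { level: h.level, text: h.text, children: [] };
    stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return root.children;
}

// A flat h1/h2 list, like the document that broke the TOC:
const toc = buildOutline([
  { level: 1, text: 'Introduction' },
  { level: 2, text: 'Background' },
  { level: 2, text: 'Scope' },
  { level: 1, text: 'Results' },
]);
// toc nests Background and Scope under Introduction.
```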
As for what it generates, it has its own patterns with the intention of having some consistency and reuse.
I've been hesitant to bring a validator into the mix for two reasons:

1. There are going to be things outside of dokieli's knowledge that the author wants, so dokieli shouldn't interfere.
2. The vast majority of HTML pages out there are probably invalid.
In any case, perhaps I've misunderstood what you mean by a validator script. Is that for what dokieli consumes and/or generates?
I generally like the idea of running through a canonical function that tells us what should go where and how. I like the example with `aside`. It makes me think of things like DO.C.DocumentItems, which has an order for certain common blocks; dokieli looks that up when it needs to know where to insert an item. I think this sort of thinking is what you're raising, right? I think the patterns in dokieli are generally consistent, but perhaps this is where the templating stuff can help.
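The lookup described above might be sketched like this. The item names are illustrative placeholders, not dokieli's actual DO.C.DocumentItems list; the point is only the backwards search for an anchor that preserves a canonical order.

```javascript
// Canonical block order (hypothetical names, for illustration only).
const documentItems = [
  'buttons', 'document-status', 'introduction', 'content', 'references',
];

// Given the ids already present in a document, find the id after which
// a new item should be inserted so the canonical order is preserved.
function findInsertionAnchor(existingIds, newItem) {
  const pos = documentItems.indexOf(newItem);
  // Walk backwards through the canonical order until we hit a block
  // that actually exists in this document.
  for (let i = pos - 1; i >= 0; i--) {
    if (existingIds.includes(documentItems[i])) return documentItems[i];
  }
  return null; // nothing earlier exists: insert at the start
}

findInsertionAnchor(['document-status', 'references'], 'content');
// walks back from 'content' and settles on 'document-status'
```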
Perhaps the wording in "Ultimately, well-formed and valid HTML - along with accompanying RDFa, Turtle, JSON-LD, TriG etc - is the only requirement here" is not accurate. Instead of "requirement", I think I meant "goal" or "aim". That paragraph needs a rewrite.
Well, every unexpected behaviour of dokieli.js could be either a bug or a requirement on the document that's being processed. Maybe don't call it a validator but a linter that checks for common pitfalls (such as not using `section` tags) and suspicious pieces of HTML+RDFa.
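A minimal sketch of that kind of linter, run over a simplified node tree rather than a live DOM. The two rules shown (headings outside `section`, `aside` not last in its `section`) come from this thread; the tree shape and rule wording are illustrative.

```javascript
// Recursively collect warnings for common structural pitfalls.
function lint(node, warnings = []) {
  const children = node.children || [];
  children.forEach((child, i) => {
    // Pitfall: headings used for structure without section wrappers.
    if (/^h[1-6]$/.test(child.name) && node.name !== 'section') {
      warnings.push(`${child.name} outside of a section element`);
    }
    // Suspicious: aside that is not the last element in its section.
    if (child.name === 'aside' && node.name === 'section' &&
        i !== children.length - 1) {
      warnings.push('aside is not the last element node in its section');
    }
    lint(child, warnings);
  });
  return warnings;
}

// A document with both pitfalls: a bare h1, and an aside followed by a p.
const doc = { name: 'body', children: [
  { name: 'h1' },
  { name: 'section', children: [{ name: 'aside' }, { name: 'p' }] },
] };
lint(doc); // yields one warning per pitfall
```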
> DO.C.DocumentItems where it has an order for certain common blocks, and dokieli looks that up when it needs to know where to insert an item. I think this sort of thinking is what you're raising, right?
Yes, but I'd also catch invalid HTML and invalid or unrecognizable RDF (from typos in namespace prefixes up to undefined RDF properties). There are standard tools to do so (e.g. https://validator.github.io/validator/) but authors should not be required to find, set up, and run these tools themselves.
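One of the RDF checks mentioned above, catching a typo in a namespace prefix, could look roughly like this. The list of well-known namespaces is a small illustrative sample, not an exhaustive registry.

```javascript
// A few well-known prefix bindings (illustrative subset).
const wellKnown = {
  dcterms: 'http://purl.org/dc/terms/',
  foaf: 'http://xmlns.com/foaf/0.1/',
  schema: 'http://schema.org/',
  oa: 'http://www.w3.org/ns/oa#',
};

// Warn when a declared prefix is bound to something other than its
// well-known namespace, which usually means a typo.
function checkPrefixes(declared) {
  const warnings = [];
  for (const [prefix, ns] of Object.entries(declared)) {
    if (wellKnown[prefix] && wellKnown[prefix] !== ns) {
      warnings.push(
        `prefix "${prefix}" bound to <${ns}>, expected <${wellKnown[prefix]}>`
      );
    }
  }
  return warnings;
}

checkPrefixes({ foaf: 'http://xmlns.com/foaf/0.l/' }); // note the 0.l typo
```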
> linter to check for common pitfalls

> catch invalid HTML and invalid or unrecognizable RDF
Anything that's created through the dokieli UI is intended to be valid.
The Save operation, for example, takes what's in the DOM and normalises the HTML before saving. Save As, on the other hand, uses the source HTML as is.
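The distinction could be sketched as follows. The normalisation shown (trimming and ensuring a doctype) is only a stand-in for whatever dokieli actually does when serialising the DOM; Save As is the identity.

```javascript
// Save: serialise the current state and normalise it before writing.
// The normalisation here is a placeholder, not dokieli's actual logic.
function save(domHtml) {
  let html = domHtml.trim();
  if (!/^<!DOCTYPE html>/i.test(html)) {
    html = '<!DOCTYPE html>\n' + html;
  }
  return html;
}

// Save As: write the source HTML untouched, no normalisation.
function saveAs(sourceHtml) {
  return sourceHtml;
}
```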
What should change? I'm not against a linter, but I'm trying to understand if and where it can fit in. I'm missing the "why", I think.
Unit tests will cover the generated output of the editing / annotating features, i.e., making sure the data is based on some schema, validates, and so on. That aside, there won't be separate tooling to validate further.
The documentation on HTML patterns states:

> Ultimately, well-formed and valid HTML - along with accompanying RDFa, Turtle, JSON-LD, TriG etc - is the only requirement here

This is only true in theory. In practice `dokieli.js` expects some patterns such as Polyglot Markup (?), an outline with `title`, `main > article`, and `section` etc. I stumbled upon this with a document that uses a flat list of `h1`, `h2` ... so the table of contents could not be generated properly.

It would be helpful to have a validator script that checks for these expected patterns (e.g. use of `section` elements, or that `aside` should appear as the last element node in a `section`).

The validation rules do not need to be very strict, but even "well-formed and valid HTML+RDFa" must be validated. All data should be assumed not to conform to any standard unless conformance is actually checked.