Open tomschr opened 7 years ago
I guess adding this to GeekoDoc might be the better idea for the time being...
For an idea of what we could do with Schematron directly in GeekoDoc, see: https://github.com/openSUSE/suse-xsl/issues/222 . There is quite a number of cases associated with table markup and you generally notice those issues currently when going the step from FO->PDF because FOP balks.
This is also not really style checker territory because it really leads to hard errors that are not caught by current validation methods. Then again, if we have more such cases, we could move some checks from the style checker to GeekoDoc.
DocBook >= 5.0 also ships some (ISO) Schematron files, see /usr/share/xml/docbook/schema/sch/5.1/docbook.sch. For example, it checks whether a footnote contains another footnote child.
However, it seems, oXygen is not that happy with the schema. It shows this error message:
cvc-complex-type.3.2.2: Attribute 'name' is not allowed to appear in element 's:pattern'.
This is the respective line:
<s:pattern name="Glossary 'firstterm' type constraint">
which should be corrected like this:
<s:pattern>
<s:title>Glossary 'firstterm' type constraint</s:title>
The tools side of Schematron seems to be interesting ...
Websites related to Schematron are also interesting: They seem to either show lots of 404 errors (schematron.com has a working front page but all sub pages 404), lead to ad farms (Rick Jelliffe's home page with the reference implementation, Probatron) or advertise proprietary software (Oxygen, XML Buddy, Topologi).
I am starting to think that investing in Schematron at this point might not be such a good idea.
[edit 1, sknorr: libxml does have Schematron 1.5 support but it is not mentioned in the man page.] [edit 2, sknorr: lxml has ISO Schematron support which I overlooked initially.]
libxml (i.e. xmllint, xsltproc & lxml) do not support Schematron
Actually, this is not quite true. There is the option --schematron. However, as far as I can see, you can only use Schematron 1.5 with that. So in a way, you can say libxml "supports" Schematron, although I wouldn't say nicely.
I wouldn't consider this a valid alternative...
I think the best approach would be to write a wrapper in Python using the lxml library. This library supports ISO Schematron.
A quick fix reveals some nice features:
from lxml import isoschematron
from lxml import etree
# Create a Schematron parser:
sch_doc = etree.parse("geekodoc5.sch")
schematron = isoschematron.Schematron(sch_doc)
# Parse our DocBook5 source:
doc = etree.parse("foo.xml")
schematron.validate(doc)
# => False
print(schematron.error_log)
# => Prints an extensive error log (XML) which can be parsed
I think this can easily be turned into a small Python "Schematron validation script". ;-)
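To make the wrapper idea a bit more concrete, here is a minimal sketch of what such a script could look like. It is not the final tool: the inline schema is a hypothetical stand-in for geekodoc5.sch with a single nested-footnote rule, and the validate() helper and its return shape are my own invention. It assumes lxml's store_report=True option, which keeps the SVRL report so that failed assertions can be read from it.

```python
from lxml import etree
from lxml import isoschematron

# Namespace of the SVRL report that ISO Schematron validation produces.
SVRL_NS = "http://purl.oclc.org/dsdl/svrl"

# Hypothetical stand-in for geekodoc5.sch: a single rule that forbids
# a footnote nested inside another footnote.
SCHEMA = b"""\
<s:schema xmlns:s="http://purl.oclc.org/dsdl/schematron">
  <s:ns prefix="db" uri="http://docbook.org/ns/docbook"/>
  <s:pattern>
    <s:title>Footnote constraint</s:title>
    <s:rule context="db:footnote">
      <s:assert test="not(.//db:footnote)">footnote must not contain footnote</s:assert>
    </s:rule>
  </s:pattern>
</s:schema>
"""


def validate(schema_tree, doc_tree):
    """Return (is_valid, messages) for doc_tree checked against schema_tree.

    messages is a list of (location XPath, assertion text) tuples.
    """
    schematron = isoschematron.Schematron(schema_tree, store_report=True)
    is_valid = schematron.validate(doc_tree)
    messages = []
    report = schematron.validation_report
    for failed in report.iterfind(".//{%s}failed-assert" % SVRL_NS):
        text = failed.findtext("{%s}text" % SVRL_NS, default="").strip()
        messages.append((failed.get("location"), text))
    return is_valid, messages
```

For real use, the schema and the document would come from etree.parse() on files instead of the inline string, and the messages could be printed one per line for the writer.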
[...] I am starting to think that investing in Schematron at this point might not be such a good idea.
Yes, I can understand that you get this impression. I've recently discovered this 404 page as well. Not sure why this isn't available anymore. Nevertheless, I don't think it is that bad. As I've shown in my earlier post, it can be used in lxml, with some minimal scripting efforts.
All in all, I don't think we should abandon Schematron at this stage. Of course, if lxml reveals some technical problems, we will need to think again.
Apart from my last comment, we should add specific rules depending on GeekoDoc and our styleguide.
I would suggest distinguishing between "hard" and "soft" rules:
- [...] docbook.sch (upstream DocBook).
- [...] step inside a procedure.
- Check for more than 1 listitem inside orderedlist or itemizedlist.
- Check for more than 1 varlistentry inside variablelist.
- [...] member inside a simplelist.
- [...] steps inside a procedure.
- [...] title inside admonition elements (note, tip, warning).
- [...] xml:id attributes.
Probably I miss other rules.
toms wrote...
- Check for more than 1 listitem inside orderedlist or itemizedlist.
- Check for more than 1 varlistentry inside variablelist.
Both of those rules are good ways to make our "documentation updates" sections fail validation... :/
Both of those rules are good ways to make our "documentation updates" sections fail validation... :/
Ahh, right! Ok, we could move these from hard to soft rules. I just try to collect some examples...
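As an illustration of how such a "soft" rule could report a warning instead of failing validation, here is a rough sketch using only the Python standard library. The ITEM_OF table and the single_item_lists() helper are hypothetical names, and it only covers the two list rules quoted above; it is a stand-in for a real Schematron rule, not a replacement.

```python
import xml.etree.ElementTree as ET

DB = "{http://docbook.org/ns/docbook}"

# List element -> the child element that should occur more than once.
# Only the two rules quoted in this thread are covered here.
ITEM_OF = {
    "orderedlist": "listitem",
    "itemizedlist": "listitem",
    "variablelist": "varlistentry",
}


def single_item_lists(root):
    """Return the names of list elements that have fewer than two items."""
    found = []
    for elem in root.iter():
        # Skip non-element nodes and elements outside the DocBook namespace.
        if not isinstance(elem.tag, str) or not elem.tag.startswith(DB):
            continue
        name = elem.tag[len(DB):]
        child = ITEM_OF.get(name)
        if child is not None and len(elem.findall(DB + child)) < 2:
            found.append(name)
    return found
```

A caller could print each hit as a warning and still exit successfully, which would keep the "documentation updates" sections building.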
As I said somewhere above: within tables, counting the actual columns vs. the columns set up via colspec would be great. And there are more issues concerning tables that should make validation fail but don't, such as bad column name references etc.
We could also check for spaces in ID attributes, such as in e.g. xml:id=" foo.bar" which will also go through current validation unhindered but fail when building HTML or PDF.
These would also give us added value as opposed to reimplementing something that is already covered by SDSC.
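To show how small the ID check mentioned above is, here is a sketch in plain Python with the standard library parser. The function name ids_with_whitespace() is made up for this example; in a Schematron schema the same idea would be a single assert on @xml:id.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the fixed XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def ids_with_whitespace(root):
    """Return all xml:id values that contain a whitespace character."""
    bad = []
    for elem in root.iter():
        value = elem.get(XML_ID)
        if value is not None and any(ch.isspace() for ch in value):
            bad.append(value)
    return bad
```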
counting the column numbers of tables v/ within colspec would be great. And there are more issues concerning tables that should make validation fail but don't: such as the column name references etc.
Well, we could check if the value of @cols and the number of colspec elements are the same. That is easy. Also checking column name references shouldn't be too hard. I'll add that into our list.
However, tables can get complicated when cells that span columns or rows are involved.
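A rough sketch of the two table checks, again with the standard library only. The helper check_tgroup() and its message texts are hypothetical. It assumes every column is declared by a colspec that is a direct child of tgroup, and it ignores colspec elements inside thead or tfoot as well as nested entrytbl, so spans are only checked for dangling names, not for overlap.

```python
import xml.etree.ElementTree as ET

DB = "{http://docbook.org/ns/docbook}"


def check_tgroup(tgroup):
    """Return a list of problem descriptions for one CALS tgroup element."""
    problems = []
    colspecs = tgroup.findall(DB + "colspec")

    # Compare @cols with the number of colspec children.
    try:
        cols = int(tgroup.get("cols", ""))
    except ValueError:
        problems.append("missing or non-numeric cols attribute")
        cols = None
    if cols is not None and colspecs and len(colspecs) != cols:
        problems.append(f"cols={cols} but {len(colspecs)} colspec elements")

    # Every column name referenced by an entry must be declared by a colspec.
    known = {c.get("colname") for c in colspecs if c.get("colname")}
    for entry in tgroup.iter(DB + "entry"):
        for attr in ("colname", "namest", "nameend"):
            ref = entry.get(attr)
            if ref is not None and ref not in known:
                problems.append(f"entry {attr}={ref!r} names no colspec")
    return problems
```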
We could also check for spaces in ID attributes
Great idea!
These would also give us added value as opposed to reimplementing something that is already covered by SDSC.
But don't we want to move these parts into the Schematron schema?
Moved the list of checks into original description.
From https://github.com/openSUSE/geekodoc/issues/6#issuecomment-263288127, I've tried to create a script which can validate our (yet to be defined) Schematron schema. In the long run, the script can be integrated into daps (if not, it was a good exercise :grinning: ).
@sknorr: For a first draft, see https://github.com/openSUSE/schvalidator
In openSUSE/suse-doc-style-checker#117, I raised the question if a Schematron schema could be useful for SDSC. The same question can be asked for GeekoDoc as well.
A Schematron schema can be used in two ways:
[...] (.sch). They are independent of the existing GeekoDoc RNG.
The validation procedure would be different:
Rick Jelliffe, the inventor of Schematron, describes the language as "a feather duster to reach the parts other schema languages cannot reach". ;-)
Benefits
Schematron Versions
Currently, there are two versions of Schematron:
- ISO Schematron (published May 2006): the de facto standard of Schematron. The new namespace is http://purl.oclc.org/dsdl/schematron.
- Schematron 1.5 (published 2001): the old reference implementation in pure XSLT. The namespace is http://www.ascc.net/xml/schematron.
Tools
Schematron validation is supported by:
- xmllint and its option --schematron.
- lxml, see http://lxml.de/validation.html#id2
See also
Personal
From my perspective, I prefer the separate Schematron schema (assuming all is possible, feasible, or useful). It seems, this doesn't introduce too many changes and gives greater flexibility.
I see it more as a "conformance and consistency" check rather than a hard validation. Of course, the rules shouldn't bother our writers too much.
Maybe we should also (re?)think about our definition of "validity/validation".
--
Update: List of Checks
Hard Rules
xml:id
Soft Rules
[...] xml:id attributes.
@sknorr I've separated the discussion in SDSC from the GeekoDoc aspect. Feel free to comment. :)