Open joeha480 opened 8 years ago
There is some more debug info on STDERR (tested with Mischa's file):
Unexpected text contents in "note" element.
Error at xsl:message on line 331 of dtbook2flow_sv_SE_braille.xsl:
XTMM9000: Processing terminated by xsl:message at line 331 in dtbook2flow_sv_SE_braille.xsl
at xsl:apply-templates (bundle://33.0:1/org/daisy/dotify/impl/input/xml/resource-files/sv-SE/xslt-files/dtbook2flow_sv_SE_braille.xsl#299)
processing /dtbook/book[1]/rearmatter[1]/level1[1]/note[1]
at xsl:call-template name="insertNoteCollection" (bundle://33.0:1/org/daisy/dotify/impl/input/xml/resource-files/sv-SE/xslt-files/dtbook2flow_sv_SE_braille.xsl#33)
@joeha480 By just looking at the XSLT it seems reasonable to throw an error for this input:
<note id="fn1">
<p>Dies ist der Kommentar zu Fussnote 11</p>
</note>
In the template at line 328, the variable $note
has 3 child nodes: a p
element and two text nodes at both sides of the p
. The first text node triggers the terminating message.
Sure, but it works standalone, so if there is an error here now, there could be errors in other places with similar XPaths (which might not report it as errors). We need to know what is causing this.
It looks like you have configured Saxon (or Saxon configured itself) to automatically strip space from the source document (https://www.w3.org/TR/xslt20/#strip). Could it have to do with the DOCTYPE declaration? Maybe your Saxon analyses the DTD and decides that the white space is insignificant?
Yes I think that could be it. I tried to google this, but I didn't find it. But you did!
In practice this means that elements that are defined in a DTD or a Schema to contain element-only content will have whitespace text nodes stripped, regardless of the xsl:strip-space and xsl:preserve-space declarations in the stylesheet.
Do you know what is causing the difference in the saxon configuration? Is dtd resolution turned off somewhere?
You can initialize Saxon with or without schema awareness. And indeed Pipeline does it without. Dotify probably does it with (explicitly in your code or not, I don't know how you do it). I don't know if we did it like we did for a specific reason. I can imagine a possible reason is to make the system behave the same with or without internet access. @capitancambio @rdeltour: do you know?
By the way Pipeline initializes Saxon through Calabash (XProcConfiguration.java) but that is not relevant to the issue.
I'm not sure about the motivation of starting saxon without schema awareness, probably not needed?
I don't remember either. Isn't it Calabash's default?
Maybe yes. The question now is whether there is any drawback to setting schemaAware to true? I don't think any of our XSLTs depend on schemaAware to be false? And we could possibly add the DTBook DTD to some catalog.xml to get a deterministic system even when there is no internet access.
Or should we make it a configuration option?
The dtbook dtd should be there, offered by at least one of our scripts. In any case, any static asset like dtds should be offered by common modules or the script itself as a static resource.
cheers,
Javi
@rdeltour @capitancambio What should we do with this? Making it a configuration option doesn't seem right if the behavior of the modules depends on it. (TBH I think it's a bit weird that Saxon has this configuration option in the first place.) Shall we just try and enable it and see what happens?
@bertfrees yes you can try to enable it; but then we should be careful about testing all our scripts.
@bertfrees @joeha480 I had a second look at this issue: it appears the root of the issue comes from the XSLT which doesn't cover the case were there are whitespace-only nodes in the note
element. That's a bug IMO.
That schema-aware processing solves it is one thing, almost a "side effect", but the XSLT should also work in non-schema-aware processing.
In that case, it would be reasonably straightforward to fix the XSLT with some xslt:strip-space
declaration (preferably) or by explicitly testing the content of text nodes in the xsl:when/@test
expression.
And also, the reason why we don't configure Saxon to be schema-aware is probably that Saxon HE doesn't provide the feature :smile:. So I'm afraid that changing the configuration won't solve anything in the default Pipeline distribution.
BTW, if you're testing your XSLT standalone in oXygen or somewhere else, it's usually good to run it with Saxon HE too, at least from time to time.
Sorry if I didn't pay more attention previously!
I don't think it's a bug in the xslt (but maybe I should explicitly set schema aware in the stylesheet using xsl:import-schema if that is even what it is for). It's very possible however that I haven't instantiated Saxon correctly in OSGi context. That whole area is a bit unclear to me.
If Saxon isn't providing the feature, then where could I be getting the feature from? This is the list of third party dependencies: jing-20120724.0.0.jar Saxon-HE-9.5.1-5.jar saxon-he-9.5.1.5.jar wstx-lgpl-3.2.7.jar (I have tried without this) xercesImpl-2.11.0.jar (and without this) xml-apis-1.4.01.jar
@bertfrees: I guess we could disable schema awareness in Dotify and run regression tests on it (because there might be other places where I use this feature). Anyway, we need to solve this before finishing the project. I can give you access to my regression tests, so that you can run them, if you want.
@rdeltour: Right, so that's the reason Saxon has the option :)
By the way, yesterday I also realized that changing the configuration in calabash-adapter wouldn't necessarily have had an impact on Joel's step because the step invokes the XSLT through Java (Dotify) so it depends on how Joel invokes the XSLT in Dotify.
Regarding fixing the XSLT: yes, that was my first reaction too. But then again, it also seems reasonable to make certain assumptions about the input, or to rely on a processor feature. I wouldn't say the XSLT is "broken".
@joeha480: Yes, xsl:import-schema
appears to be the right command, however it's only Saxon-EE: http://www.saxonica.com/documentation9.5/xsl-elements/import-schema.html.
I wonder how you even made it work in standalone Dotify? :confused: Are you sure you are using Saxon-HE? EDIT: OK you asked the same question. Where are you getting your XSLT transformer object from in Java? Can you give me a pointer?
Another idea I got was to add an explicit pre-processing step that strips the unwanted spaces based on the schema. Maybe it is straightforward to do (i.e. without the Saxon feature).
I don't think it's a bug in the xslt (@joeha480) I wouldn't say the XSLT is "broken". (@bertfrees)
Well, I used "bug" to make my point (rather than to point the finger :smile: ), but yes I actually think that the XSLT should work with non-schema-aware processors. Unless it requires schema-awareness by contract, but I think there's nothing justifying it in this case.
What's wrong with using xsl:strip-space
or testing ws-only text nodes?
As for why it works in standalone Dotify, my wild guess is that the XML source tree is built with another parser, not directly from Saxon.
I think the problem with testing ws-only text nodes is that the same issue could exist in other XSLTs, so we'll have to go through all of them.
Using strip-space could be a solution. It's also not completely automatic, but it shouldn't be hard to do it based on the schema. (Wouldn't it be possible to automatically generate a xsl:strip-space
element from a schema document? Or am I making it too complicated?)
As for why it works in standalone Dotify, my wild guess is that the XML source tree is built with another parser, not directly from Saxon.
Ah, that would explain it, yes. And in that case it would indeed be a "side effect".
the same issue could exist in other XSLTs, so we'll have to go through all of them.
Yes, eventually I think that all other XSLT s/b tests without schema awareness.
Using strip-space could be a solution. It's also not completely automatic, but it shouldn't be hard to do it based on the schema. (Wouldn't it be possible to automatically generate a xsl:strip-space element from a schema document? Or am I making it too complicated?)
It depends: how many XSLTs or different schemas are we talking about?
Depending on the number of XSLT, a quick manual review of the pain points (like failing on text nodes, regardless of possibly empty content) might be preferrable?
To do: look at "STRIP_WHITESPACE" setting in http://www.saxonica.com/html/documentation/configuration/config-features.html.
Running with this file: http://www.daisy.org/samples/dtbook-xml/great-painters-dtbook-2005-3.zip
in DP2 resulted in an error.
Processing terminated by xsl:message at line 331 in dtbook2flow_sv_SE_braille.xsl
However, when running standalone Dotify, the file is processed just fine.
Could this be an environment issue related to the xml processor?