karl368 / firepath

Automatically exported from code.google.com/p/firepath
GNU General Public License v3.0

Feature request: Option to ignore whitespace nodes #18

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
…just like the Saxon command-line option 
[-strip (all|none|ignorable)](http://www.saxonica.com/documentation/using-xsl/commandline.xml).

> Specifies what whitespace is to be stripped from source documents (applies 
both to the principal source document and to any documents loaded for example 
using the document() function). The default is none: no whitespace stripping. 
Specifying all strips all whitespace text nodes from source documents before 
any further processing, regardless of any xsl:strip-space declarations in the 
stylesheet, or any xml:space attributes in the source document. Specifying 
ignorable strips all ignorable whitespace text nodes from source documents 
before any further processing, regardless of any xsl:strip-space declarations 
in the stylesheet, or any xml:space attributes in the source document. 
Whitespace text nodes are ignorable if they appear in elements defined in 
the DTD or schema as having element-only content.

Whitespace nodes are a problem when processing plist XML files, since they make 
node positions in XPath expressions rather unintuitive. As an example, 
using this XML as source:

    <plist version="1.0">
    <dict>
        <key>people</key>
        <dict>
            <key>teacher01</key>
            <string>John Smith</string>
            <key>student01</key>
            <string>Michael Jackson</string>
            <key>student02</key>
            <string>John Whitney</string>
        </dict>
    </dict>
    </plist>

One would think that:

    //key[text()="people"]/following::node()[1]/key

would match `<key>` nodes with values `teacher01`, `student01`, `student02`. 
Yet in XV or pretty much any other XPath interpreter it matches nothing, since 
that `following::node()[1]` matches a whitespace node that comes before the 
`<dict>` that contains the `key` nodes we were trying to target. In order to 
match those nodes one needs:

    //key[text()="people"]/following::node()[2]/key

which makes little sense to me. Similarly, to match the node `<string>` with 
value `John Smith`, the following wouldn't work:

    //key[text()="people"]/following::node()[1]/key[1]/following::node()[1]

because the position in both `following::node()` steps has to be bumped by one, 
yet apparently not the position of the child `<key>`, for whatever reason:

    //key[text()="people"]/following::node()[2]/key[1]/following::node()[2]
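
The extra nodes can also be made visible outside of any XPath engine. What follows is only a minimal sketch using Python's standard `xml.dom.minidom`, which is my choice for illustration and not something FirePath or Saxon relies on; the shortened sample and the variable names are assumptions.

```python
from xml.dom import minidom

# Shortened version of the plist sample above (assumed, for illustration).
PLIST = """<plist version="1.0">
<dict>
    <key>people</key>
    <dict>
        <key>teacher01</key>
        <string>John Smith</string>
    </dict>
</dict>
</plist>"""

doc = minidom.parseString(PLIST)
outer_dict = doc.documentElement.getElementsByTagName("dict")[0]

# Text nodes are reported as "#text"; elements are reported by tag name.
child_names = [child.nodeName for child in outer_dict.childNodes]
print(child_names)

# The node right after <key>people</key> is a text node, not the inner <dict>.
people_key = outer_dict.getElementsByTagName("key")[0]
after_key = people_key.nextSibling
print(repr(after_key.data))
```

In this sketch the child list alternates between `#text` and element names, which is why the position after `following::node()` has to skip over a whitespace node first.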

In order to avoid all this nonsense in XSLT, one can use the 
element `<xsl:strip-space elements="dict"/>`, which overrides whatever the 
value of Saxon CL's `-strip` option is, and altogether acts as if those nasty 
whitespace nodes did not exist (and thus, all those expressions that I said did 
not work, work). Yet, if I use that option in my XSLT stylesheet, it then 
becomes impossible to test, outside of the XSL stylesheet itself, the XPath 
expressions that I use inside of it.
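
To make the request concrete, here is a rough sketch of what an "ignore whitespace nodes" option could do before an expression is evaluated. It uses Python's standard `xml.dom.minidom` purely for illustration; `strip_whitespace_nodes` is a hypothetical helper, not an existing FirePath, Firefox or Saxon function, and the sample is a shortened copy of the plist above.

```python
from xml.dom import Node, minidom

# Shortened version of the plist sample above (assumed, for illustration).
PLIST = """<plist version="1.0">
<dict>
    <key>people</key>
    <dict>
        <key>teacher01</key>
        <string>John Smith</string>
        <key>student01</key>
        <string>Michael Jackson</string>
    </dict>
</dict>
</plist>"""


def strip_whitespace_nodes(node):
    """Remove whitespace-only text nodes below `node`, in place (hypothetical helper)."""
    for child in list(node.childNodes):  # copy, since the tree is mutated while iterating
        # XML whitespace is limited to space, tab, carriage return and line feed.
        if child.nodeType == Node.TEXT_NODE and not child.data.strip(" \t\r\n"):
            node.removeChild(child)
        else:
            strip_whitespace_nodes(child)
    return node


doc = minidom.parseString(PLIST)
strip_whitespace_nodes(doc.documentElement)

people_key = doc.getElementsByTagName("key")[0]
inner_dict = people_key.nextSibling  # the inner <dict>, with no whitespace node in between
keys = [k.firstChild.data for k in inner_dict.childNodes if k.nodeName == "key"]
print(inner_dict.nodeName, keys)
```

This corresponds to the `all` behaviour quoted above; the `ignorable` variant would additionally need DTD or schema information, which this sketch does not attempt.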

Original issue reported on code.google.com by chocolat...@gmail.com on 9 Mar 2011 at 11:18

GoogleCodeExporter commented 9 years ago
Oops, sorry for the "in XV or pretty much any other XPath". The feature request 
was actually copy/pasted from another project with that name, also lacking the 
feature.

Original comment by chocolat...@gmail.com on 9 Mar 2011 at 11:19

GoogleCodeExporter commented 9 years ago
Thanks for reporting this issue.

One thing to note first is that FirePath uses Firefox's XPath processor under the 
hood. I looked into whether it could be configured to achieve what you 
are requesting, but I could not find anything like that.

However, one thing I noticed when looking at your XPath expressions is that 
you are using the "node()" node test with the "following" axis. As explained in 
the XPath specification (http://www.w3.org/TR/xpath/#node-tests):
"A node test node() is true for any node of any type whatsoever."
You could use the "*" node test instead, or even specify the exact name of the 
element you are looking for (dict or string in your example).

For example the following expressions might make more sense and they match the 
elements as you would expect:

    //key[text()="people"]/following::*[1]/key

or

    //key[text()="people"]/following::dict[1]/key

and

    //key[text()="people"]/following::*[1]/key[1]/following::*[1]

or

    //key[text()="people"]/following::dict[1]/key[1]/following::string[1]
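
To illustrate the difference between the two node tests outside of Firefox, here is a rough sketch that walks the `following` axis by hand with Python's standard `xml.dom.minidom`. The helper names (`descendant_or_self`, `following`, `first_following_element`) are made up for this example and the sample is a shortened copy of the plist from the report; it is not how FirePath or Firefox evaluates expressions.

```python
from xml.dom import Node, minidom

# Shortened version of the plist sample from the report (assumed, for illustration).
PLIST = """<plist version="1.0">
<dict>
    <key>people</key>
    <dict>
        <key>teacher01</key>
        <string>John Smith</string>
    </dict>
</dict>
</plist>"""


def descendant_or_self(node):
    """Yield `node` and all of its descendants in document order."""
    yield node
    for child in node.childNodes:
        yield from descendant_or_self(child)


def following(node):
    """Yield the nodes after `node` in document order, excluding its descendants."""
    while node is not None:
        sibling = node.nextSibling
        while sibling is not None:
            yield from descendant_or_self(sibling)
            sibling = sibling.nextSibling
        node = node.parentNode  # continue with what follows the ancestors


def first_following_element(node):
    """Rough equivalent of following::*[1]: skip everything that is not an element."""
    return next(n for n in following(node) if n.nodeType == Node.ELEMENT_NODE)


doc = minidom.parseString(PLIST)
people_key = doc.getElementsByTagName("key")[0]

any_node = next(following(people_key))                 # like following::node()[1]
inner_dict = first_following_element(people_key)       # like following::*[1]
first_key = inner_dict.getElementsByTagName("key")[0]  # same node as child key[1] in this sample
john = first_following_element(first_key)              # like following::*[1] again
print(any_node.nodeName, inner_dict.nodeName, john.firstChild.data)
```

The only difference between `any_node` and `inner_dict` is the element filter, which is what switching from `node()` to `*` does in the expressions above.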

I hope this helps.

Original comment by pierre.t...@gmail.com on 12 Mar 2011 at 6:33