Closed GoogleCodeExporter closed 9 years ago
[deleted comment]
From issue #351, here is some informative text that may be desirable to add to
the Step Reference section:
<<<<<
Note that a path refers to a location point, yet may end in a step reference.
Thus the last step in a path without a "terminating step" is not navigable: it
may represent an empty collection of text content nodes or refer to an element
position before or after a text content collection, and regardless represents
the location between nodes and not a node (or collection of nodes) itself. This
is consistent with other similar representations (e.g. boundary points in the
DOM Selection and Range definitions) and allowing local paths to end in step
references that represent terminating points rather than navigable nodes
facilitates interoperability with those implementations.
Original comment by nat...@gmail.com
on 8 May 2013 at 6:28
Original comment by daniel.weck
on 8 May 2013 at 6:32
Original comment by daniel.weck
on 8 May 2013 at 7:25
Email discussion:
https://groups.google.com/forum/#!topic/epub-working-group/ajYExeF7_rs
Original comment by daniel.weck
on 23 May 2013 at 3:16
Hi natevw, perhaps the EBNF terms "termstep" and "terminus" are misleading, as
they indeed seem to imply that "step" is not designed to be the last item in a
CFI expression, yet it can effectively be (legally) the "leaf" of a CFI
reference. The question is whether or not this reflects a practical reality,
for example: would a reading system link directly to a <br/> line break
element, or to the "empty child" inside the element's content => either the
empty chunk of character data, or the virtual first/last elements at index 0 or
n+2 (which is not recommended as per the SHOULD NOT conformance requirement).
Personally I think that the specification is fine as it stands now, but perhaps
I am missing your point?
Daniel
Latest editor's draft:
https://epub-revision.googlecode.com/svn/trunk/build/linking/cfi/epub-cfi.html#sec-epubcfi-syntax
Latest published specification:
http://www.idpf.org/epub/linking/cfi/#sec-epubcfi-syntax
Original comment by daniel.weck
on 23 May 2013 at 4:01
It's a difference in meaning.
Consider the path "2/4/1:1" applied to
`<parent><child>first</child><child>second</child></parent>`. The "4" is a
navigational step referring to the second child.
Now consider the path "2/4" applied to same. The "4" does NOT refer to the
second child, but rather the location between it and the (empty) text group
after the first child. To quote the spec: "Single path notation always denotes
a location point".
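To make the indexing in this example concrete, here is a minimal Python sketch (not taken from the spec or from any reading system) of how each /N index selects a slot among a parent's children: even indices are elements, odd indices are the text chunks between them. The helper names `child_slots` and `resolve` are hypothetical, and the sketch assumes the document acts as a virtual parent whose slot 2 is the root element. It only shows which slot an index selects; whether a trailing "/4" should be read as that element or as the point just before it is exactly the semantic question raised in this comment.

```python
import xml.etree.ElementTree as ET


def child_slots(elem):
    """Map CFI-style indices to children: odd = text chunk, even = element."""
    slots = {1: ("text", elem.text or "")}
    for i, child in enumerate(elem, start=1):
        slots[2 * i] = ("element", child)
        slots[2 * i + 1] = ("text", child.tail or "")
    return slots


def resolve(path, root):
    """Walk a path such as '2/4/1:1' and return (kind, node, offset).

    Assumption: the document is a virtual parent whose slot 2 is `root`.
    """
    steps, _, offset = path.partition(":")
    current = ("document", None)
    for step in steps.split("/"):
        n = int(step)
        kind, node = current
        if kind == "document":
            if n != 2:
                raise ValueError("only slot 2 (the root element) exists here")
            current = ("element", root)
        elif kind == "element":
            current = child_slots(node)[n]
        else:
            raise ValueError("cannot step into a text chunk")
    return current + (int(offset) if offset else None,)


root = ET.fromstring(
    "<parent><child>first</child><child>second</child></parent>"
)
print(resolve("2/4/1:1", root))  # text chunk of the second child, offset 1
print(resolve("2/4", root)[0])   # the slot selected by a trailing /4
```

In both calls the "4" selects the same slot; the grammar alone does not say whether it is being used to descend further or to mark a location.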
The grammar makes no distinction between these two types of references. And so
this sentence cannot be corrected: "A step with a slash (/) followed by an
integer refers to a child node or nodes"
That sentence is wrong because not every "step" refers to a child node or
nodes. Some steps (the ones I would call "navigational") do. Others (the ones I
would call "terminal") do not: if a path (which ALWAYS denotes a location
point) ends in a step then that step represents a point just as all the
existing terminus types do.
There's no grammatical difference between the two, both are `"/" , integer , [
"[" , assertion , "]" ] ;` but there is a semantic difference.
Here's sort of the gist of what I'm proposing…
fragment = "epubcfi(" , nav_path , ( range | term_path ) , ")" ;
nav_path = { step }- ;
term_path = { step } , [ termstep ] ;
range = "," , term_path , "," , term_path ;
# drop local_path
…however, I have left out any construct necessary to represent path
indirection (the existing redirected_path construct) so this is NOT a direct
proposal in itself, just illustrative of the semantic difference I see. If this
were done, then the 3.1.1 portion of the spec could say something like "A
navigational step refers to a child node or nodes … a terminal step is
numbered similarly but refers to [etc.]"
Original comment by nat...@gmail.com
on 23 May 2013 at 4:38
Regarding:
"A step with a slash (/) followed by an integer refers to a child node or nodes"
That sentence is wrong because not every "step" refers to a child node or nodes.
Could you please use the updated terminology, as this is now obsolete. See:
https://groups.google.com/d/msg/epub-working-group/HC_hS7ae6mo/dm54uIui_QAJ
Meanwhile, I am reading your comment further :)
/Dan
Original comment by daniel.weck
on 23 May 2013 at 4:44
The updated specification prose says: "A [step] with a slash (/) followed by a
positive integer refers to either a child element or a chunk of character data,
as per the rules defined herein..."
The term "refers" is correctly used here, as it covers both "navigational" and
"terminating" [steps] without ambiguity (see below).
The original prose (untouched) says: "[Steps] can either be navigational or
terminating. Navigational [steps] may be repeated as necessary (e.g., ...).
There may be only one terminating [step], which, if present, must be the last
[step] in the sequence."
The problem is that what the prose says in "plain English" is not accurately
matched by the EBNF grammar: a [step] (i.e. not a [termstep]) can in fact
effectively "terminate" a CFI expression (i.e. be the last item), or it can be
used to traverse / walk further down into the XML tree (in which case the term
used to describe this is "navigational"). Furthermore, a [step] can refer to
either XML element or text, the former being a natural candidate to "terminate"
a CFI expression (by defining a location corresponding to the opening tag of an
XML element, with the addition of an optional side bias useful in breaking
context), whereas the latter is really supposed to be followed by a [termstep]
of type "character offset" (but not necessarily, as per the syntax rules).
So, I suggest that we fix the EBNF production rule "termstep" as follows (to
include [step]):
termstep = step | ( terminus , [ "[" , assertion , "]" ] );
This way, it is clear that both an element or a chunk of character data are
considered valid terminating "locations" in an XML document. However, just like
we did with "virtual" elements (+ the issue of empty first/last character
data), I suggest that we use a SHOULD NOT conformance requirement for the
production of CFI expressions with a terminating step that refers to character
data without an explicit character offset. In other words, reading systems MUST
be capable of consuming (parse + interpret / render) the implicit location just
before the first character of a terminating character data step, but they
SHOULD generate explicit character offsets when creating such CFI location.
natevw, I hope this addresses your concerns. Otherwise, let me know :)
Original comment by daniel.weck
on 23 May 2013 at 5:55
This sounds reasonable to me: a much simpler change, but one which still seems
to capture the essence of the _two_ usages of a step reference.
Original comment by nat...@gmail.com
on 23 May 2013 at 6:00
Formal proposed solution and 72h objection window:
https://groups.google.com/forum/#!topic/epub-working-group/iusOUBjlFLY
------------
In "2.2 Syntax":
https://epub-revision.googlecode.com/svn/trunk/build/linking/cfi/epub-cfi.html#sec-epubcfi-syntax
=> Rewrite the [termstep] EBNF production rule as:
termstep = step | ( terminus , [ "[" , assertion , "]" ] );
Note: this adds [step] (i.e. reference to element or to interspersed chunk of
XML character data) as a valid "terminating" item within a CFI expression.
In "3.1.4 Terminating Step – Character Offset (:)":
https://epub-revision.googlecode.com/svn/trunk/build/linking/cfi/epub-cfi.html#sec-path-terminating-char
=> Just before the last phrase "No other steps may follow a character offset
terminating step", add this line:
"
CFI expressions should not be produced with a terminating /N step (i.e. no
explicit character offset) where N is odd to refer to a chunk of XML character
data interspersed amongst XML elements. However, CFI processors (e.g. Reading
Systems) must be capable of consuming (i.e. parse + interpret / render) such
CFI expressions, by assuming the implicit /N:0 character offset.
"
------------
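The consumption rule proposed above (treat a terminating odd /N as /N:0) could be sketched as follows. This is only an illustration under simplifying assumptions: the input is the path without the `epubcfi(...)` wrapper, the last step carries no bracketed assertion, and the function name `with_implicit_offset` is hypothetical.

```python
import re

# Matches a bare trailing step such as "/3" (no offset, no assertion).
_TRAILING_STEP = re.compile(r"/(\d+)$")


def with_implicit_offset(path):
    """Append ':0' when the path ends in an odd (character data) step."""
    match = _TRAILING_STEP.search(path)
    if match and int(match.group(1)) % 2 == 1:
        return path + ":0"
    return path


print(with_implicit_offset("/4/10/1"))    # odd final step: offset made explicit
print(with_implicit_offset("/4/10/2"))    # even final step: left unchanged
print(with_implicit_offset("/4/10/1:3"))  # explicit offset: left unchanged
```

A producer following the "should not" half of the rule would simply always emit the explicit form.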
Original comment by daniel.weck
on 23 May 2013 at 6:50
One issue is that rewriting the "termstep" rule as proposed breaks the LL(1)
nature of the grammar, which conflicts with issue 343.
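One way to picture the LL(1) concern is to compare FIRST sets. The sketch below assumes a simplified path of the form `{ step } , [ termstep ]` (the redirected-path construct is left out, and the function names are hypothetical). With one token of lookahead, a parser must decide whether to repeat `step` or to enter `termstep`, so the two FIRST sets must not overlap.

```python
# FIRST sets for the simplified productions (assumed, not quoted from the spec).
STEP_FIRST = {"/"}
TERMINUS_FIRST = {":", "@", "~"}


def first_of_termstep(include_step):
    """FIRST(termstep), with or without 'step' as an alternative."""
    return TERMINUS_FIRST | (STEP_FIRST if include_step else set())


def ll1_conflict(include_step):
    """Tokens on which '{ step } , [ termstep ]' cannot choose a branch."""
    return STEP_FIRST & first_of_termstep(include_step)


print(ll1_conflict(False))  # termstep = terminus ... : no overlap
print(ll1_conflict(True))   # termstep = step | terminus ... : overlap on "/"
```

Under these assumptions the overlap is the single token "/", which is the token on which the repetition and the optional tail would both be eligible.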
In order to stay close to the English prose, I propose to instead rename the
"termstep" production to "termstep_offset" (the production is really about
offsets, the terminating step itself *always* starts with an integer).
About @natevw's concern of the dual nature of steps (i.e. navigational vs.
terminating), the EBNF is coherent since it groups both under the generic name
"step".
Additionally, I don't think that the prose "refers to a child element" in
section 3.1.1 is wrong (as suggested in comment #7). A navigating step refers
to an element as a way to navigate to the XML tree, a terminating element step
refers to an element as a way to denote a location.
The prose (as revised in issue 301) might be extended with a paragraph that
clarifies this distinction, but is otherwise correct IMHO.
Original comment by rdeltour@gmail.com
on 23 May 2013 at 9:12
Romain, I think that [step] in the EBNF is "navigational" and [termstep] is
"terminal", at least I think it was the original intent, thus the corresponding
prose in plain English that describes these CFI components. The problem is that
[step] can effectively also be "terminal", in addition to "navigational". I am
afraid I fail to see how your proposal helps solve this issue. Also, the term
"offset" in "termstep_offset" doesn't really fit with 2D coordinates / spatial
region.
Original comment by daniel.weck
on 23 May 2013 at 9:33
expanding the step production rule into termstep does not break LL(1), and this
actually better reflects the dual nature of /N
termstep = ( ( "/" , integer ) | terminus ) , [ "[" , assertion , "]" ] ;
Romain, see comment #9 for a breakdown of relevant prose bits.
Original comment by daniel.weck
on 23 May 2013 at 9:48
I believe the change proposed in comment #14 still breaks LL(1). Attached is
the corresponding EBNF (in W3C syntax) that can be fed to REx [1].
I also think that the proposed change does not improve the mapping to the
English prose. Consider this CFI:
epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/2/1:3[;s=b])
My understanding is that the terminating step is "1:3[;s=b]", i.e. the last
step in the path sequence + the character offset.
If this is correct, this full step is not produced by the proposed 'termstep'
production, only the offset is.
About comment #13 and my suggestion to rename the production "termstep_offset",
this was based on the very headings of the CFI spec 3.1.4 to 3.1.7, which all
include the word "offset". 2D coordinates / spatial regions are called "spatial
offset" in the spec.
[1] http://www.bottlecaps.de/rex/
Original comment by rdeltour@gmail.com
on 24 May 2013 at 6:54
about "offset": ah, i stand corrected, although i don't think that a spatial
region is semantically equivalent to an "offset", but that's a different issue
to be filed separately.
regarding "terminating" steps: Romain, can you have a look at comment #6, i
think that the <br/> example is pretty symptomatic of the fact that we need to
cater for the dual nature of /N (both "navigational" and "terminal").
Original comment by daniel.weck
on 24 May 2013 at 7:26
Romain and I had a chat in order to discuss how to have a correct LL(1) EBNF,
whilst using "plain English" non-ambiguous, consistent prose that reflects the
terms used in the formal syntax. We concluded that minimal changes are needed
in the grammar, and prose adjustments are required in quite a few places:
- The production name 'termstep' is a misnomer, because the 'step' production
rule itself can "terminate" a CFI expression. Furthermore, the concept of
"terminal symbol" has a different meaning in EBNF. As Romain suggested, let us
use the term 'offset' instead (consistent with headings 3.1.4-7). Note: I
personally think that the term 'offset' is not really suitable for "spatial 2D
region", but I am not strongly opposed to it. Rename the 3.1.4-7 headings by
removing "Terminating Step - ".
- Instead of trying to rename the 'terminus' production rule to something less
likely to be misconstrued (remember, a /N 'step' can also "terminate" a CFI
expression), I suggest we merge it into the renamed version of 'termstep' (now
'offset'). This is now consistent with the structure of the 'step' production
rule (optional 'assertion' at the end).
To summarise:
offset = ( ( ":" , integer ) | ( "@" , number , ":" , number ) | ( "~" , number
, [ "@" , number , ":" , number ] ) ) , [ "[" , assertion , "]" ] ;
See:
https://epub-revision.googlecode.com/svn/trunk/build/linking/cfi/epub-cfi.html#sec-epubcfi-syntax
- The term "navigational" is only used in one sentence (see "2.2 Syntax"), with
no direct equivalence in the EBNF. Generally-speaking, I think that the term
"traversal" (XML tree) is more appropriate, because "navigation" has another
meaning in EPUB. In fact, the CFI introduction says "The functionality ... is
varied: from reading location maintenance to annotation attachment to
navigation". So, this whole sentence needs to be reworked:
"Steps can either be navigational or terminating. Navigational steps may be
repeated as necessary (e.g., to count elements, to process children or to
follow references). There may be only one terminating step, which, if present,
must be the last step in the sequence."
I suggest:
"Steps are denoted by the '/' forward slash character, and are used to traverse
XML content. The last step in a CFI path represents a location within a
document, either structural (XML element), textual (character data), or
aural-visual (image, audio, or video media). Such a terminating step may be
complemented by an optional "offset", which denotes a particular character
position, temporal or spatial fragment."
- In "3.1.4 Character Offset (:)":
https://epub-revision.googlecode.com/svn/trunk/build/linking/cfi/epub-cfi.html#sec-path-terminating-char
=> replace "A terminating step with a leading colon" with "A path terminating
with a leading colon".
=> replace "A character offset terminating step may be present only following a
/N step." with "A character offset may follow a /N step."
=> remove "No other steps may follow a character offset terminating step."
(already expressed in "A path terminating with")
- In "3.1.5 Temporal Offset (~)":
https://epub-revision.googlecode.com/svn/trunk/build/linking/cfi/epub-cfi.html#sec-path-terminating-temporal
=> replace "A terminating step with a leading tilde" with "A path terminating
with a leading tilde".
=> remove "No other steps can follow a temporal offset terminating step."
(already expressed with the sentence above)
- In "3.1.6 Spatial Offset (@)"
https://epub-revision.googlecode.com/svn/trunk/build/linking/cfi/epub-cfi.html#sec-path-terminating-spatial
=> replace "A terminating step with a leading at sign" with "A path terminating
with a leading 'at' sign".
=> remove "No other steps can follow a spatial offset terminating step."
(already expressed with the sentence above)
- In "3.1.7 Temporal-Spatial Offset (~ + @)"
https://epub-revision.googlecode.com/svn/trunk/build/linking/cfi/epub-cfi.html#sec-path-terminating-tempspatial
=> remove "No other steps can follow a temporal-spatial position terminating
step." (redundant, already previously expressed)
- In "3.1.8 Text Location Assertion ([)":
https://epub-revision.googlecode.com/svn/trunk/build/linking/cfi/epub-cfi.html#sec-path-text-location
=> replace "character offset terminating step." with "character offset."
- In "3.1.9 Side Bias ([ + ;s=)"
https://epub-revision.googlecode.com/svn/trunk/build/linking/cfi/epub-cfi.html#sec-path-side-bias
=> replace "Side is not defined for locations with spatial terminus." with
"Side is not defined for locations with spatial offset."
That's it.
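As a rough illustration of the merged 'offset' production summarised above, the following Python sketch recognises the three alternatives (character, spatial, temporal with optional spatial part) plus the optional trailing assertion. It rests on assumptions that are not spelled out in this thread: `integer` is taken to be a run of digits, `number` a run of digits with an optional fractional part, and the assertion is treated as an opaque string without a closing bracket (the real 'assertion' production is richer). The names `OFFSET` and `parse_offset` are hypothetical.

```python
import re

# Assumed shape of the 'number' production: digits with optional fraction.
NUMBER = r"\d+(?:\.\d+)?"

OFFSET = re.compile(
    rf"""
    (?: :(?P<char>\d+)                                        # character offset
      | @(?P<x>{NUMBER}):(?P<y>{NUMBER})                      # spatial offset
      | ~(?P<t>{NUMBER})(?:@(?P<tx>{NUMBER}):(?P<ty>{NUMBER}))?  # temporal
    )
    (?:\[(?P<assertion>[^\]]*)\])?                            # opaque assertion
    """,
    re.VERBOSE,
)


def parse_offset(text):
    """Return the named parts of an offset, or raise ValueError."""
    match = OFFSET.fullmatch(text)
    if match is None:
        raise ValueError(f"not an offset: {text!r}")
    return {k: v for k, v in match.groupdict().items() if v is not None}


print(parse_offset(":3[;s=b]"))
print(parse_offset("~23.5@5.75:97.6"))
```

Note that a "/N" step is deliberately rejected here: under the merged production, steps and offsets are separate constructs, and a path may end in either.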
Original comment by daniel.weck
on 24 May 2013 at 9:34
Please see the complete formal proposal in the updated 72h message:
https://groups.google.com/d/msg/epub-working-group/iusOUBjlFLY/TAJKuvdhvMcJ
Original comment by daniel.weck
on 24 May 2013 at 9:46
72h clock ended. Matt, edits please! :)
Original comment by daniel.weck
on 29 May 2013 at 4:48
Specification has been updated:
https://code.google.com/p/epub-revision/source/detail?r=4652
Original comment by mgarrish
on 30 May 2013 at 12:21
Original comment by daniel.weck
on 30 May 2013 at 12:25
Original issue reported on code.google.com by
nat...@gmail.com
on 8 May 2013 at 6:20