Closed mattgarrish closed 8 years ago
It seems to suggest that the quotes are delimiters in the epubcfi that should not be url encoded, but I don't see anything in the specification to suggest that's the case. They appear to be part of the value to match.
But then the specification does say:
... double quote characters [and] angle brackets conflict with significant delimiters in the markup syntax, and must therefore be escaped using the &xxx; special sequence (character reference).
Thoughts here @danielweck?
Oh, the quote character you saw was &22; (github happened to decodes it. Please check the original line in repository).
The point was, &xxx; is xml endode and there should be no xml encodes applied to convert IRI to URI.
Yes, github translated the entity back to the character when I pasted the comment in.
If you look at the example it is explicitly calling out that you have to translate the quotes to entities for inclusion in an attribute in an xhtml page, though. That would seem to be valid if the quotes were delimiters in the cfi, but I can't find that it's the case.
The example I had quoted is (IMHO) not for epubcfi included in xhtml context but the example of URI encoded epubcfi.
The example is shown just under these explanation;
Should the IRI need to be converted to URI, the non-ASCII Cyrillic "EF" character ('Ф') would get percent-escaped with 2 bytes ('0xd0 0xa4', in hexadecimal). This would result in the following URI:
I know the the URI representation of epubcfi in xhtml must be encoded using xml encode, but isn't this example for a plain URI representation of epubcfi (via IRI representation) out of xml context, according to this explanation?
Ah, good question. I was reading it as a continuation of the previous example.
I'll let Dan respond, as I wasn't involved in writing this spec.
Thanks.
People tend to confuse about the usage of xml encode and uri encode and I'm worrying about spreading of partly xml encoded epubcfi as "uri encoded epubcfi".
Thanks again and sorry to bother you.
Thanks @vtns for your feedback! It is difficult to read the specification prose in the code diff: https://github.com/IDPF/epub-revision/commit/8f4fd07b99b369f146fd8fc425494588a2aa507f#commitcomment-14654718 ...so let's use the generated web page instead: http://www.idpf.org/epub/linking/cfi/epub-cfi.html#sec-epubcfi-escaping
There is at least one straight-forward typo:
"When the IRI appears within an XML attribute"
#epubcfi(/6/4!/4/10/2/1:3[Ф-"spa%20ce-99%25"-aa^[bb^]^^])
=> should be:
#epubcfi(/6/4!/4/10/2/1:3[Ф-"spa%20ce"-99%25-aa^[bb^]^^])
(misplaced "
double-quote character must be BEFORE "-99%")
Then, there's the following paragraph (as per @vtns bug report):
"Should the IRI need to be converted to URI"
#epubcfi(/6/4!/4/10/2/1:3[%d0%a4-"spa%20ce"-99%25-aa^[bb^]^^])
=> should be:
#epubcfi(/6/4!/4/10/2/1:3[%d0%a4-%22spa%20ce%22-99%25-aa^[bb^]^^])
(replacement of "
with %22
, as this is a URI expressed outside of the context of an XML attribute, demonstrating only the handling of non-ASCII unicode character)
The next example down is "URI encoding / decoding APIs usually 'aggressively' percent-encode characters", which correctly illustrates the complete URI escape sequences (exanding on the above example).
Daniel
Your fixes are clearly correct. There is another odd thing I was concerned (from developer view). mismatching may and should.
Section | |
---|---|
3.1.2 XML ID Assertion | a Reading System may determine that the location referenced by the CFI is not the original intended location, and may use the identifier to compute the set of steps that reach the desired destination in the content (see Intended Target Location Correction). |
3.1.8 Text Location Assertion ([) | A Reading System may determine that the location referenced by the CFI is not the original intended location (due to non-matching text), and may use the preceding/trailing text to compute the set of steps that reach the desired destination in the content (see Intended Target Location Correction). |
3.5 Intended Target Location Correction | When a Reading System is processing a CFI, it should check the correctness of any encountered assertions. For example, given the path /6/4[chap01ref]!…, the Reading System should verify that the element has the ID matching chap01ref when processing element 4 (for this example, an itemref in the spine). If not, the Reading System should locate the ID chap01ref within the document and correct the CFI (e.g., if a new itemref was inserted before the chap01ref itemref, the desired element number would now be 6 and the corrected CFI would be /6/6[chap01ref]!…). Likewise, text location assertions should be used to check referenced target locations, and used to derive a corrected CFI that targets the desired text location. |
vtns
@vtns well spotted.
Could you please file a new issue?
I think may
is more appropriate, because CFI references can be used in a variety of processing contexts / application domains. For example, robust legal citations / annotations frameworks would definitely need to validate assertions, but implementations of basic navigation / bookmarking (typically, in reading systems that both produce and consume the CFI expressions) are probably not going to bother validating + redirecting inaccurate references to the correct spot. @mattgarrish thoughts?
I've moved the issue about repairing to #631. Please continue any discussions about it there.
Proposed Solution:
Incorporate fixes in comment https://github.com/IDPF/epub-revision/issues/629#issuecomment-160653136
@mattgarrish would you mind correcting the editor's draft with the two proposed changes above? Thanks! :)
I've updated the prose as requested in the commit noted above. Please close this issue if no other changes are needed.
As of line 692 of src/spec/epub-cfi.xml, I think the direct conversion from IRI to URI would be:
epubcfi(/6/4!/4/10/2/1:3[%d0%a4-%22spa%20ce%22-99%25-aa^[bb^]^^])
instead of:
epubcfi(/6/4!/4/10/2/1:3[%d0%a4-"spa%20ce"-99%25-aa^[bb^]^^])
(the later looks like a IRI -> xhtml -> URI conversion.)
Reported by @vtns at https://github.com/IDPF/epub-revision/commit/8f4fd07b99b369f146fd8fc425494588a2aa507f#commitcomment-14654718