w3c / wpub-ann

Web Annotation Extensions for Web Publications
https://w3c.github.io/wpub-ann/

Use cases for Position (not satisfied by Selector) - are there any? #9

Open tcole3 opened 7 years ago

tcole3 commented 7 years ago

From Section 3.1.4 (Character Offset) of EPub 3.1 CFI: "For XML character data, the offset is zero-based and always refers to a position between characters, so 0 means before the first character and a number equal to the total UTF-16 length means after the last character. A character offset value greater than the UTF-16 length of the available text must not be specified."

And from Section 3.1.9 (Side-Bias) of EPub 3.1 CFI: "In some situations, it is important to preserve which side of a location a reference points to. For example, when resolving a location in a dynamically paginated environment, it would make a difference if a location is attached to the content before or after it (e.g., to determine whether to display the verso or recto side at a page break)."
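To make the character-offset rule quoted above concrete, here is a minimal Python sketch of the bounds check it implies. The function names `utf16_length` and `is_valid_cfi_offset` are hypothetical and come from neither specification. The sketch only illustrates that offsets are counted in UTF-16 code units, so a character outside the Basic Multilingual Plane counts as two.

```python
def utf16_length(text: str) -> int:
    """Number of UTF-16 code units needed to encode `text`."""
    # Each UTF-16 code unit is two bytes; "utf-16-le" adds no byte-order mark.
    return len(text.encode("utf-16-le")) // 2


def is_valid_cfi_offset(text: str, offset: int) -> bool:
    """True if `offset` names a position between characters of `text`.

    0 means before the first character; the UTF-16 length means after
    the last character. Anything larger must not be specified.
    """
    return 0 <= offset <= utf16_length(text)


# 'a', one emoji outside the Basic Multilingual Plane, then 'b'.
sample = "a\U0001F600b"
print(len(sample), utf16_length(sample))
print(is_valid_cfi_offset(sample, utf16_length(sample)))
print(is_valid_cfi_offset(sample, utf16_length(sample) + 1))
```

Python strings index by code point, so an implementation that compares CFI offsets with Web Annotation positions would probably need an explicit conversion between the two ways of counting.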

Assuming these are real feature requirements, I don't think we have anything precisely equivalent in the Web Anno data model. Putting aside for a moment whether something called a fragment identifier can be used to specify a location, how might we address a need for these functionalities?

Regarding the first bit, I do note that in Web Anno we do not specify a meaning for a TextPositionSelector or DataPositionSelector having the same value for both start and end. We do talk about "Position 0 would be immediately before the first character[/byte]". So in this doc could we specify an interpretation that if the document was "abcdefghijklmnopqrstuvwxyz", the start was 4, and the end was 4, we are specifying the location immediately before the character 'e'? For completeness should we specify what to do if end (or start) is greater than the length of the normalized text?
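The interpretation suggested above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TextPositionSelector` class here is a local stand-in for the JSON object, `describe` is a hypothetical helper, positions are counted as Python string indices, and rejecting out-of-range values with an error is one possible policy, not something the data model currently says.

```python
from dataclasses import dataclass


@dataclass
class TextPositionSelector:
    """Local stand-in for the Web Annotation selector of the same name."""
    start: int
    end: int


def describe(text: str, sel: TextPositionSelector) -> str:
    """Say what `sel` identifies in the (already normalized) `text`."""
    # Assumed policy: positions past the end of the text are an error.
    if not (0 <= sel.start <= sel.end <= len(text)):
        raise ValueError("selector falls outside the normalized text")
    if sel.start == sel.end:
        # Equal start and end: a position between two characters.
        before = text[sel.start - 1] if sel.start > 0 else None
        after = text[sel.start] if sel.start < len(text) else None
        return f"position between {before!r} and {after!r}"
    return f"range {text[sel.start:sel.end]!r}"


alphabet = "abcdefghijklmnopqrstuvwxyz"
print(describe(alphabet, TextPositionSelector(4, 4)))
print(describe(alphabet, TextPositionSelector(4, 5)))
```

With start and end both 4, the position reported is the one immediately before 'e', matching the example in the text.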

Regarding the second bit, side-bias, I have no idea other than to suggest that this is not something a locator or fragment identifier should have to worry about - it's something the consumer of the locator should be responsible for.

BigBlueHat commented 7 years ago

The Character Offset one is likely media type specific and determined by many more things than just "raw" Web Annotation Data Model. Given that content encoding and even selector options are dictated by the media (MIME) type chosen, I'd think that one would be the realm of implementors and (if needed) further refinement of the core TextPositionSelector and/or DataPositionSelector.

The "Side-Bias" one seems to be dependent on how (or if) Web Publications include pagination information beyond the "reading order" connected resources and/or the CSS used to render them. I think we deal with that monster if/when it shows up. 😄

iherman commented 7 years ago

For the side-bias: I actually believe that implementors should not use that feature of CFI (we have to find out whether they do...). Due to the dynamic nature of pagination, making, say, bookmarks dependent on that "monster" (@bigbluehat :-) may be a very bad idea...

iherman commented 7 years ago

@tcole3

> For completeness should we specify what to do if end (or start) is greater than the length of the normalized text?

Yes, probably. We may have to go through the document with a magnifying glass. That being said, it seems that the WA implementations did not bring forward any ambiguities, and I am a bit uneasy about extending the WA specification on existing Selectors…

tcole3 commented 7 years ago

Following up on discussions during the Working Group call of 2017-10-16, I have submitted Pull Request #15 to amend the definitions of TextPositionSelector and DataPositionSelector such that a position in the text/bytestream can be referenced.

I have also added a property (bias) to the TextPositionSelector that may be used to associate a position locator with the character before or after the position - however, I still do not understand the use case in the context of locators. The text from EPUB CFI alludes to a dynamic pagination use case, but I agree with @iherman that side bias should not be an attribute of a locator. I can appreciate that an application for paging through a Web Publication encountering a hard page break might need to consider whether to display the text preceding the break or the text following the break, but this seems to me a matter for the paging application. It should not require having two different locators for the position of the page break, one pointing forward from the page break and one pointing backwards from the page break.
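To show what associating a position with the character before or after it could mean, here is a small hedged sketch in Python. The helper `biased_character` is hypothetical, and the literal values "before" and "after" are an assumption made for illustration; nothing here is taken from the PR text itself.

```python
from typing import Optional


def biased_character(text: str, position: int, bias: str) -> Optional[str]:
    """Return the character a biased position is attached to, if any."""
    if not 0 <= position <= len(text):
        raise ValueError("position falls outside the text")
    if bias == "before":
        # Attached to the character preceding the position.
        return text[position - 1] if position > 0 else None
    if bias == "after":
        # Attached to the character following the position.
        return text[position] if position < len(text) else None
    raise ValueError("bias must be 'before' or 'after'")


alphabet = "abcdefghijklmnopqrstuvwxyz"
print(biased_character(alphabet, 4, "before"))
print(biased_character(alphabet, 4, "after"))
```

Both calls use the same position, which is the point of the objection above: the position itself is unchanged, and only the consumer's choice of neighbouring character differs.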

So to complete the discussion of TextPositionSelector bias (if we decide to keep it) we need a concrete example, preferably based on a real CFI use case involving side-bias. Can someone provide the details of a use case where CFI side-bias is being used?

iherman commented 7 years ago

Yep, we really need use cases! @GarthConboy?

iherman commented 7 years ago

(Commented this elsewhere, but as part of the closed PR #15, moving the arguments here.)

PR #17 provides a consistent view, including bias, but for a relatively brittle way of selecting, namely one based on the numbering of characters in a text. I wonder whether we could look at bias differently: consider it as some sort of "hint" for implementations (e.g., annotators) regarding their control of the user interface. I.e., for a text quote selector it would mean that any annotation popup should be positioned before (or, respectively, after) the selected text, and the same would be true for all other selectors when applicable.

I.e., we could introduce bias as a common property for Selectors, with something like:

bias: A hint for implementations on their user feedback regarding the positioning of, e.g., comment notes. The value MUST be before or after. A Selector MAY have at most one bias property.

I realize this has weaker semantics than what EPUB CFI proposes, but it may be enough for our use cases.

If done this way, the introduction of bias becomes orthogonal to the question whether we need Position Specifiers, as proposed in #17.
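As a rough illustration of the proposed definition, a consumer could check the two stated constraints (the value is before or after, and there is at most one) along these lines. This is a sketch only: `bias_hint` is a hypothetical helper, the selectors are plain Python dicts standing in for parsed JSON, and treating a JSON array as "more than one bias" is my assumption about how a violation would show up.

```python
from typing import Optional

ALLOWED_BIAS = ("before", "after")


def bias_hint(selector: dict) -> Optional[str]:
    """Return the selector's bias hint, or None if it has none."""
    if "bias" not in selector:
        return None  # bias is optional
    value = selector["bias"]
    # Assumption: several values would arrive as a JSON array.
    if isinstance(value, list):
        raise ValueError("a Selector may have at most one bias")
    if value not in ALLOWED_BIAS:
        raise ValueError("bias must be 'before' or 'after'")
    return value


quote = {"type": "TextQuoteSelector", "exact": "anotation", "bias": "after"}
print(bias_hint(quote))
print(bias_hint({"type": "TextQuoteSelector", "exact": "anotation"}))
```

Because the hint is read off any selector type, this check does not depend on whether position selectors exist at all.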

tcole3 commented 7 years ago

My concern with adding bias to a Selector is that I'm not sure it expresses what needs to be expressed anyway and it might muddle the semantics of selection. Semantically I'm not entirely sure what a bias on a selected range or text of bytes really means. Might take too much hand-waving to explain.

For me the brittleness and the use cases of interest are handled by refinement. Refinement (Section 6.3 of PR #17 preview) avoids the brittleness concern (to a degree) by unambiguously and explicitly breaking the process into separate steps, Step 1, select this fragment (text range or byte range) of the source. Step 2, within this fragment count over 10 characters (or bytes) and the position between that character (or byte) and the next character (or byte) is the position of interest. Changes in character or byte counts before or after the fragment do not invalidate the position definition. So far you don't need bias, and if you do it will be applied to a position (which makes more sense to me if I understand the use cases) rather than to a selection which seems a little less clear.
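The two steps can be sketched as follows. This is a simplified illustration, not the mechanism in the PR: `resolve_refined_position` is a hypothetical helper, Step 1 is reduced to an exact-text search for the first match, and the sample sentences are made up.

```python
def resolve_refined_position(source: str, exact: str, offset: int) -> int:
    """Resolve a position given relative to a selected fragment.

    Step 1: select the fragment (here, the first exact match of `exact`).
    Step 2: count `offset` characters into the fragment.
    Returns the resulting position in `source`.
    """
    fragment_start = source.find(exact)
    if fragment_start == -1:
        raise ValueError("fragment not found in source")
    if not 0 <= offset <= len(exact):
        raise ValueError("offset falls outside the fragment")
    return fragment_start + offset


fragment = "The quick brown fox jumps"
original = "Preface. The quick brown fox jumps over the lazy dog."
edited = "A much longer preface. The quick brown fox jumps over the lazy dog."

p_original = resolve_refined_position(original, fragment, 10)
p_edited = resolve_refined_position(edited, fragment, 10)
# The absolute positions differ, yet both land just before "brown".
print(p_original, original[p_original:p_original + 5])
print(p_edited, edited[p_edited:p_edited + 5])
```

Text added before the fragment shifts the absolute position but not the refined one, which is the robustness argument made above.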

Still if there is a use case for a selector bias, then I'd be okay generalizing.

iherman commented 7 years ago

@tcole3

> My concern with adding bias to a Selector is that I'm not sure it expresses what needs to be expressed anyway and it might muddle the semantics of selection. Semantically I'm not entirely sure what a bias on a selected range or text of bytes really means. Might take too much hand-waving to explain.

Yeah... I must admit I am trying to guess the use cases and requirements here, and I may have gotten it wrong.

I agree with you on the brittleness issue and the importance of combining this with refinement.

I must admit that, at this moment, I am not sure what we should do, in the absence of other input. I would propose we make the PR final (i.e., adding the missing bits) and leave it there until we get clear feedback. If there is no real need for this, we may simply want to drop it...

tcole3 commented 6 years ago

Discussions at the 2017 TPAC WG f2f did not identify a compelling use case for positions (rather than selectors). We need a use case to retain this feature. If no use case is available in a timely fashion (before the end of November 2017), the feature will be removed from the FPWD. It can always be added back if a use case subsequently emerges.

The side-bias question is now a separate issue, #29, and this issue will be renamed.