w3c / web-annotation

Web Annotation Working Group repository, see README for links to specs
https://w3c.github.io/web-annotation/

IRI mapping for multiple selectors (non-refined)? #454

Open andymatuschak opened 1 month ago

andymatuschak commented 1 month ago

Summary: I'd like to be able to encode a URI fragment representing multiple selectors (not one refined by another), to be interpreted with the same semantics as a specific resource with multiple selectors. This doesn't seem to be possible with the current mapping.

Hypothes.is's strategy for specifying and resolving annotations uses a combination of TextQuoteSelector and TextPositionSelector. The combination allows some resilience to both document modifications and also to ambiguous matches. Here's an example:

"selector": [
        {
          "start": 1239,
          "end": 1283,
          "type": "TextPositionSelector"
        },
        {
          "type": "TextQuoteSelector",
          "prefix": "Na-na-na-na-na-na-na, na-na-na-na, hey, Jude",
          "exact": "Na-na-na-na-na-na-na, na-na-na-na, hey, Jude",
          "suffix": "Na-na-na-na-na-na-na, na-na-na-na, hey, Jude"
        }
      ]

To resolve the segment of interest, Hypothes.is:

  1. First tries the TextPositionSelector. If the text at that range is identical to the exact key of the TextQuoteSelector, it's done.
  2. Otherwise, finds all places which match the TextQuoteSelector. It uses a fuzzy matcher, so it's tolerant of small modifications. If there's only one match, it uses that one.
  3. If there are multiple matches (as in the "Hey Jude" example above), it chooses the segment which most closely matches the TextPositionSelector.
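
To make those steps concrete, here's a rough Python sketch of the resolution order. It's my own illustration, not Hypothes.is's actual code: the function names are made up, plain substring search stands in for their fuzzy matcher, prefix/suffix are ignored, and "most closely matches" is assumed to mean nearest start offset.

```python
from typing import Optional, Tuple


def find_all(text: str, needle: str) -> list:
    """Return the start offset of every occurrence of needle in text."""
    offsets = []
    i = text.find(needle)
    while i != -1:
        offsets.append(i)
        i = text.find(needle, i + 1)
    return offsets


def resolve(text: str, position: dict, quote: dict) -> Optional[Tuple[int, int]]:
    """Resolve a (start, end) range from a position selector plus a quote selector."""
    start, end = position["start"], position["end"]
    exact = quote["exact"]

    # Step 1: trust the position selector if the text there still matches.
    if text[start:end] == exact:
        return (start, end)

    # Step 2: search for the quote (assumption: exact match, not fuzzy).
    candidates = find_all(text, exact)
    if not candidates:
        return None

    # Step 3: with one or more matches, pick the one nearest the recorded start.
    best = min(candidates, key=lambda offset: abs(offset - start))
    return (best, best + len(exact))
```

With a repeated phrase and a position that has drifted by one character, the quote search finds several candidates and the position breaks the tie.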

With only the TextPositionSelector, there's no resilience to document changes. With only the TextQuoteSelector, there's no way to handle ambiguities. @dwhly reports here that Hypothes.is's data shows these problems do turn up in production.

Unfortunately, as far as I can tell, the IRI fragment mapping provided here only allows one selector to be encoded. This problem was briefly discussed in #93, but the group appears to have concluded that refinedBy handles these cases. I don't think it handles the case I'm describing here, but I'd love to be wrong!
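
For illustration, here's a small sketch of how I understand the current mapping to serialize one selector, assuming the `selector(key=value,...)` fragment form. The helper name is hypothetical and the percent-encoding is a simplification (it encodes more characters than the mapping may require).

```python
from urllib.parse import quote


def selector_to_fragment(selector: dict) -> str:
    """Serialize a single selector dict as selector(type=...,key=value,...)."""
    parts = ["type=" + selector["type"]]
    for key, value in selector.items():
        if key == "type":
            continue
        # Assumption: percent-encode everything outside the unreserved set.
        parts.append(key + "=" + quote(str(value), safe=""))
    return "selector(" + ",".join(parts) + ")"


position = {"start": 1239, "end": 1283, "type": "TextPositionSelector"}
print(selector_to_fragment(position))
```

Each selector from the example above can be serialized this way on its own, but I don't see a slot in the fragment for a second, sibling selector; the only nesting available is refinedBy, which means something different.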

Thanks for all your hard work, all.