w3c / web-annotation

Web Annotation Working Group repository, see README for links to specs
https://w3c.github.io/web-annotation/

Support mixture of Range Selector, node index and text position #448

Open Jeffxz opened 2 years ago

Jeffxz commented 2 years ago

The use case is to let a web browser easily serialize and highlight user-selected text (by simply using the DOM Range start and end nodes).

Imagine we need to create a highlight for the text fragment "highlight me" in this HTML fragment:

<p id="header">see <span>if</span>you can highlight me</p>

I propose a selector data model like this

{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "id": "http://example.org/anno28",
  "type": "Annotation",
  "body": "http://example.org/comment1",
  "target": {
    "source": "http://example.org/page1.html",
    "selector": {
      "type": "RangeSelector",
      "startSelector": {
        "type": "CssSelector",
        "value": "#header",
        "startNode": "2", // node index inside #header element
        "startOffset": "9" // charOffset inside startNode
      },
      "endSelector": {
        "type": "CssSelector",
        "value": "#header",
        "endNode": "2", // node index inside #header element
        "endOffset": "20" // charOffset inside endNode
      }
    }
  }
}
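
To make the intended semantics of the node index and character offset concrete, here is a minimal sketch of how a consumer might resolve them. It is not part of any spec: `NodePoint` and `sliceByNodeOffsets` are hypothetical names, and the child nodes of `#header` are modelled as a plain array holding the text of each child (the `<span>` counts as child 1), so no real DOM is needed. The offsets are computed for the markup exactly as written above, with no whitespace after `</span>`, so they can differ by one from the values in the proposal.

```typescript
// Hypothetical shape for one end of the proposed selector:
// a child-node index inside the container plus a character offset in that node.
interface NodePoint {
  node: number;
  offset: number;
}

// Returns the text covered by [start, end) over the container's children.
// Assumes start comes before end and both indexes are in range.
function sliceByNodeOffsets(
  nodeTexts: string[],
  start: NodePoint,
  end: NodePoint
): string {
  const parts: string[] = [];
  for (let i = start.node; i <= end.node; i++) {
    const text = nodeTexts[i] ?? "";
    const from = i === start.node ? start.offset : 0;
    const to = i === end.node ? end.offset : text.length;
    parts.push(text.slice(from, to));
  }
  return parts.join("");
}

// Text of each child of <p id="header">: text node, <span>, text node.
const headerChildren = ["see ", "if", "you can highlight me"];

const highlighted = sliceByNodeOffsets(
  headerChildren,
  { node: 2, offset: 8 },
  { node: 2, offset: 20 }
);
console.log(highlighted); // "highlight me"
```

In a browser the same two points would map fairly directly onto `Range.setStart` and `Range.setEnd`, which is the convenience this proposal is after.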

Currently we have the Range Selector and the Text Position Selector. But for a web browser, or a browser-based reading system, to easily serialize a selector and highlight the selected segment (for example, text), the Range Selector (using CSS Selectors) and the Text Position Selector alone are not enough.

With the Range Selector we can only select HTML elements (#header in the case above).

With the Text Position Selector we have to assume the whole HTML document is serialized as text (in the example above it would be "see if you can highlight me", with the indexes of "h" and "e" in the whole string), but then it is still difficult to locate the HTML element and node that contain the target text for highlighting purposes.

Jeffxz commented 2 years ago

Here is a POC that uses the selector above to highlight several pieces of text inside a random book. In the following example I serialized startSelector to the format <value>;<startNode>;<startOffset> and endSelector to <value>;<endNode>;<endOffset>

https://wysebee.com/run?epub=https://content.wysebee.io/moby-dick.epub#chapter=10&startSelector=section%20%3E:nth-child(6n);0;36&endSelector=section%20%3E:nth-child(6n);0;56
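
For readers who want to experiment with that serialization, the following is a rough sketch of a parser for the `<value>;<node>;<offset>` format. It is not the POC's actual code: `SerializedPoint` and `parsePoint` are hypothetical names, and it assumes the input is the raw, still percent-encoded parameter value. It splits on the last two semicolons, on the assumption that the node index and offset never contain one while the CSS selector occasionally might.

```typescript
// Hypothetical result type for one serialized selector end.
interface SerializedPoint {
  value: string;  // CSS selector for the container element
  node: number;   // child-node index inside the container
  offset: number; // character offset inside that node
}

function parsePoint(raw: string): SerializedPoint {
  const decoded = decodeURIComponent(raw);
  const lastSep = decoded.lastIndexOf(";");
  if (lastSep <= 0) {
    throw new Error("expected <value>;<node>;<offset>, got: " + decoded);
  }
  const prevSep = decoded.lastIndexOf(";", lastSep - 1);
  if (prevSep < 0) {
    throw new Error("expected <value>;<node>;<offset>, got: " + decoded);
  }
  const nodeText = decoded.slice(prevSep + 1, lastSep);
  const offsetText = decoded.slice(lastSep + 1);
  // Only plain non-negative integers are accepted for the two numeric fields.
  if (!/^\d+$/.test(nodeText) || !/^\d+$/.test(offsetText)) {
    throw new Error("node and offset must be non-negative integers: " + decoded);
  }
  return {
    value: decoded.slice(0, prevSep),
    node: parseInt(nodeText, 10),
    offset: parseInt(offsetText, 10),
  };
}

// The startSelector value taken from the POC link above.
const parsedStart = parsePoint("section%20%3E:nth-child(6n);0;36");
console.log(parsedStart.value, parsedStart.node, parsedStart.offset);
```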

tilgovi commented 2 years ago

@Jeffxz are you asking for something that's not covered by refinedBy? I think the expected way to handle this is to refine the CSS selector by a text position selector so that the position is relative to the start of #header.
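
As a sketch of what the `refinedBy` approach could look like for the example in this issue: the snippet below builds a `CssSelector` refined by a `TextPositionSelector` whose `start` and `end` are counted from the beginning of the text of `#header`. The interface and function names are hypothetical, and the element text is hard-coded from the markup as written in the issue (no space after `</span>`).

```typescript
interface TextPositionSelector {
  type: "TextPositionSelector";
  start: number;
  end: number;
}

interface RefinedCssSelector {
  type: "CssSelector";
  value: string;
  refinedBy: TextPositionSelector;
}

// Builds a CSS selector whose match is narrowed to [start, end)
// within the text content of the selected element.
function cssRefinedByTextPosition(
  css: string,
  start: number,
  end: number
): RefinedCssSelector {
  return {
    type: "CssSelector",
    value: css,
    refinedBy: { type: "TextPositionSelector", start: start, end: end },
  };
}

// Text content of <p id="header">, concatenated across its child nodes.
const headerText = "see " + "if" + "you can highlight me";
const quote = "highlight me";
const highlightStart = headerText.indexOf(quote);

const refinedSelector = cssRefinedByTextPosition(
  "#header",
  highlightStart,
  highlightStart + quote.length
);
console.log(JSON.stringify(refinedSelector, null, 2));
```

The positions here are relative to `#header` rather than to the whole document, which addresses the concern about serializing the entire page as text. The consumer still has to walk the element's text nodes to turn the positions back into a DOM Range.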

There are still some difficulties with measuring offsets that may cross node boundaries or that include text nodes that have code points with multiple code units, so it's not as simple as creating a DOM Range with the offset.
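
To illustrate both difficulties, here is a hedged sketch with two hypothetical helpers. `codePointToCodeUnitOffset` converts an offset counted in Unicode code points (which, as far as I can tell, is how the data model counts Text Position Selector offsets) into the UTF-16 code-unit offset that DOM Range expects. `locate` maps an offset over an element's concatenated text onto a node index and an in-node offset. As a simplification, the element's children are modelled as an array of strings rather than real DOM nodes.

```typescript
// Converts a code-point offset into a UTF-16 code-unit offset by
// stepping over surrogate pairs two units at a time.
function codePointToCodeUnitOffset(text: string, codePoints: number): number {
  let unit = 0;
  for (let seen = 0; seen < codePoints && unit < text.length; seen++) {
    const high = text.charCodeAt(unit);
    const low = unit + 1 < text.length ? text.charCodeAt(unit + 1) : 0;
    const isPair =
      high >= 0xd800 && high <= 0xdbff && low >= 0xdc00 && low <= 0xdfff;
    unit += isPair ? 2 : 1;
  }
  return unit;
}

interface Located {
  node: number;
  offset: number;
}

// Maps an offset over the concatenated text onto (node index, offset in node).
// An offset exactly on a boundary resolves to the end of the earlier node.
// Returns null when the offset lies past the end of the text.
function locate(nodeTexts: string[], unitOffset: number): Located | null {
  let remaining = unitOffset;
  for (let i = 0; i < nodeTexts.length; i++) {
    const length = (nodeTexts[i] ?? "").length;
    if (remaining <= length) {
      return { node: i, offset: remaining };
    }
    remaining -= length;
  }
  return null;
}

// "a", U+1F600 (one code point, two code units), "b".
const emojiText = "a\uD83D\uDE00b";
console.log(codePointToCodeUnitOffset(emojiText, 2)); // 3

// Position 14 in the text of #header falls in the third child node.
console.log(locate(["see ", "if", "you can highlight me"], 14)); // node 2, offset 8
```

The boundary rule in `locate` is a design choice: a start position is often better resolved to the beginning of the following node, so a real implementation would likely treat start and end differently.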