recogito / recogito-js

A JavaScript library for text annotation
BSD 3-Clause "New" or "Revised" License

retrieving word IDs from highlighted text? #14

Closed: alhuber1502 closed this issue 3 years ago

alhuber1502 commented 3 years ago

My text looks like this in the browser: So chearful and sprightly, she ...

and like this in the source:

<w xml:id="peh89-239170">So</w>
<c> </c>
<w xml:id="peh89-239180">chearful</w>
<c> </c>
<w xml:id="peh89-239190">and</w>
<c> </c>
<w xml:id="peh89-239200">sprightly</w>
<pc xml:id="peh89-239210">,</pc>
<c> </c>
<w xml:id="peh89-239220">she</w>

When I highlight the above line in the browser, can I retrieve "target": [peh89-239170, peh89-239180, peh89-239190, peh89-239200, peh89-239210, peh89-239220] or something to that effect? Thanks!

rsimon commented 3 years ago

There's no support for custom markup, I'm afraid. I am hoping to support XPointer-based targets at some point (including for TEI markup, using the CETEIcean library), but I don't know when that will happen. My main personal interest, as well as my paid jobs, is focused more on image annotation at the moment.

alhuber1502 commented 3 years ago

Thanks, can I hack this? Any pointers to where in the code to start?

rsimon commented 3 years ago

Should be hackable in principle. Devil might be in the details. But the code that translates DOM ranges to annotation targets is here:

https://github.com/recogito/recogito-client-core/blob/main/src/selection/SelectionUtils.js#L30-L51
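For the ID-collecting part, here is a minimal sketch of one possible approach. It is not code from recogito-client-core. `collectWordIds` is a hypothetical helper, and it assumes the `<w>` and `<pc>` elements are present in the page DOM with their `xml:id` attributes intact (it falls back to a plain `id` in case the markup has been transformed). It relies only on the standard DOM calls `querySelectorAll`, `Range.intersectsNode` and `getAttribute`.

```javascript
// Hypothetical helper, not part of RecogitoJS: returns the IDs of all
// word/punctuation elements that a DOM Range touches, in document order.
// Assumption: elements are named <w>/<pc> and carry xml:id (or id).
function collectWordIds(range, root, tagNames = ['w', 'pc']) {
  const ids = [];
  for (const el of root.querySelectorAll(tagNames.join(','))) {
    // Skip elements that lie entirely outside the selection
    if (!range.intersectsNode(el)) continue;
    const id = el.getAttribute('xml:id') || el.getAttribute('id');
    if (id) ids.push(id);
  }
  return ids;
}

// Possible use in the browser (contentEl is whatever element holds the text):
//   const range = window.getSelection().getRangeAt(0);
//   const ids = collectWordIds(range, contentEl);
```

A selection that starts or ends exactly on an element boundary may include or exclude the neighbouring word depending on how the browser places the range endpoints, so the edges probably need testing against real selections.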

If you want the targets to restore properly, too, you'd need to dig deeper, starting here:

https://github.com/recogito/recogito-client-core/blob/main/src/highlighter/Highlighter.js#L43

But that might not be necessary. If you just add your own ID selector in parallel to the char offset selector (rather than replace the char offset target with the ID-based one), the annotation would still restore properly when you load it.
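As a rough illustration of the "in parallel" idea, the sketch below appends an extra selector to an annotation's existing selector list and leaves the quote and char offset selectors untouched. It assumes the annotation keeps its selectors in `target.selector`, as in the W3C Web Annotation model. `withIdSelector` and the `XmlIdSelector` type are made-up names for illustration, neither part of the W3C model nor of RecogitoJS.

```javascript
// Hypothetical helper: returns a copy of the annotation with an ID-based
// selector added next to the existing ones. The input is not modified.
// 'XmlIdSelector' is an invented type name, not a standard selector.
function withIdSelector(annotation, ids) {
  const target = annotation.target || {};
  // target.selector may be a single object or an array; normalise to an array
  const existing = Array.isArray(target.selector)
    ? target.selector
    : (target.selector ? [target.selector] : []);
  return {
    ...annotation,
    target: {
      ...target,
      selector: [...existing, { type: 'XmlIdSelector', value: [...ids] }]
    }
  };
}
```

Because the char offset selector is still there, code that only understands that selector can keep working, while your own code can read the ID list from the extra entry.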

alhuber1502 commented 3 years ago

Thanks, that's great, I'll have a stab at it.