hypothesis / client

The Hypothesis web-based annotation client.
Other
643 stars 198 forks source link

Whitespaces skipped over when highlighting books on archive.org #5711

Closed cdrini closed 1 year ago

cdrini commented 1 year ago

The Internet Archive bookreader has elements containing only whitespace inbetween words of their text layer.

Here is a sample line from https://archive.org/details/goodytwoshoes00newyiala/page/n5/mode/2up

<span class="BRlineElement" style="line-height: 87px;">
    <span class="BRwordElement" style="letter-spacing: -1.275px; padding-left: 0px;">had</span>
    <span class="BRspace" style="letter-spacing: 14.65px;"> </span>
    <span class="BRwordElement">a</span>
    <span class="BRspace" style="letter-spacing: 11.5833px;"> </span>
    <span class="BRwordElement" style="letter-spacing: 0.459998px;">moment</span>
    <span class="BRspace" style="letter-spacing: 11.85px;"> </span>
    <span class="BRwordElement" style="letter-spacing: -3.01668px;">to</span>
    <span class="BRspace" style="letter-spacing: 27.3667px;"> </span>
    <span class="BRwordElement" style="letter-spacing: -1.04334px;">spare.</span>
</span>

image image

It seems like this is the code where it happens:

https://github.com/hypothesis/client/blob/68e627e15e93f7b6ce2f0049359352acfc7f4482/src/annotator/highlighter.ts#L233-L256

cdrini commented 1 year ago

Would we be able to update this to allow whitespace-only TextNodes if they are inside a display: inline element ? White-space only display: inline elements do affect the display/should be highlighted, and should not cause issues with table rows/lists/etc.

Eg. Change:

https://github.com/hypothesis/client/blob/68e627e15e93f7b6ce2f0049359352acfc7f4482/src/annotator/highlighter.ts#L239

To:

// Note `span` is an array of adjacent TextNode.
// If these nodes are direct children of a `display: inline` element
// or if it's non-empty, we will wrap it with the highlight.
getComputedStyle(span[0].parentNode).display == 'inline' || span.some(node => !whitespace.test(node.data)),
cdrini commented 1 year ago

This also appears to affect highlight code in various places:

highlightjs: image image

codemirror: image

cdrini commented 1 year ago

Works like a charm!

image