Too wide highlights on the block-level elements using `canvas`

oleksandr-danylchenko commented 8 months ago

Issue

When testing the selection I noticed a weird selection quirk that makes the highlighter light up the whole row of the block-level element Such an issue happens only when you select something before the block-level element and then go over it. When the selection starts on the element - the highlights work fine 👌🏻

Expected selection behavior:

https://github.com/recogito/text-annotator-js/assets/68850090/b38be69f-7827-43f0-9f26-77b776043932

Too wide selection on the block-level elements

https://github.com/recogito/text-annotator-js/assets/68850090/1b330554-3d24-4e6e-a394-b44c0977c538

https://github.com/recogito/text-annotator-js/assets/68850090/fc2596fd-81bc-4c1e-9dbe-e1a9aa4c357d

Exceptions

For unknown reasons, such an issue doesn't happen for the following block elements:

p
h1, although h2-h6 are also susceptible to the issue
fieldset
table

Reproduction

The issue is reproducible on the attached plain HTML example for the text-annotator. I also made a branch with a list of all the block-level elements in that HTML example - https://github.com/recogito/text-annotator-js/pull/73. It's worth mentioning that the issue happens only when you're using the canvas.

oleksandr-danylchenko commented 8 months ago

I suppose that this issue might be related to the reason why the getClientRectsPonyfill was introduced - https://github.com/recogito/text-annotator-js/pull/33#issuecomment-1951018591

Because Firefox has the same exact overhighlighting issue! But it has its own set of exceptions that only contains the table 🤷🏻‍♂️

UPD: I tried forcefully using the getClientRectsPonyfill, but it leads to even more broken behavior both in Firefox & Chrome 😄

https://github.com/recogito/text-annotator-js/assets/68850090/4708049f-1057-4d83-8a30-7b92ea63f3f0

https://github.com/recogito/text-annotator-js/assets/68850090/5b32ef69-eb2a-4a82-9c7a-b7cbe324e329

rsimon commented 7 months ago

Revisiting this (since I finally have a bit more time now). Yes, the ponyfill was originally intended to remedy this issue. Although it's currently buggy (and may in fact never work with good performance - at least not on large and/or deeply nested markup).

Can you elaborate what you mean by:

the issue happens only when you're using the canvas.

Do you mean the issue does not happen when using the CSS Custom Highlight renderer? Or that the native browser selection looks ok, while the Canvas rendering has problems?

As for a solution: TBH I'm currently out of ideas. It seems to be a more serious problem on FF rather than Chrome. I'd definitely like to get rid of it. But right now... not quite sure how to address it.

rsimon commented 7 months ago

P.S.: as you have probably seen, my initial approach to getting "clean" highlights (in the ponyfill) was to create a copy of the annotated text nodes, temporarily wrap each text node in a SPAN, then get the bounding boxes from the SPANs instead. In theory, this would give you accurate highlight bounds that cover only the text. However, it's crazy inefficient, and breaks all sorts of stuff.

In hindsight, I believe it might have been sufficient to just split the range instead. I.e.

get all the text nodes in the range
create separate ranges for start/end + all full text nodes in between
get the bounds of each individual range

Again, in theory, this might give us accurate highlight bounds, might work on Chrome as well as FF, and would be much less disruptive.

This, along with an improved merge function, are the two main things I have on my todo list. But it will depend somewhat on my current client's wishlist if/when I can work on it.

oleksandr-danylchenko commented 7 months ago

Do you mean the issue does not happen when using the CSS Custom Highlight renderer?

Yep, I meant that one. The issue happens for the canvas renderer when the selection goes over the aforementioned block elements

Perhaps an alternative & at least slightly less disruptive solution would be to query the text nodes in the range, and then split the range, into one range per text node

Yeah, that also might be an option [^1]

[^1]: In that case, the creation of the annotations definitely should be moved to the pointerup event to prevent a significant decrease in the selection performance

rsimon commented 7 months ago

The issue happens for the canvas renderer

Ok, this actually confirms my impression. As weird as it is, here's my observation:

Ranges themselves usually work as you'd expect (= no weird block artefacts - although I've seen "blocky" selections in FF, too).
Where FF and Chrome seem to differ is when you call range.getClientRects(). The rect coordinates do differ from what the range actually looks like! The CSS Custom Highlight API styles the range itself. The Canvas renderer has to rely on the results of .getClientRects() to render the highlights. Which is why the former produces better results than the latter in this case.

rsimon commented 7 months ago

P.S.: You may have seen that I added yet another renderer implementation, based on absolutely positioned SPANs. This solves a bunch of issues for us (and perhaps should have been the default solution all along). But since that's also based on querying .getClientRects() it suffers the same problem.

rsimon commented 2 months ago

So about this issue...

I'm in the process of revising the mergeClientRects code, which is supposed to generate a "clean" set of highlight rectangles from the mess that range.getClientRects() might return. I've now tried a different approach.

Group all client rects into "lines", i.e. groups where each rect has about the same top and bottom coordinates. (Allow for a few px of tolerance).
Merge each line into a single DOMRect. (The bounding box of all rects inside this line.)
Remove the block level boxes: these would fully contain at least one other line, and would generally be wider.

My initial gut feeling is that this would be sufficient - and significantly easier (and probably more performant) than splitting the range as discussed above.

I might think different after I slept over it, and there are surely a few situations where these simple rules might break. (Rotated text?) But I'll give that a try...

rsimon commented 2 months ago

This doesn't look so bad.

Before (broken)

https://github.com/user-attachments/assets/68972f25-87a6-4bdc-8b01-9d3f582630bf

After (Chrome)

https://github.com/user-attachments/assets/b684ca3d-13c2-44d0-aecb-b3fb99627657

Firefox drops almost all of the "good" lines, too. Still need to investigate that. But might simply be down to things like rounding errors in the coordinates.

rsimon commented 2 months ago

Now also with Firefox! (I forgot that FF, as opposed to Chrome, has the habit of generating client rects with a height of 0px. That filtered out lines which would "contain" a zero-rect. After removing zero-size rects, all looks fine.)

https://github.com/user-attachments/assets/61369dca-5a9f-45d2-aa8d-f7bc213f7175

rsimon commented 2 months ago

🤦 Of course there are always edge cases. This is a CSS text-indent rule which breaks my approach by shifting the text element outside of its containing block element. So, in the long run, the approach of splitting the range into start rect, end rect, and all text nodes in between my be necessary after all. But I'm inclined to add the simple filtering approach as a temporary fix nonetheless.

recogito / text-annotator-js