hypothesis / client

The Hypothesis web-based annotation client.
Other
641 stars 197 forks source link

Sidebar displays annotation in original (not current) page order #248

Open nickstenning opened 7 years ago

nickstenning commented 7 years ago

Originally reported as https://github.com/hypothesis/h/issues/2868 by @judell.

Here's a [handy test page] which can be used to repro this issue.

Steps to reproduce

  1. Annotate unique strings in each paragraph.
  2. Reorder the paragraphs in the source.
  3. Reload Hypothesis.

Expected behaviour

Annotations reanchor, and are displayed in the sidebar in the order they are now found in the page.

Actual behaviour

Annotations do successfully reanchor (assuming you selected an identifiable part of the text), but are displayed in the sidebar in the order they were in when the page was first annotated.

Additional details

Note that this doesn't just affect the situation when page content is explicitly reordered. It also applies when changes occur around the annotated content and annotations are made before and after these changes, i.e.

  1. Content is annotated.
  2. Big chunk of ad copy or whatever is inserted before annotated content.
  3. Content is annotated again.

The annotations created in step 3 will have substantially larger offsets in the TextPositionSelector, and so will likely all be displayed after the annotations created in step 1, regardless of their actual positions in the annotated content.


@judell provides some real-world examples:

At: http://www.nytimes.com/2015/10/24/opinion/the-patent-troll-smokescreen.html?_r=0

The following appear 1, 2 in the doc, 2, 1 in the sidebar.

https://hyp.is/AVI3AAy28sFu_DXLVbLQ/www.nytimes.com/2015/10/24/opinion/the-patent-troll-smokescreen.html

https://hyp.is/AVIthELH8sFu_DXLVZl7/www.nytimes.com/2015/10/24/opinion/the-patent-troll-smokescreen.htmlhttp://www.nytimes.com/2015/10/24/opinion/the-patent-troll-smokescreen.html?_r=0

nickstenning commented 7 years ago

@robertknight on the original issue:

Sorting of annotations in the sidebar is done based on the stored start and end positions of the TextPositionSelector selector. The annotation of the text "Is the University of Wisconsin-Madison a patent troll" does not show up at the top of the list in location order because it has a larger start offset for its TextPositionSelector (see https://hypothes.is/api/annotations/AVIlMa9V8sFu_DXLVY_y ) than the annotations which follow.

When I created a new annotation of the same text just now, it had a much lower start offset than the stored value.

One potential way we could fix this would be to calculate the TextPositionSelector to annotations after they are anchored in the page and use those instead of the offsets originally stored with the annotation. Whether this is feasible or not depends on how expensive it is to calculate the position offset.

ajpeddakotla commented 7 years ago

possibly related: #174

judell commented 7 years ago

I suspect there's been prior discussion not only of recalculating, but also perhaps updating, the text position. That likely would've been before my arrival at H though. @nickstenning or @dwhly might know where that discussion can be found. (Also @tilgovi!)

dwhly commented 7 years ago

One potential way we could fix this would be to calculate the TextPositionSelector to annotations after they are anchored in the page

Given that one of our big value propositions is the fuzzy anchoring bit-- in other words, working with the way pages are now vs the way they were before, I think having location sort be based off of the current page rather than its state at annotation creation is probably fairly important.

Stepping through the climate feedbacks here: http://climatefeedback.org/feedbacks/ it seems that approximately 50% of them have anomalous sorts on a casual check.

robertknight commented 7 years ago

Stepping through the climate feedbacks here: http://climatefeedback.org/feedbacks/ it seems that approximately 50% of them have anomalous sorts on a casual check.

In which case I'd suggest creating a separate task to investigate some of these and verify that the problem is the same as the one being discussed here.

FWIW it should be perfectly possible to re-sort annotations based on their position within the text after anchoring, once that position is known. There will be some work involved in threading that data through the necessary messages back from the page to the sidebar.

dwhly commented 7 years ago

I suspect there's been prior discussion not only of recalculating, but also perhaps updating, the text position.

I remember these discussions... Taking advantage of new knowledge, storing that knowledge and thus speeding up our response on subsequent queries seemed at face value to be a good idea. Doing so might even allow us to open up our range for detecting relocations of annotations (I think we limit this range now because looking far and wide in the DOM for errant annotations is expensive, and if we're doing it every time and never learning then it's a problem, but maybe not if we only do it the first time).

We did however note the many landmines you might find-- situations where you were updating positions between different views, or different renderings of a page in a way that wasn't helpful-- i.e. flip flopping and overwriting locations alternatively between HTML and PDF views depending on which format was viewed last. Or if a page is partially paywalled for one user and not another.

judell commented 6 years ago

Another example:

image

Instead of replying to MrsTucker, I decided to notify Nate and Jeremy, expecting our annotations on the same occurrence of you're to hang together. But they don't.

I've made a debug client that reports both the stored XPath (two lines of data, for the start and end of the DOM range, in this case the same because the selection does not cross element boundaries) and the stored position as start, stop.

image

The pattern on the left is what we see through sometime in 2017, the pattern on the right is what we see since.

judell commented 4 years ago

See https://twitter.com/juancommander/status/1244421372983439360.

/cc @klemay

klemay commented 4 years ago

Thanks @judell - we'll discuss this one at this week's bug backlog review!

judell commented 4 years ago

Ignoring script tags might help a lot.

On a simplified version of the page Juan reported, I made an annotation on the sentence beginning "Certainly" before, then after, adding a bogus script tag. The positions differ.

image

judell commented 4 years ago

robertknight 13 hours ago Yes. The “location” is very naive. Its effectively document.body.textContent.indexOf(quote). It ignores whether text is visible, in an element that is displayed to the user or whether the element is re-ordered by CSS

robertknight commented 3 years ago

This is still applicable. We always use the recorded text position (in the TextPositionSelector selector) rather than the position of the resolved range.