Annotate button fails with error when specific phrase is highlighted

robertknight commented 8 years ago

Steps to reproduce

Go to http://jonudell.net/h/mueller_01.pdf
Select the word "Chomsky", and only that word, in the first paragraph. Note that making this selection precisely requires fine mouse control to avoid selecting the whole paragraph
Click the "Annotate" button
Expected behaviour

The word "Chomsky" should be highlighted and a new annotation should be created.

Actual behaviour

No new annotation is created. An error appears in the dev console:

Uncaught TypeError: Cannot read property 'classList' of null
    at getNodeTextLayer (chrome-extension://bjfhmglciegochdpefhhlphglcehbmek/public/scripts/injector.bundle.js?bb3e58:683:24)
    at Object.exports.describe (chrome-extension://bjfhmglciegochdpefhhlphglcehbmek/public/scripts/injector.bundle.js?bb3e58:934:20)
    at getSelectors (chrome-extension://bjfhmglciegochdpefhhlphglcehbmek/public/scripts/injector.bundle.js?bb3e58:1526:29)
    at Array.map (native)
    at PdfSidebar.module.exports.Guest.createAnnotation (chrome-extension://bjfhmglciegochdpefhhlphglcehbmek/public/scripts/injector.bundle.js?bb3e58:1550:36)
    at PdfSidebar.module.exports.Sidebar.createAnnotation (chrome-extension://bjfhmglciegochdpefhhlphglcehbmek/public/scripts/injector.bundle.js?bb3e58:3043:40)
    at Object.onAnnotate (chrome-extension://bjfhmglciegochdpefhhlphglcehbmek/public/scripts/injector.bundle.js?bb3e58:1179:14)
at Adder.handleCommand (chrome-extension://bjfhmglciegochdpefhhlphglcehbmek/public/scripts/injector.bundle.js?bb3e58:426:15)

Browser/system information

Chrome v54.0.2816.0 Hypothesis v0.44

sean-roberts commented 8 years ago

After some research, here are my findings:

The Chomsky referred to above has two elements that are in that area. The plain text and a pdf link that is an overlay type of element.
The error is triggering in pdf.coffee but it's because a null value is being sent to the function instead of a Node.
The function that is unexpectedly grabbing a null value is in annotator.js around line 621 nr.start = r.start.nextSibling;
r.start.nextSibling is null for this element. I don't know why exactly. However, this is actually the startContainer that the selection range is returning. And if it does not have access to the nextSibling then that totally fails the root assumption that the code makes.

I don't know what the thoughts are on modifying annotatorjs, but I am going to do it to at least have something to point at in code for what my suggestions are.

sean-roberts commented 8 years ago

@robertknight can you let me know your thoughts on the above + the change?

robertknight commented 8 years ago

I don't know what the thoughts are on modifying annotatorjs, but I am going to do it to at least have something to point at in code for what my suggestions are.

Upstream the source code for BrowserRange has been removed and the functionality for converting between XPaths and Range objects has been moved into xpath-range, which is conceptually similar to the dom-anchor-* libraries that we use for anchoring quote and position selectors. Ultimately what I'd like to do is replace this code from Annotator with xpath-range which should be easier to hack on.

However, to start with it should be possible to create a test case in anchoring/html/html-test.js which reproduces the problem here - since the BrowserRange#normalize method is also called when resolving XPath RangeSelector selectors to Range objects. Once that is done, we can modify the Annotator code to make the test pass, then later swap out the implementation.

robertknight commented 8 years ago

Stepping back one level - it isn't completely clear why the range needs to be normalized using the code in Annotator in the first place. My guess is that it is because getNodeTextLayer() requires its argument to be an Element rather than just a Node. If we can figure out what invariants the rest of the describe() function in pdf.coffee requires of the Range, another way we could approach this might be able to replace the call to BrowserRange#normalize with something simpler.

For reference, here is the original CoffeeScript code for BrowserRange with comments which explains a bit more amount what normalization is for - but bear in mind that this code was written several years ago to cope with differences in how browsers represented selections at the time.

judell commented 5 years ago

@klemay, @jeremydean here's another thing to be aware of if you weren't already. PDF has its own notion of annotations. I don't think they are very common, but when they exist, as in this example, they interfere with our mechanism.

hypothesis / client