hypothesis / client

The Hypothesis web-based annotation client.
Other
632 stars 195 forks source link

Race condition causes annotations to sometimes fail to anchor in PDFs #1330

Closed robertknight closed 5 years ago

robertknight commented 5 years ago

Steps to reproduce

Go to https://via.hypothes.is/https://www.nsqol.org/wp-content/uploads/2019/02/National-Standards-for-Quality-Online-Teaching.pdf and select the "Public" group

Expected result

There is an annotation on the last page (https://hypothes.is/a/LJMxNsjFEem55S9pNtX0Aw). It should always anchor.

Actual result

Testing in Chrome 78.0.3887.7, I see the annotation fail to anchor once every few attempts.

Inspecting the store state in the sidebar showed that no anchoring timeout had occurred but instead the annotation had just failed to anchor:

Screenshot 2019-08-27 13 27 48

Adding a breakpoint in the client code in annotator.bundle.js where anchoring failures are handled (the catch clause in response to a failed anchor - see code), I got the following exception:

Screenshot 2019-08-27 13 36 05

Which points to pdfPage being undefined here: https://github.com/hypothesis/client/blob/4dfebbd6afbd415cd0e7858e0a86ee899f4bed32/src/annotator/anchoring/pdf.js#L76

Looking at the implementation of the getPageView method in PDF.js which returns the object that is expected to have a pdfPage property, it looks like there are indeed certain states when that method could return undefined. Reading through the implementation, it looks like we should be using pdfDocument.getPage whenever we want a reference to a page, which returns a Promise<PDFPageProxy> on which we can call getTextContent directly.

robertknight commented 5 years ago

Assigning this to myself since I'd already started looking into solutions.

robertknight commented 5 years ago

To reproduce the issue with the client and Via locally more easily, apply the following diff to Via:

diff --git a/static/pdfjs/web/viewer.js b/static/pdfjs/web/viewer.js
index ca12733..4d501ab 100644
--- a/static/pdfjs/web/viewer.js
+++ b/static/pdfjs/web/viewer.js
@@ -4304,9 +4304,14 @@ var PDFViewer = (function pdfViewer() {
       var firstPagePromise = pdfDocument.getPage(1);
       this.firstPagePromise = firstPagePromise;

+      const delay = ms => new Promise(resolve => setTimeout(resolve, ms));
+
       // Fetch a single page so we can get a viewport that will be the default
       // viewport for all pages
-      return firstPagePromise.then(function(pdfPage) {
+      return firstPagePromise.then(async function(pdfPage) {
+        // TESTING
+        await delay(500);
+
         var scale = this._currentScale || 1.0;
         var viewport = pdfPage.getViewport(scale * CSS_UNITS);
         for (var pageNum = 1; pageNum <= pagesCount; ++pageNum) {

Note that this will cause a couple of unrelated errors in the console and the PDF page will not render immediately until you scroll it a bit, but it demonstrates the orphaning issue.