hypothesis / support-legacy

a place for tracking support-related work and projects

Inconsistent appearance of refresh icon in sidebar - Canvas PDFs #173

Closed mattdricker closed 3 years ago

mattdricker commented 3 years ago
[Screenshot taken 2021-02-23 at 10:09 AM]

The Success team has been reporting inconsistent appearance of the refresh icon in the Hypothesis sidebar when new annotations have been posted to an article. See https://hypothes-is.slack.com/archives/C2BLQDKHA/p1614026929018000

Sometimes the icon appears immediately when there is a new annotation, but other times there are delays of up to several minutes reported.

To test internally, I have created 3 Canvas assignments of different document types: HTML, PDF Google, and PDF Canvas.

For HTML and PDF Google I've seen no delay in the appearance of the refresh icon when new annotations are posted. However, for PDF Canvas I do not see the icon appear at all when new annotations are posted.

To Reproduce

  1. Log in to Canvas as Professor Dean
  2. Open For testing refresh icon appearance PDF Canvas
  3. In a separate incognito window, log in to Canvas as a Model Student
  4. Open For testing refresh icon appearance PDF Canvas
  5. Post annotations in each user instance
  6. The refresh icon does not appear in the sidebar of the user who has not posted the above annotation. However, manually reloading the assignment page will show all new annotations.

Expected behavior

The refresh icon should appear at the top of the Hypothesis sidebar near-instantaneously when a new annotation has been posted by another user in the document being viewed.

Tested in Chrome 88 on macOS 11.2.1.

robertknight commented 3 years ago

It looks like the problem is that when the WebSocket server receives a notification that an annotation has been updated or created, it only considers the annotation's HTTP URL (annotation.target_uri) when matching the annotation against connected WebSockets, and not its fingerprint. See https://github.com/hypothesis/h/blob/847a055fc125685fbd239c3b0cb9acbb3169255a/h/streamer/filter.py#L31. This is a problem for Canvas PDF files because their URLs include signatures and/or expiry times in various places and so change frequently.

  1. Client A connects and configures the WebSocket to watch for annotations on (Client A's temporary HTTP URL for Canvas file, PDF fingerprint)
  2. Client B connects and configures the WebSocket to watch for annotations on (Client B's temporary HTTP URL for Canvas file, PDF fingerprint)
  3. Client A creates an annotation, which is associated with Client A's temporary HTTP URL (as the target_uri) and the PDF fingerprint. When saving the annotation, h dispatches a notification to WebSocket servers about the update
  4. The WebSocket server that Client B is connected to receives the notification, fetches the annotation from storage and matches its target_uri (Client A's temporary HTTP URL) against the URIs associated with the connected Client B (Client B's temporary HTTP URL, PDF fingerprint). Because the two temporary URLs differ and the annotation's fingerprint is not considered, the match fails and Client B is not notified.
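
The steps above can be illustrated with a toy model. This is only a minimal sketch, not the actual code in h/streamer/filter.py: the URLs, the fingerprint value, and the matches_by_url_only helper are all hypothetical and exist just to show why Client B is missed.

```python
# Toy model of URL-only matching. All names and values are hypothetical;
# this is not the real h streamer filter.

FINGERPRINT = "urn:x-pdf:abc123"  # assumed fingerprint URI shared by both clients
URL_A = "https://canvas.example/files/1/download?token=aaa"  # Client A's temporary URL
URL_B = "https://canvas.example/files/1/download?token=bbb"  # Client B's temporary URL

# URIs each connected client asked the WebSocket to watch (steps 1 and 2).
watched = {
    "client_a": {URL_A, FINGERPRINT},
    "client_b": {URL_B, FINGERPRINT},
}


def matches_by_url_only(target_uri, watched_uris):
    """Match only on the annotation's target_uri, ignoring its fingerprint."""
    return target_uri in watched_uris


# Step 3: Client A creates an annotation; its target_uri is Client A's temporary URL.
annotation = {"target_uri": URL_A, "fingerprint": FINGERPRINT}

# Step 4: the server checks each connected client against target_uri alone.
notified = sorted(
    client
    for client, uris in watched.items()
    if matches_by_url_only(annotation["target_uri"], uris)
)
print(notified)  # only client_a matches; client_b never sees the update
```

Both clients watch the same fingerprint, but since only target_uri is compared, that shared identifier never comes into play.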

In order to be consistent with the search API, what needs to happen here is that in Step 4, the WebSocket server must expand the annotation's target_uri into all equivalent URIs and match those against each of the connected WebSocket clients.
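
A rough sketch of that expansion-based matching follows. It assumes equivalence between URIs is available as simple groups of URIs; the expand_uri and matching_clients helpers, the URLs, and the fingerprint value are hypothetical and do not reflect how h actually stores or looks up equivalent URIs.

```python
# Sketch of matching on all equivalent URIs instead of target_uri alone.
# Helper names and data are hypothetical, for illustration only.

FINGERPRINT = "urn:x-pdf:abc123"
URL_A = "https://canvas.example/files/1/download?token=aaa"
URL_B = "https://canvas.example/files/1/download?token=bbb"
OTHER = "https://example.com/unrelated-article"

# Assumed record of which URIs refer to the same document. Client A's
# annotation associated its temporary URL with the PDF fingerprint.
equivalence_groups = [{URL_A, FINGERPRINT}]

watched = {
    "client_a": {URL_A, FINGERPRINT},
    "client_b": {URL_B, FINGERPRINT},
    "client_c": {OTHER},
}


def expand_uri(uri, groups):
    """Return every URI known to be equivalent to `uri`, including itself."""
    expanded = {uri}
    for group in groups:
        if uri in group:
            expanded |= group
    return expanded


def matching_clients(target_uri, watched, groups):
    """Return clients watching any URI equivalent to the annotation's target_uri."""
    expanded = expand_uri(target_uri, groups)
    return sorted(client for client, uris in watched.items() if expanded & uris)


print(matching_clients(URL_A, watched, equivalence_groups))
```

Here Client B is matched through the shared fingerprint even though its temporary URL differs from Client A's, while the unrelated client is still excluded.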

Related to this, there is some code that expands the URIs that a WebSocket client sends into equivalent URIs when the WebSocket connection is established. A problem with this is that the expansion only happens when the client initially connects and won't work properly if "new" equivalent URLs are added later.

For the specific case of Canvas PDF files and other temporary URLs, only the PDF fingerprint actually matters. If the client and h were aware of this, they could optimize the matching of WebSocket messages to clients by using only the fingerprint. We also have some existing URL normalization logic to take signed URLs and canonicalize them into URLs without the signature. In this scenario we could make HTTP URL matching work by changing URI normalization to ignore the token parameter in Canvas file requests. Unfortunately, changing the URI normalization process is not trivial because the normalized URI is stored in the database, and if we change the algorithm, we need to update existing stored URIs.
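
To make the normalization idea concrete, here is a minimal sketch using only Python's standard library. The strip_query_params helper and the example URLs are hypothetical, and the real Canvas URL shape may differ; only the token parameter name comes from the discussion above. This is not h's existing normalization code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_query_params(url, ignored=("token",)):
    """Return `url` with the ignored query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Two hypothetical temporary URLs for the same Canvas file.
url_a = "https://canvas.example/files/1/download?token=aaa"
url_b = "https://canvas.example/files/1/download?token=bbb"

# After normalization, both clients' URLs compare equal.
print(strip_query_params(url_a) == strip_query_params(url_b))
```

As noted above, the hard part is not the function itself but migrating the normalized URIs already stored in the database to the new scheme.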