Inconsistent appearance of refresh icon in sidebar - Canvas PDFs

The Success team has been reporting that the refresh icon appears inconsistently in the Hypothesis sidebar when new annotations have been posted to an article. See https://hypothes-is.slack.com/archives/C2BLQDKHA/p1614026929018000

Sometimes the icon appears immediately when there is a new annotation, but at other times delays of up to several minutes have been reported.

To test internally, I created three Canvas assignments, each using a different document type: HTML, PDF Google and PDF Canvas.

For HTML and PDF Google, I saw no delay in the appearance of the refresh icon when new annotations were posted. For PDF Canvas, however, the icon did not appear at all.

To Reproduce

1. Log in to Canvas as Professor Dean.
2. Open the assignment "For testing refresh icon appearance PDF Canvas".
3. In a separate incognito window, log in to Canvas as a Model Student.
4. Open the same assignment, "For testing refresh icon appearance PDF Canvas".
5. Post annotations in each user's session.

Result: the refresh icon does not appear in the sidebar of the user who did not post the annotation. Manually reloading the assignment page, however, shows all the new annotations.

Expected behavior

The refresh icon should appear at the top of the Hypothesis sidebar near-instantaneously when another user posts a new annotation in the document being viewed.

Tested in Chrome 88 on macOS 11.2.1.

The problem appears to be that when the WebSocket server receives a notification that an annotation has been created or updated, it matches the annotation against connected WebSockets using only the annotation's HTTP URL (annotation.target_uri), not its PDF fingerprint. See https://github.com/hypothesis/h/blob/847a055fc125685fbd239c3b0cb9acbb3169255a/h/streamer/filter.py#L31. This breaks for Canvas PDF files because their URLs include signatures and/or expiry times in various places and so change frequently, which means different users see different URLs for the same file.

The failing sequence is:

1. Client A connects and configures its WebSocket to watch for annotations on (Client A's temporary HTTP URL for the Canvas file, PDF fingerprint).
2. Client B connects and configures its WebSocket to watch for annotations on (Client B's temporary HTTP URL for the Canvas file, PDF fingerprint).
3. Client A creates an annotation, which is associated with Client A's temporary HTTP URL (as the target_uri) and the PDF fingerprint. When saving the annotation, h dispatches a notification about the update to the WebSocket servers.
4. The WebSocket server that Client B is connected to receives the notification, fetches the annotation from storage and matches its target_uri (Client A's temporary HTTP URL) against the URIs associated with Client B's connection (Client B's temporary HTTP URL, PDF fingerprint). Neither matches, so Client B is never notified.
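The mismatch in step 4 can be modelled in a few lines of Python. This is a simplified sketch, not the code in h's `filter.py`: the `Annotation` class, the `matches_target_uri_only` helper, the `canvas.example` URLs with a `token` parameter and the fingerprint value are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical, simplified annotation: only the fields relevant here."""
    target_uri: str
    document_uris: frozenset  # the fingerprint is known here but never consulted


def matches_target_uri_only(annotation, client_uris):
    """Simplified model of the current behaviour: only target_uri is compared."""
    return annotation.target_uri in client_uris


# Hypothetical values for the scenario above.
FINGERPRINT = "urn:x-pdf:abc123"
URL_A = "https://canvas.example/files/1/download?token=AAA"
URL_B = "https://canvas.example/files/1/download?token=BBB"

# Steps 1 and 2: each client watches its own temporary URL plus the fingerprint.
client_a_uris = {URL_A, FINGERPRINT}
client_b_uris = {URL_B, FINGERPRINT}

# Step 3: Client A's annotation is stored with A's URL as its target_uri.
annotation = Annotation(target_uri=URL_A, document_uris=frozenset({URL_A, FINGERPRINT}))

# Step 4: only target_uri is compared, so the shared fingerprint is ignored.
print(matches_target_uri_only(annotation, client_a_uris))  # True
print(matches_target_uri_only(annotation, client_b_uris))  # False
```

Client A (the author) is matched, while Client B, who watches the same fingerprint, is not.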

To be consistent with the search API, in step 4 the WebSocket server must expand the annotation's target_uri into all equivalent URIs and match those against each connected WebSocket client.
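A minimal sketch of the proposed matching, assuming equivalence is defined roughly the way the search API defines it (URIs recorded against the same document are equivalent). `EquivalenceIndex`, `expand` and `matches_expanded` are hypothetical names for illustration; in h this data lives in the database rather than in an in-memory dict.

```python
from collections import defaultdict


class EquivalenceIndex:
    """Hypothetical in-memory stand-in for the stored document URIs.

    URIs recorded against the same document are treated as equivalent.
    """

    def __init__(self):
        self._doc_of = {}                  # uri -> document id
        self._uris_of = defaultdict(set)   # document id -> all of its uris

    def add(self, doc_id, uri):
        self._doc_of[uri] = doc_id
        self._uris_of[doc_id].add(uri)

    def expand(self, uri):
        """Return `uri` plus every URI recorded for the same document."""
        doc_id = self._doc_of.get(uri)
        if doc_id is None:
            return {uri}
        return self._uris_of[doc_id] | {uri}


def matches_expanded(target_uri, client_uris, index):
    """Proposed matching: expand target_uri first, then intersect with the client's URIs."""
    return bool(index.expand(target_uri) & client_uris)


# Hypothetical values for the scenario above.
FINGERPRINT = "urn:x-pdf:abc123"
URL_A = "https://canvas.example/files/1/download?token=AAA"
URL_B = "https://canvas.example/files/1/download?token=BBB"

index = EquivalenceIndex()
index.add("doc-1", URL_A)        # recorded when Client A saved its annotation
index.add("doc-1", FINGERPRINT)

CLIENT_B_URIS = {URL_B, FINGERPRINT}
print(matches_expanded(URL_A, CLIENT_B_URIS, index))  # True: shared fingerprint
```

Client B's own URL never has to be known to the index; the shared fingerprint is enough to produce a match.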

Relatedly, there is existing code that expands the URIs a WebSocket client sends into their equivalent URIs when the WebSocket connection is established. The problem is that this expansion happens only once, when the client first connects, so it misses "new" equivalent URLs that are added later.
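The staleness problem can be illustrated by expanding a client's URIs once at connect time and comparing that frozen set with an expansion done when the annotation arrives. This is a sketch under the assumption that equivalence means "recorded against the same document"; the `EquivalenceIndex` class, the `expand_all` helper and all URLs are hypothetical.

```python
from collections import defaultdict


class EquivalenceIndex:
    """Hypothetical in-memory map of equivalent URIs, grouped by document."""

    def __init__(self):
        self._doc_of = {}                  # uri -> document id
        self._uris_of = defaultdict(set)   # document id -> all of its uris

    def add(self, doc_id, uri):
        self._doc_of[uri] = doc_id
        self._uris_of[doc_id].add(uri)

    def expand(self, uri):
        doc_id = self._doc_of.get(uri)
        return {uri} if doc_id is None else self._uris_of[doc_id] | {uri}


def expand_all(uris, index):
    """Expand every URI a client sent into the union of its equivalents."""
    return set().union(*(index.expand(u) for u in uris))


FINGERPRINT = "urn:x-pdf:abc123"                             # hypothetical
URL_A = "https://canvas.example/files/1/download?token=AAA"  # hypothetical
URL_B = "https://canvas.example/files/1/download?token=BBB"  # hypothetical

index = EquivalenceIndex()
index.add("doc-1", URL_B)
index.add("doc-1", FINGERPRINT)

# Client B connects: its URIs are expanded once and the result is kept.
client_b_uris = {URL_B, FINGERPRINT}
expanded_at_connect = expand_all(client_b_uris, index)

# Later, Client A opens the same file through a new temporary URL, which is
# recorded as another equivalent URI of the same document.
index.add("doc-1", URL_A)

# Client A's annotation arrives with target_uri = URL_A.
stale_match = URL_A in expanded_at_connect               # connect-time snapshot
fresh_match = bool(index.expand(URL_A) & client_b_uris)  # expanded at match time
print(stale_match, fresh_match)  # False True
```

Expanding at match time picks up URIs recorded after the client connected, at the cost of a lookup per notification.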

For Canvas PDF files and other temporary URLs, only the PDF fingerprint actually matters. If the client and h were aware of this, they could optimize the matching of WebSocket messages to clients by using only the fingerprint. We also have existing URL normalization logic that takes signed URLs and canonicalizes them into URLs without the signature. In this scenario, we could make HTTP URL matching work by changing URI normalization to ignore the token parameter in Canvas file requests. Unfortunately, changing the URI normalization process is not trivial: the normalized URI is stored in the database, so if we change the algorithm, we also need to update the existing stored URIs.
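Both ideas in this paragraph can be sketched together, assuming the volatile part of a Canvas file URL is a single `token` query parameter (the parameter name comes from the text above; the URLs and the `VOLATILE_PARAMS` set are hypothetical). `same_document` compares fingerprints when both sides have one and only falls back to comparing URLs with the volatile parameter stripped. This is not h's normalization function, and adopting anything like it in h would still require migrating the stored normalized URIs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: query parameters that carry per-request signatures or expiry.
VOLATILE_PARAMS = frozenset({"token"})
PDF_URN_PREFIX = "urn:x-pdf:"


def strip_volatile_params(url, volatile=VOLATILE_PARAMS):
    """Drop volatile query parameters, keeping the rest in their original order.

    Note: re-encoding may change the percent-encoding of the kept parameters.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in volatile
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def same_document(uris_a, uris_b):
    """Compare by PDF fingerprint when both sides have one, else by stripped URL."""
    prints_a = {u for u in uris_a if u.startswith(PDF_URN_PREFIX)}
    prints_b = {u for u in uris_b if u.startswith(PDF_URN_PREFIX)}
    if prints_a and prints_b:
        return bool(prints_a & prints_b)
    urls_a = {strip_volatile_params(u) for u in uris_a - prints_a}
    urls_b = {strip_volatile_params(u) for u in uris_b - prints_b}
    return bool(urls_a & urls_b)


URL_A = "https://canvas.example/files/1/download?token=AAA"  # hypothetical
URL_B = "https://canvas.example/files/1/download?token=BBB"  # hypothetical
print(strip_volatile_params(URL_A))     # https://canvas.example/files/1/download
print(same_document({URL_A}, {URL_B}))  # True
```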

hypothesis/support-legacy #173