hypothesis / h

Annotate with anyone, anywhere.
https://hypothes.is/
BSD 2-Clause "Simplified" License
2.96k stars 427 forks source link

SPIKE: bring the realtime annotation resource into alignment with the API #5487

Open klemay opened 5 years ago

klemay commented 5 years ago

History & notes can be found here: https://github.com/hypothesis/h/issues/5486

lyzadanger commented 5 years ago

Some notes on this:

We don't have access to the request object proper deep down in the streamer app, but I believe we have some options here. As noted in https://github.com/hypothesis/h/issues/5486, there are spots in the streamer code that are already instantiating services directly (versus using the request.find_service approach). The vast majority of services don't need the whole request object. Most of them need the db session for access to the DB. We do have access to the session in streamer.

Unfortunately, the AnnotationJSONPresentation service takes a boatload of init params: https://github.com/hypothesis/h/blob/c4cd4025fa346605e84bbead7618e1613eb5d96d/h/services/annotation_json_presentation.py#L16 , each of which will need to be addressed.

I've walked through the services needed by AnnotationJSONPresentation, and most of them can be instantiated within the streamer code using the session. The most challenging exception is the request.has_permission method, which gets bandied around here. That one will take some more sleuthing.

My first-swag approach here would be along these prongs:

  1. See if there is any simplification that could be done to the AnnotationJSONPresentation service and the various formatters used to render annotations. This part of the app is deeply designed and may benefit from some simplification. The fact that the AnnotationJSONPresentation service takes so many init args is a little smelly.
  2. Figure out how to fill the gaps in request-related items and services needed by AnnotationJSONPresentation, the most immediately significant one being has_permission. Another spot is the links service, which requires some path/URL stuff off of request. An option would be not to return links with the realtime response, though really that's a symptom of the same problem we're trying to solve here overall...
  3. Adapt streamer's annotation responses to use AnnotationJSONPresentation in a similar manner to h.views.api.annotations

This will require some coordination with client code as the response would change—however, I believe these changes would be additive to the current realtime response, as at the end of the day, both the API and the streamer are using the same presenter—it's just that the API's version viaa AnnotationJSONPresentation provides more formatters to the presenter and, as such, more JSON data is glommed on.

jon-betts commented 4 years ago

It's possible to do this, but it's not possible to do it well and quickly for a few reasons:

I have a path for fixing all of this, but it's not small, and I wanted to find out what the priority of all that is. Seeing as this is a spike, I think that's all that's required.

If you want to do it right

My suggestions for fixing this (as refactoring steps, not necessarily as PRs):

The final call in websocket could then look something like this:

# Precache what you have to here:
presenter = presentation_service.get_presenter_for(annotations=[annotation], users=...)

# Iterate quickly here
for user in users:
     serial = presenter.present(annotation, user)

You could add convenience methods for other common cases.

Other observations

If you want to do it quick

Example PR: https://github.com/hypothesis/h/pull/6316

This works, but it's slower than it was before we started the optimisation work (i.e. this + optimisation = slower than before). I don't know if this will create a problem, as we rely on the fact that relatively few sockets are interested in any given message. So you might get away with it. You might also get really badly killed if we got sufficient users on a single page (a large active course or popular website). This is relatively unlikely I'd say, as any rate of annotations which might cause us trouble would probably too much for people to look at, but who knows?

If we are expecting 10x or 100x the people, this is going to be a concern.