SPIKE: bring the realtime annotation resource into alignment with the API

klemay commented 5 years ago

History & notes can be found here: https://github.com/hypothesis/h/issues/5486

lyzadanger commented 5 years ago

Some notes on this:

We don't have access to the request object proper deep down in the streamer app, but I believe we have some options here. As noted in https://github.com/hypothesis/h/issues/5486, there are spots in the streamer code that are already instantiating services directly (versus using the request.find_service approach). The vast majority of services don't need the whole request object. Most of them need the db session for access to the DB. We do have access to the session in streamer.

Unfortunately, the AnnotationJSONPresentation service takes a boatload of init params: https://github.com/hypothesis/h/blob/c4cd4025fa346605e84bbead7618e1613eb5d96d/h/services/annotation_json_presentation.py#L16 , each of which will need to be addressed.

I've walked through the services needed by AnnotationJSONPresentation, and most of them can be instantiated within the streamer code using the session. The most challenging exception is the request.has_permission method, which gets bandied around here. That one will take some more sleuthing.

My first-swag approach here would be along these prongs:

See if there is any simplification that could be done to the AnnotationJSONPresentation service and the various formatters used to render annotations. This part of the app is deeply designed and may benefit from some simplification. The fact that the AnnotationJSONPresentation service takes so many init args is a little smelly.
Figure out how to fill the gaps in request-related items and services needed by AnnotationJSONPresentation, the most immediately significant one being has_permission. Another spot is the links service, which requires some path/URL stuff off of request. An option would be not to return links with the realtime response, though really that's a symptom of the same problem we're trying to solve here overall...
Adapt streamer's annotation responses to use AnnotationJSONPresentation in a similar manner to h.views.api.annotations

This will require some coordination with client code as the response would change—however, I believe these changes would be additive to the current realtime response, as at the end of the day, both the API and the streamer are using the same presenter—it's just that the API's version viaa AnnotationJSONPresentation provides more formatters to the presenter and, as such, more JSON data is glommed on.

jon-betts commented 4 years ago

It's possible to do this, but it's not possible to do it well and quickly for a few reasons:

Our current "presenter" service is really setup to serialise multiple annotations for a single user
It's also configured that that single user is the request user (which we can work around, but is awkward)
It's not setup to efficiently send one annotation to multiple users (which is what we need to do)
It has optimisations for pre-caching work for the one user, many annotations setup, but none for the other way around
This means using it for that is slow, as it issues many requests to the DB

I have a path for fixing all of this, but it's not small, and I wanted to find out what the priority of all that is. Seeing as this is a spike, I think that's all that's required.

If you want to do it right

My suggestions for fixing this (as refactoring steps, not necessarily as PRs):

Concentrate on sanity first:
- Move all of the formatters and the presenter and the service into a single service directory (they are all private to it)
- Remove the interface stuff
- Stop sending formatters to the presenter to apply, and just apply them yourself
- Move the user is None check in the flagged service
- Have a single is_moderator check instead of doing it two different ways
- Stop using the moderation service to check if something is moderated (it's an attribute on the annotation!)
Refactor the interface to accept an annotation to present, and a user to present it to (instead of being provided a fixed user at construction time)
Inline the formatters:
- Separate the caching behavior and the formatting behavior
- Inline all of the formatting behavior into the service (it was about 30 lines when I did it)
- Make it not cache for now
- Partial example here: https://github.com/hypothesis/h/pull/6317
Change the cardinality and introduce caching objects
- Create more flexible caching objects for answering whether annotations are flagged by a user etc.
- Make it so these can pre-cache what they need to to answer the questions
- Make it so these caches work for either many users or many annotations (or both)
- Push those elements down into the relevant services so anything can use them (if possible)
- Change the cardinality of the presentation functions to accepts lists of users or annotations

The final call in websocket could then look something like this:

# Precache what you have to here:
presenter = presentation_service.get_presenter_for(annotations=[annotation], users=...)

# Iterate quickly here
for user in users:
     serial = presenter.present(annotation, user)

You could add convenience methods for other common cases.

Other observations

The flag count service is private to this too
The flag service is very close, there's only one create() call outside it
Worth checking the other services

If you want to do it quick

Add a way of fooling our services to think that the user for the socket we are looking at is the current user
Recreate the service for every socket
Serialise from scratch each time

Example PR: https://github.com/hypothesis/h/pull/6316

This works, but it's slower than it was before we started the optimisation work (i.e. this + optimisation = slower than before). I don't know if this will create a problem, as we rely on the fact that relatively few sockets are interested in any given message. So you might get away with it. You might also get really badly killed if we got sufficient users on a single page (a large active course or popular website). This is relatively unlikely I'd say, as any rate of annotations which might cause us trouble would probably too much for people to look at, but who knows?

If we are expecting 10x or 100x the people, this is going to be a concern.

hypothesis / h