opensafely-core / opencodelists

OpenCodelists is an open platform for creating and sharing codelists of clinical terms and drugs.
https://www.opencodelists.org
Other
31 stars 11 forks source link

Consider a more performant homepage #1640

Open madwort opened 1 year ago

madwort commented 1 year ago

The homepage of this project https://www.opencodelists.org/ currently takes an average of 2.5-3 seconds to load - see honeycomb bubbleup identifying characteristics of slow pages and honeycomb average duration of homepage. This is also the page that freshping frequently requests to verify the site is online. I can see that there are advantages to having a detailed selection of data on the homepage (for instance, to show off to new visitors, and so that freshping also acts as functional test), but it might also be good for regular visitors to find a snappy, responsive site to get to the information they're looking for quicker & to reduce the load required to run freshping.

(nb. I consider 'willnotfix' a valid response to this ticket)

inglesp commented 1 year ago

Thanks. We're not going to get to this anytime soon.

Jongmassey commented 1 month ago

profiling the index view:

>>> import time
>>> start = time.time()
>>> handles = Handle.objects.filter(is_current=True, codelist__is_private=False)
>>> end = time.time()
>>> print(end-start)
0.001859903335571289
>>> start = time.time()
>>> handles = handles.order_by("name")
>>> end = time.time()
>>> print(end-start)
0.0012235641479492188
>>> start = time.time()
>>> codelists = _all_published_codelists(handles)
>>> end = time.time()
>>> print(end-start)
17.353341102600098

it's clear that L32 is the culprit.

Jongmassey commented 1 month ago

Where there is no search query or organisation_slug the index page pulls in all 7384 Handles and then iterates them, finding the codelist for each one

def _all_published_codelists(handles):
    return [
        handle.codelist
        for handle in handles
        if handle.codelist.has_published_versions()
    ]

These 7384 trips to the database are quite slow, especially when only 15 codelists are displayed on the page.

mikerkelly commented 1 month ago

This way of fetching all the data seems like a classic N+1 queries problem. Instead index.py, can probably make the DB access into a constant number of queries with a prefetch_related() call on the handles QuerySet. Something like (untested):

prefetch= Prefetch('versions', queryset=CodelistVersion.objects.filter(status="published")
handles = Handle.objects.filter(
    is_current=True, codelist__is_private=False).prefetch_related(
    'codelist', codelist_versions_prefetch))

Hopefully that would make the query much faster.

If that didn't work sufficiently well, we could either fetch less data, or none, if the search term is blank. Having some on the front page is kind of nice to enable a user to immediately click into one and see what the site is about, though. More stats and things would be nice, too... (off topic).

Jongmassey commented 1 month ago

Yes, I was also thinking a prefetch of the related codelists would be a good start. It might be worth also doing the pagination calculation before hitting the db, so we only fetch 15 records at a time

evansd commented 1 month ago

Worth noting that the database in question is SQLite, where the N+1 problem doesn't occur in quite the same way. But yeah, doing less stuff certainly seems like a good idea!

lucyb commented 5 days ago

There's also some more information in the duplicate issue #1857 (now closed).