Open madwort opened 1 year ago
Thanks. We're not going to get to this anytime soon.
profiling the index view:
>>> import time
>>> start = time.time()
>>> handles = Handle.objects.filter(is_current=True, codelist__is_private=False)
>>> end = time.time()
>>> print(end-start)
0.001859903335571289
>>> start = time.time()
>>> handles = handles.order_by("name")
>>> end = time.time()
>>> print(end-start)
0.0012235641479492188
>>> start = time.time()
>>> codelists = _all_published_codelists(handles)
>>> end = time.time()
>>> print(end-start)
17.353341102600098
it's clear that the call to _all_published_codelists on L32 of the view is the culprit.
When there is no search query or organisation_slug, the index page pulls in all 7384 Handles and then iterates over them, finding the codelist for each one:
def _all_published_codelists(handles):
    return [
        handle.codelist
        for handle in handles
        if handle.codelist.has_published_versions()
    ]
These 7384 trips to the database are quite slow, especially when only 15 codelists are displayed on the page.
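The per-handle pattern above can be reproduced in miniature with the standard library's sqlite3 module. This is a standalone sketch with a hypothetical two-table schema standing in for Handle and CodelistVersion, not the real models; it just counts queries to show why the loop is expensive:

```python
import sqlite3

# Hypothetical schema standing in for Handle/CodelistVersion.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE handle (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE version (id INTEGER PRIMARY KEY, handle_id INTEGER, status TEXT);
""")
conn.executemany("INSERT INTO handle (id, name) VALUES (?, ?)",
                 [(i, f"handle-{i}") for i in range(100)])
conn.executemany("INSERT INTO version (handle_id, status) VALUES (?, ?)",
                 [(i, "published") for i in range(0, 100, 2)])

# N+1 pattern: one query for the handles, then one more per handle.
queries = 0
handles = conn.execute("SELECT id FROM handle").fetchall()
queries += 1
published = []
for (handle_id,) in handles:
    row = conn.execute(
        "SELECT 1 FROM version WHERE handle_id = ? AND status = 'published' LIMIT 1",
        (handle_id,),
    ).fetchone()
    queries += 1
    if row:
        published.append(handle_id)
print(queries)  # 101 queries for 100 handles

# Constant number of queries: a single JOIN fetches the same information.
batched = [r[0] for r in conn.execute(
    "SELECT DISTINCT h.id FROM handle h JOIN version v ON v.handle_id = h.id "
    "WHERE v.status = 'published'"
)]
print(sorted(batched) == sorted(published))  # True
```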
This way of fetching all the data is a classic N+1 queries problem. Instead, index.py can probably make the DB access a constant number of queries with a prefetch_related() call on the handles QuerySet. Something like (untested):
codelist_versions_prefetch = Prefetch(
    "codelist__versions",
    queryset=CodelistVersion.objects.filter(status="published"),
)
handles = (
    Handle.objects.filter(is_current=True, codelist__is_private=False)
    .select_related("codelist")
    .prefetch_related(codelist_versions_prefetch)
)
Hopefully that would make the query much faster.
If that didn't work sufficiently well, we could fetch less data, or none at all, when the search term is blank. Having some codelists on the front page is kind of nice, though, as it lets a user immediately click into one and see what the site is about. More stats and things would be nice too... (off topic).
Yes, I was also thinking a prefetch of the related codelists would be a good start. It might also be worth doing the pagination calculation before hitting the DB, so that we only fetch 15 records at a time.
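The pagination idea amounts to computing the page's bounds first and slicing the QuerySet, which makes Django add LIMIT/OFFSET to the SQL so only one page of rows is fetched. A minimal sketch of the arithmetic (per_page=15 assumed from the page size above; Django's django.core.paginator.Paginator does the same bookkeeping for you):

```python
def page_bounds(page_number, per_page=15):
    # Compute the slice for a 1-indexed page. Slicing a Django QuerySet
    # with these bounds (handles[start:end]) adds LIMIT/OFFSET to the
    # generated SQL, so only per_page rows are fetched.
    start = (page_number - 1) * per_page
    return start, start + per_page

print(page_bounds(1))  # (0, 15)
print(page_bounds(3))  # (30, 45)
```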
Worth noting that the database in question is SQLite, where the N+1 problem doesn't occur in quite the same way. But yeah, doing less stuff certainly seems like a good idea!
There's also some more information in the duplicate issue #1857 (now closed).
The homepage of this project https://www.opencodelists.org/ currently takes an average of 2.5-3 seconds to load - see honeycomb bubbleup identifying characteristics of slow pages and honeycomb average duration of homepage. This is also the page that freshping frequently requests to verify the site is online. I can see advantages to having a detailed selection of data on the homepage (for instance, to show off to new visitors, and so that freshping also acts as a functional test), but it would also be good for regular visitors to find a snappy, responsive site, so they reach the information they're looking for quicker, and to reduce the load required to run freshping.
(nb. I consider 'willnotfix' a valid response to this ticket)