Closed mipyykko closed 1 year ago
A lot of this was fixed by fiddling with the database settings; @nygrenh also did some change that fixed some answers getting humongous amounts of reviews and also limited the amount returned to a single user. Some peer review queries probably remain that will return all, but they are probably supposed to do that and are much faster now anyway.
For example, one specific question answer has over 10k peer reviews.
This
https://github.com/rage/quizzes/blob/00732137e6889a1913bc2a6e0c2f460a0a3ae578/packages/backendv2/src/models/peer_review.ts#L51-L53
produces a query
that, when specifying the quiz answer with the most peer reviews, produces +40k rows (4 questions in a peer review collection).
The query analyzed (
EXPLAIN ANALYZE ...
) produces the following:So, it has to go sequentially through the
peer_review_question_answer
table to find all the answers. The table does have an index, which is actually used in the query when the expected number of joins is low, but on these kinds of occasions the planner picks the sequential scan because the cost is lower - even when using the index would be a lot faster.This query is called quite often (96k times in the last week) and the average rows it returns is 1726, which I think is quite a lot especially if it is only used in the widget. The query is currently the top query ranked by the total time used.
The function that runs the query is called in only one place, mapping to the API endpoint
/widget/answers/:answerId/peer-reviews
, which is in turn used by the widget to load received peer reviews for a user. Don't know if it's really useful for a single user to get more than 10 000 peer reviews (or even 1 000) loaded in a widget, at least in one go.Maybe implement paging or something that would limit the rows returned, or just limit the peer reviews loaded here in any case to some magic number? The query could also be refactored in some way, maybe to flip the join direction or something just to see if it helps.
edit: This is not specific to just this one function; a simple query like
select * from peer_review where quiz_answer_id = $1
is also slow when the number is large. Queries related to this are top 4 when ranked by total time used - these four queries take up 71% of all the time recorded so far! So, everywhere the peer reviews are queried byquiz_answer_id
should be checked to see if we really need that many rows, or to find some other solution.