tfnribeiro closed this 1 week ago
Doing everything in the DB and also adding indexes if needed should definitely be the way to go :)
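As a sketch of the index idea: if the query filters by user and orders by a priority score, a composite index covering both columns would let the database serve the `ORDER BY ... LIMIT` without a full scan. The model and column names below (`Bookmark`, `user_id`, `priority`) are assumptions for illustration, not the actual schema:

```python
from sqlalchemy import create_engine, Column, Integer, Float, Index, inspect
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Bookmark(Base):
    __tablename__ = "bookmark"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, nullable=False)   # hypothetical column
    priority = Column(Float, nullable=False)    # hypothetical column

# Composite index: equality filter column first (user_id),
# then the sort column (priority).
Index("ix_bookmark_user_priority", Bookmark.user_id, Bookmark.priority)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

# Confirm the index was created alongside the table.
names = [ix["name"] for ix in inspect(engine).get_indexes("bookmark")]
print(names)
```

Whether this particular index helps would of course depend on the real filter and sort columns in the query.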
Yes, I will look into it - I will try to write the query in SQLAlchemy and see how it improves the query speed.
I have worked on this further and have reduced the query time from:
[Thu Oct 24 11:05:16.150729 2024] [wsgi:error] [pid 9:tid 140594214651584] [remote 172.19.0.1:42276] ### INFO: all_bookmarks_priority_to_study
took: 14.7286 seconds, total: 1802
to:
[Thu Oct 24 11:08:21.888770 2024] [wsgi:error] [pid 9:tid 139678095840960] [remote 172.19.0.1:47454] ### INFO: all_bookmarks_priority_to_study
took: 1.7743 seconds, total: 131
I did this by limiting the number of queried bookmarks to the count requested from the endpoint, so at most it can be limit * 2 (because we fetch both the top scheduled and the top unscheduled words), and then doing a final sort.
This seems enough for our purposes, I'd say - what do you think?
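A minimal sketch of the approach described above, assuming a simplified `Bookmark` model with `scheduled` and `priority` columns (the real schema and priority computation will differ): each branch is limited and ordered in the database, so at most `limit * 2` rows ever reach Python, and only the final merge-sort happens in application code.

```python
from sqlalchemy import create_engine, Column, Integer, Boolean, Float
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Bookmark(Base):
    __tablename__ = "bookmark"
    id = Column(Integer, primary_key=True)
    scheduled = Column(Boolean, nullable=False)  # hypothetical column
    priority = Column(Float, nullable=False)     # hypothetical column

def top_bookmarks_to_study(session, limit):
    # Top `limit` scheduled bookmarks, ordered and limited in the DB.
    scheduled = (
        session.query(Bookmark)
        .filter(Bookmark.scheduled == True)
        .order_by(Bookmark.priority.desc())
        .limit(limit)
        .all()
    )
    # Top `limit` unscheduled bookmarks, same treatment.
    unscheduled = (
        session.query(Bookmark)
        .filter(Bookmark.scheduled == False)
        .order_by(Bookmark.priority.desc())
        .limit(limit)
        .all()
    )
    # Final sort over at most limit * 2 rows in Python.
    merged = sorted(scheduled + unscheduled, key=lambda b: b.priority, reverse=True)
    return merged[:limit]

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
with Session(engine) as s:
    s.add_all([Bookmark(scheduled=(i % 2 == 0), priority=float(i)) for i in range(10)])
    s.commit()
    top = [b.priority for b in top_bookmarks_to_study(s, 3)]
    print(top)
```

The point is that the expensive part (scanning and ranking all bookmarks) stays in the database, and Python only ever sorts a bounded, small list.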
One order of magnitude improvement - nobody can complain about that :)
When we changed the scheduled bookmarks, I didn't look too much into the performance of
top_bookmarks_to_study
but I was testing with a user with about 1900 total bookmarks, and it takes around 30 seconds for us to render the first exercise in the web, which made me wonder how many "fit to study" bookmarks we are working with. It does seem that our average user has 105 bookmarks, but there are quite a few users (> 300) that have more than 200 bookmarks.
This makes me think that it might be better to limit and order the query already in the database, rather than doing it in Python. I would expect that to improve the overall query speed.