usgo / agagd

American Go Association Games Database (AGAGD)
https://agagd.usgo.org
MIT License
47 stars 30 forks source link

fix: vastly reduces number of db queries #252

Open abaisero opened 2 years ago

abaisero commented 2 years ago

Problem: The creation of some tables was causing a large number of queries to be run which itself scaled with the size of the database, i.e., larger databases (such as the one in production) caused more queries to be made the the db itself (a problem which componds on itself). This issue was not apparent in development because of the smaller size of the development db.

First, we can verify the above problem by increasing the size of the development db. We can do this by temporarily editing scripts/entrypoint.sh, chanding line

    python make_fake_fixtures.py 1000 1000 1000 > /tmp/fake_agagd_data.json

with

    python make_fake_fixtures.py 100 10000 100 > /tmp/fake_agagd_data.json

and then deleting and recreating the docker images/containers/volumes.

Before the change, the development database was creating data for 1000 players, 1000 games, and 1000 tournaments. Note that this meant that, on average, each player only played one game and each tournament only contained one game. This was helping mask the primary issue. After the change, the development database is creating 100 players, 10_000 games, and 100 tournaments, meaning that, on average, each player played 100 games, and each tournament contained 100 games. We note that, although this new development db is still significantly smaller than the one in production, the local development app is already having trouble responding to requests in involving player and tournament pages.

With the new larger development database in place, we observe the following concerning facts:

Solution:

This commit addresses the above issues by making 2 primary changes in how some tables are computed or rendered:

  1. The first involves the way player names and ids were rendered in the game tables using the custom agagd_core.tables.players.LinkFullMembersNameColumn. This custom column had a .render() method which internally made a query to the db. Then, for the rendering of the whole table, the .render() method was being called for every entry in the table data, causing many queries to be made. To solve this issue, we removed LinkFullMembersNameColumn altogether, and constructed the appropriate queries to construct the "player-name (player-id)" label once for all entries in the table.

  2. The second involves the way opponent and tournament data was collected to create the opponent and tournament tables in the player_profile view. This data was collected and manipulated in pure python, using explicit python loops over the Games model, and at least one db query per iteration. To solve this issue, we created appropriate queries which gathered and aggregated the required data using a couple of large queries (rather than many small ones).

After the above changes:

abaisero commented 2 years ago

I will note that this commit gets rid of a complicated part of code which was constructing the data for some tables, and reimplements similar logic using more straightforward queries. However, in doing so, some facets of the previous data-collection may have been changed. I will add some comments in the places where I know this is happening, so that somebody can review the importance of those aspects.