isomerpages / isomercms-backend

A static website builder and host for the Singapore Government
5 stars 1 forks source link

ISOM-824 feat: don't fetch redundant author info from db #1255

Closed timotheeg closed 6 months ago

timotheeg commented 6 months ago

Problem

computeShaMappings() does one query this.users.findByPk(userId) for each author of all commits on the review. All the DB queries are issued in parallel. For large changeset with many commits, we expect the set of authors to be less than the number of commits, and so the process sometimes issues many redundant queries in parallel.

image image

Goal: reduce the number of queries to the minimum required.

Solution

Breaking Changes

TODO:

Notes:

timotheeg commented 6 months ago

Oh boy... A bunch of tests had incorrect arrangements that were revealed by the fetch optimizations 😰😑

The 2 constant structs MOCK_GITHUB_COMMENT_DATA_ONE and MOCK_GITHUB_COMMENT_DATA_TWO are actually referencing the same user id (1), but the tests were mocking different responses of different users with different email addresses (MOCK_GITHUB_EMAIL_ADDRESS_ONE and MOCK_GITHUB_EMAIL_ADDRESS_TWO).

When the old (inefficient) code was issuing one request per user, whether or not there were duplicates user ids, the mock responses could return different responses for each calls, regardless of whether the inbound args were actually different.

The tests passed in the sense they were using the response data for the final output, but they were internally inconsistent (i.e. user id 1 gets different user responses on successive mocked db calls)

I suggest we install jest-when to have the ability to mock responses based on the call args, so we could do:

when(MockUsersRepository.findByPk)
    .calledWith('1')
    .mockReturnValue(MOCK_GITHUB_COMMIT_AUTHOR_ONE)

when(MockUsersRepository.findByPk)
    .calledWith('2')
    .mockReturnValue(MOCK_GITHUB_COMMIT_AUTHOR_TWO)
timotheeg commented 6 months ago

Thanks for the review, Kish!

only blocking is querying the db for possibly undefined values

If you spot something that's really wrong in a PR, submit your review as Request for Changes to make sure the PR is blocked, and no one merges accidentally 😰

That being said, in this PR, I think all user queries are now protected against querying with undefined (with the userIds being checked with .filter(_.identity)), while the old code was not 😰 .