Closed ezekg closed 2 months ago
Here are some queries that seem to perform well, and are able to be indexed efficiently:
-- licenses for user
select licenses.* from licenses left outer join license_users on license_users.license_id = licenses.id and license_users.user_id = '598840ca-c529-40fd-9d9b-fe650619726a' and license_users.account_id = 'f98c3ce2-9373-498a-813a-d74b1506f6e7' where licenses.account_id = 'f98c3ce2-9373-498a-813a-d74b1506f6e7' and (licenses.user_id = '598840ca-c529-40fd-9d9b-fe650619726a' or license_users.user_id = '598840ca-c529-40fd-9d9b-fe650619726a') order by licenses.created_at desc limit 10;
-- users for product
select distinct users.* from users left outer join license_users on license_users.user_id = users.id and license_users.account_id = 'f98c3ce2-9373-498a-813a-d74b1506f6e7' inner join licenses on (licenses.user_id = users.id or licenses.id = license_users.license_id) and licenses.account_id = 'f98c3ce2-9373-498a-813a-d74b1506f6e7' inner join policies on policies.id = licenses.policy_id and policies.account_id = 'f98c3ce2-9373-498a-813a-d74b1506f6e7' where users.account_id = 'f98c3ce2-9373-498a-813a-d74b1506f6e7' and policies.product_id = '00dda7a7-deb7-4a76-8a84-2be19adde374' order by users.created_at desc limit 10;
-- users for product (denormalized)
select distinct users.* from users left outer join license_users on license_users.user_id = users.id and license_users.account_id = 'f98c3ce2-9373-498a-813a-d74b1506f6e7' inner join licenses on (licenses.user_id = users.id or licenses.id = license_users.license_id) and licenses.account_id = 'f98c3ce2-9373-498a-813a-d74b1506f6e7' where users.account_id = 'f98c3ce2-9373-498a-813a-d74b1506f6e7' and licenses.product_id = '00dda7a7-deb7-4a76-8a84-2be19adde374' order by users.created_at desc limit 10;
-- users for license
select distinct users.* from users left outer join license_users on license_users.user_id = users.id and license_users.account_id = 'f98c3ce2-9373-498a-813a-d74b1506f6e7' inner join licenses on (licenses.user_id = users.id or licenses.id = license_users.license_id) and licenses.account_id = 'f98c3ce2-9373-498a-813a-d74b1506f6e7' where users.account_id = 'f98c3ce2-9373-498a-813a-d74b1506f6e7' and licenses.id = 'aa8ac994-a3ed-4c06-89fd-4c11736b6c55' order by users.created_at desc limit 10;
-- users for machine
select distinct users.* from users left outer join license_users on license_users.user_id = users.id and license_users.account_id = 'f98c3ce2-9373-498a-813a-d74b1506f6e7' inner join licenses on (licenses.user_id = users.id or licenses.id = license_users.license_id) and licenses.account_id = 'f98c3ce2-9373-498a-813a-d74b1506f6e7' inner join machines on machines.license_id = licenses.id and machines.account_id = 'f98c3ce2-9373-498a-813a-d74b1506f6e7' where users.account_id = 'f98c3ce2-9373-498a-813a-d74b1506f6e7' and machines.id = 'db10ee10-a1ac-41c4-a7c9-332a534fe398' order by users.created_at desc limit 10;
So I'm thinking that the denormalization part of this PR should be extracted into a separate PR. This is so that we can rollback this PR if things don't work out as planned (i.e. performance sucks in some area we haven't tested), without causing issues with the rollback (i.e. licenses.product_id
having a NOT NULL
constraint, causing old code to break).
Either that or we wait to add the NOT NULL
constraint until later…
Edit: went with the simpler option for now.
Edit: went with the safer option. Really not looking forward to the conflicts…
Closes #534. Reverted #774 due to performance issues with
union_of
.The issues are mainly i.r.t.
INNER JOIN ... ON id IN (SELECT id FROM ...)
joins. For large accounts, performance is just not where I want it. E.g. an account with 600k licenses takes 2s to load a user's machines, because we have to join across 600k licenses to see if the user has any associated licenses with other users, so that we can list the user's teammates' machines as well as the user's machines. The same issue is present when listing a user's licenses, and a user's teammates.I tried optimizing a few queries by hand, but there's just too many joins on the users union association. We're going to need to optimize
union_of
itself so that we don't do anJOIN ... ON id IN (SELECT id FROM ...)
, since that could potentially select hundreds of thousands or even millions of IDs, which is not memory efficient (not to mention slow).This is what I get for skipping performance testing.
Prerequisites
Pre-deploy
license.users.attached
,license.users.detached
,license.owner.updated
,machine.owner.updated
event types.license.users.attach
,license.users.detach
,license.owner.update
,machine.owner.update
permissions.users
table).users
andlicenses
tables.Enable query logging for denormalization migration.Post-deploy
VACUUM ANALYZE licenses
.VACUUM ANALYZE users
.