TL;DRpin_content_cid_status_idx exists in both Prod and Staging but for some reason, it is not used in prod.
Update Content DAG Sizes performance greatly differs between prod and staging, despite the tables' sizes being comparable and the same indexes existing.
By digging into it, I noticed the following.
Explaining this query (both prod and staging):
SELECT *
FROM (
SELECT COALESCE((
SELECT SUM(dag_size)
FROM (
SELECT c.cid,
c.dag_size
FROM upload u
JOIN content c ON c.cid = u.content_cid
JOIN pin p ON p.content_cid = u.content_cid
WHERE u.user_id = 315318734258456601
AND u.deleted_at is null
AND p.status = 'Pinned'
GROUP BY c.cid,
c.dag_size
) AS uploaded_content), 0)
) as up,
(
SELECT COALESCE((
SELECT SUM(dag_size)
FROM (
SELECT psa_pr.content_cid,
c.dag_size
FROM psa_pin_request psa_pr
JOIN content c ON c.cid = psa_pr.content_cid
JOIN pin p ON p.content_cid = psa_pr.content_cid
JOIN auth_key a ON a.id = psa_pr.auth_key_id
WHERE a.user_id = 315318734258456601
AND psa_pr.deleted_at is null
AND p.status = 'Pinned'
GROUP BY psa_pr.content_cid,
c.dag_size
) AS pinned_content), 0)
)as psa;
(which should be equivalent to the remote procedure FUNCTION user_used_storage(query_user_id BIGINT) used to calculate storage limits) it is executed differently in the 2 envs.
Here you can find attached the 2 execution plans, and as you can see in prod is doing a Seq Scan of pins instead of using the index pin_content_cid_status_idx
That seq scan could explain the huge difference in performance between the 2 env.
But I can’t find an obvious explanation why the 2 plans are different.
The only difference I can think of is that pin_content_cid_status_idx was created CONCURRENTLY in prod, see here…. I know it can fail sometimes, (see this article) but by running
SELECT * FROM pg_class, pg_index WHERE pg_index.indisvalid = false AND pg_index.indexrelid = pg_class.oid;
the index doesn’t show up suggesting it’s valid and ready?
Context
TL;DR
pin_content_cid_status_idx
exists in both Prod and Staging but for some reason, it is not used in prod.Update Content DAG Sizes
performance greatly differs between prod and staging, despite the tables' sizes being comparable and the same indexes existing.By digging into it, I noticed the following.
Explaining this query (both prod and staging):
(which should be equivalent to the remote procedure FUNCTION
user_used_storage(query_user_id BIGINT)
used to calculate storage limits) it is executed differently in the 2 envs.Here you can find attached the 2 execution plans, and as you can see in prod is doing a
Seq Scan
of pins instead of using the indexpin_content_cid_status_idx
That seq scan could explain the huge difference in performance between the 2 env.But I can’t find an obvious explanation why the 2 plans are different. The only difference I can think of is that
pin_content_cid_status_idx
was createdCONCURRENTLY
in prod, see here…. I know it can fail sometimes, (see this article) but by runningthe index doesn’t show up suggesting it’s valid and ready?