pulibrary / dpul-collections

An inspiring environment for global communities to engage with diverse digital collections
1 stars 0 forks source link

Figgy resource query is too slow, and it's timing out #165

Open tpendragon opened 4 hours ago

tpendragon commented 4 hours ago

Summary or User Story

In testing indexing of the Figgy prod database, we were getting errors about postgresql cancelling connections. We determined this was a timeout of 15 seconds. We ran EXPLAIN ANALYZE in FIggy Prod with the query we were making before:

EXPLAIN ANALYZE select * from orm_resources WHERE internal_resource != 'Event' AND internal_resource != 'PreservationObject' AND (updated_at > '2023-10-19 12:47:02.207629Z' OR (updated_at = '2023-10-19 12:47:02.207629Z' AND id > 'f087099c-fa18-487c-b28c-f2bd8e265170')) ORDER BY updated_at ASC, id ASC LIMIT 10;

and found that it was doing a full table scan. In modifying that query we found that it was the updated_at= line that was causing the table scan. We'll need to adjust that query to be equivalent, but not look for an exact updated_at.

Acceptance Criteria

First step

Figure out an equivalent query and test it in the console.

tpendragon commented 4 hours ago

We found that

EXPLAIN ANALYZE select * from orm_resources WHERE internal_resource != 'Event' AND internal_resource != 'PreservationObject' AND (updated_at >= '2023-10-19 12:47:02.207629Z' AND (updated_at > '2023-10-19 12:47:02.207629Z' OR id > 'f087099c-fa18-487c-b28c-f2bd8e265170')) ORDER BY updated_at ASC, id ASC LIMIT 10;

works.