The following query (generated by SQLAlchemy from Python code) is dead slow for projects with a large number of documents (and spans), i.e. 10k+ documents. For 50k+ documents it never finishes and just burns Postgres CPU.
SELECT
    array_remove(array_agg(distinct(sourcedocument.id)), NULL) AS sdoc_ids,
    EXTRACT(year FROM anon_2.date) AS anon_1
FROM sourcedocument
JOIN (
    SELECT
        sourcedocument.id AS id,
        sourcedocumentmetadata_1.date_value AS date,
        array_remove(array_agg(distinct(code.id)), NULL) AS "TA_CODE_ID_LIST"
    FROM sourcedocument
    JOIN sourcedocumentmetadata AS sourcedocumentmetadata_1
        ON sourcedocument.id = sourcedocumentmetadata_1.source_document_id
        AND sourcedocumentmetadata_1.project_metadata_id = 2234
        AND sourcedocumentmetadata_1.date_value IS NOT NULL
    LEFT OUTER JOIN annotationdocument
        ON sourcedocument.id = annotationdocument.source_document_id,
    spanannotation
    LEFT OUTER JOIN code
        ON code.id = spanannotation.code_id
    WHERE sourcedocument.project_id = 87
    GROUP BY sourcedocument.id, sourcedocumentmetadata_1.date_value
) AS anon_2 ON sourcedocument.id = anon_2.id
WHERE anon_2."TA_CODE_ID_LIST" @> ARRAY[9621]::INTEGER[]
GROUP BY EXTRACT(year FROM anon_2.date)
ORDER BY sdoc_ids DESC
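For reference, below is a minimal SQLAlchemy Core sketch that compiles to roughly this statement. It is not the actual application code: the table and column definitions are assumptions reconstructed from the SQL above, and the real models will differ. It does reproduce the comma in the subquery's FROM clause, i.e. spanannotation (left-joined to code) appears as a separate FROM entry with no condition tying it to the sourcedocument join tree.

# Minimal sketch (not the real application code) of SQLAlchemy Core statements
# that compile to roughly the query above. Table/column definitions below are
# assumptions reconstructed from the generated SQL.
from sqlalchemy import (
    Column, Date, ForeignKey, Integer, MetaData, Table,
    cast, distinct, extract, func, select,
)
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import ARRAY, array

metadata = MetaData()

sourcedocument = Table("sourcedocument", metadata,
    Column("id", Integer, primary_key=True), Column("project_id", Integer))
sourcedocumentmetadata = Table("sourcedocumentmetadata", metadata,
    Column("id", Integer, primary_key=True),
    Column("source_document_id", Integer, ForeignKey("sourcedocument.id")),
    Column("project_metadata_id", Integer), Column("date_value", Date))
annotationdocument = Table("annotationdocument", metadata,
    Column("id", Integer, primary_key=True),
    Column("source_document_id", Integer, ForeignKey("sourcedocument.id")))
spanannotation = Table("spanannotation", metadata,
    Column("id", Integer, primary_key=True), Column("code_id", Integer))
code = Table("code", metadata, Column("id", Integer, primary_key=True))

# Inner aggregation: one row per document with its date and its list of code ids.
# The second select_from() argument puts spanannotation (left-joined to code)
# next to the sourcedocument join tree with no linking condition, which is the
# comma (cross) join visible in the generated SQL.
per_doc_codes = (
    select(
        sourcedocument.c.id.label("id"),
        sourcedocumentmetadata.c.date_value.label("date"),
        func.array_remove(func.array_agg(distinct(code.c.id)), None)
            .label("TA_CODE_ID_LIST"),
    )
    .select_from(
        sourcedocument
        .join(
            sourcedocumentmetadata,
            (sourcedocument.c.id == sourcedocumentmetadata.c.source_document_id)
            & (sourcedocumentmetadata.c.project_metadata_id == 2234)
            & (sourcedocumentmetadata.c.date_value.is_not(None)),
        )
        .outerjoin(
            annotationdocument,
            sourcedocument.c.id == annotationdocument.c.source_document_id,
        ),
        spanannotation.outerjoin(code, code.c.id == spanannotation.c.code_id),
    )
    .where(sourcedocument.c.project_id == 87)
    .group_by(sourcedocument.c.id, sourcedocumentmetadata.c.date_value)
    .subquery("anon_2")
)

# Outer query: keep documents whose code-id list contains 9621, bucket by year.
sdoc_ids = func.array_remove(
    func.array_agg(distinct(sourcedocument.c.id)), None
).label("sdoc_ids")
stmt = (
    select(sdoc_ids, extract("year", per_doc_codes.c.date))
    .select_from(
        sourcedocument.join(per_doc_codes, sourcedocument.c.id == per_doc_codes.c.id)
    )
    .where(
        per_doc_codes.c.TA_CODE_ID_LIST.op("@>")(cast(array([9621]), ARRAY(Integer)))
    )
    .group_by(extract("year", per_doc_codes.c.date))
    .order_by(sdoc_ids.desc())
)

# Print the compiled SQL without needing a database connection.
print(stmt.compile(dialect=postgresql.dialect()))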