Open JimFuller-RedHat opened 1 month ago
Where is that query coming from? I doubt that number is actually required.
Yes, I'm currently trying to figure out why a count is run during ingestion, and teasing apart at-scale testing versus a single ingestion ...
This query is significantly faster (10x) if we drop the WHERE condition:
SELECT COUNT(*) AS num_items
FROM (
    SELECT "qualified_purl"."id", "qualified_purl"."versioned_purl_id", "qualified_purl"."qualifiers"
    FROM "qualified_purl"
    LEFT JOIN "versioned_purl" ON "qualified_purl"."versioned_purl_id" = "versioned_purl"."id"
) AS "sub_query"
Can we not assume that everything in versioned_purl has an entry in base_purl?
Probably invoked during use of the REST API. Found the source of this: it is a classic Postgres count issue buried in the sea_orm paginator.
sea-orm-1.0.0/src/executor/paginator.rs#68
pub async fn num_items(&self) -> Result<u64, DbErr> {
    let builder = self.db.get_database_backend();
    // Wrap the original query (without LIMIT/OFFSET/ORDER BY) in a COUNT(*) subquery.
    let stmt = SelectStatement::new()
        .expr(Expr::cust("COUNT(*) AS num_items"))
        .from_subquery(
            self.query
                .clone()
                .reset_limit()
                .reset_offset()
                .clear_order_by()
                .to_owned(),
            Alias::new("sub_query"),
        )
        .to_owned();
    // ... (rest of the function omitted from this excerpt)
}
We could rewrite this SQL and/or add some indexes, but that will not give an order-of-magnitude change in performance. For example, I tested this and got about 20% better, which is not worth the trade-off (more indexes, more complexity). Historically, the way to fix slow counts is to use estimates; at scale this often becomes the only way.
FWIW, a good (though old) article on pg counts: https://www.citusdata.com/blog/2016/10/12/count-performance/
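As a rough illustration of the estimate approach mentioned above, here is a minimal sketch that builds a query reading the planner's row estimate from `pg_class.reltuples` instead of running `COUNT(*)`. The function name `estimated_count_sql` is hypothetical and not part of trustify or sea_orm. Note the assumptions: the estimate covers a whole table (not a filtered or joined subquery), and it is only as fresh as the last `ANALYZE` or autovacuum run.

```rust
/// Hypothetical helper: build a Postgres query that returns the planner's
/// row estimate for `table` rather than an exact COUNT(*).
pub fn estimated_count_sql(table: &str) -> String {
    // Double any single quotes so the name is safe inside a SQL string literal.
    let literal = table.replace('\'', "''");
    format!(
        "SELECT reltuples::bigint AS estimate FROM pg_class WHERE oid = '{}'::regclass",
        literal
    )
}
```

The resulting statement would be sent through whatever raw-SQL path the service already has; whether an approximate total is acceptable depends on the caller.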
After conversations with @ctron: we should optimise calls to any service that does not need counts, so keeping this issue open for now.
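One way to picture "skip the count when it is not needed" is a page type whose total is optional. This is only a sketch under assumptions: `Page`, `load_page`, and the `want_total` flag are hypothetical names, and the two closures stand in for the real page query and the expensive `COUNT(*)` query.

```rust
/// One page of results; `total` is only filled in when the caller asked for it.
#[derive(Debug, PartialEq)]
pub struct Page<T> {
    pub items: Vec<T>,
    pub total: Option<u64>,
}

/// Hypothetical helper: `fetch` loads the page rows, `count` runs the
/// expensive COUNT(*) query.
pub fn load_page<T>(
    want_total: bool,
    fetch: impl FnOnce() -> Vec<T>,
    count: impl FnOnce() -> u64,
) -> Page<T> {
    let items = fetch();
    // Only pay for the count query when the caller actually needs the total.
    let total = if want_total { Some(count()) } else { None };
    Page { items, total }
}
```

Callers that only iterate over pages (for example during ingestion) would pass `want_total = false` and never trigger the count.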
During some trustify processes (CVE ingestion, reads of the REST API) we repeatedly hit a query (1155 times) that is unusually slow.
It is deceptively simple, with a rather more complicated query plan than expected.
here is the query plan
The parallel seqscan looks suspect though we probably want to nudge the query planner to not aggregate ... investigating