ory / hydra

The most scalable and customizable OpenID Certified™ OpenID Connect and OAuth Provider on the market. Become an OpenID Connect and OAuth2 Provider over night. Broad support for related RFCs. Written in Go, cloud native, headless, API-first. Available as a service on Ory Network and for self-hosters.
https://www.ory.sh/?utm_source=github&utm_medium=banner&utm_campaign=hydra
Apache License 2.0
15.66k stars 1.5k forks source link

Seq scan for janitor login/consent flows cleanup SQL query #3822

Open David-Wobrock opened 3 months ago

David-Wobrock commented 3 months ago

Preflight checklist

Ory Network Project

No response

Describe the bug

The Hydra janitor is running this query to cleanup inactive login/consent requests:

SELECT login_challenge
FROM hydra_oauth2_flow
WHERE (
    (state != 6)
        OR (login_error IS NOT NULL AND login_error <> '{}' AND login_error <> '')
        OR (consent_error IS NOT NULL AND consent_error <> '{}' AND consent_error <> '')
    )
  AND requested_at < '2024-08-25 08:00:00'
  AND nid = '326c59d4-e93c-47fc-96d4-40c5522f17c1'
ORDER BY login_challenge
    LIMIT 50000;

(with some example values). See https://github.com/ory/hydra/blob/v2.2.0/persistence/sql/persister_consent.go#L732-L792

This is running a sequential query on PostgreSQL:

 Limit  (cost=2552485.01..2552485.01 rows=1 width=33)
   ->  Sort  (cost=2552485.01..2552485.01 rows=1 width=33)
         Sort Key: login_challenge
         ->  Gather  (cost=1000.00..2552485.00 rows=1 width=33)
               Workers Planned: 2
               ->  Parallel Seq Scan on hydra_oauth2_flow  (cost=0.00..2551484.90 rows=1 width=33)
                     Filter: ((requested_at < '2024-08-25 08:00:00'::timestamp without time zone) AND (nid = '326c59d4-e93c-47fc-96d4-40c5522f17c1'::uuid) AND ((state <> 6) OR ((login_error IS NOT NULL) AND (login_error <> '{}'::text) AND (login_error <> ''::text)) OR ((consent_error IS NOT NULL) AND (consent_error <> '{}'::text) AND (consent_error <> ''::text))))
(7 rows)

On a ~18 million row database, this query can take between 15 and 40 seconds, depending on the load.

The expected behaviour to have an index for this query by default in Hydra.

Reproducing the bug

Run the janitor SQL above.

Relevant log output

No response

Relevant configuration

No response

Version

v2.2.0

On which operating system are you observing this issue?

None

In which environment are you deploying?

Kubernetes with Helm

Additional Context

We tried indexing manually requested_at or state, but since the query has many AND/ORs, PostgreSQL 15.5 is not able to pick up the index properly and keeps running the sequential scan.