paradedb / pg_analytics

DuckDB-powered analytics for Postgres
https://paradedb.com
PostgreSQL License
383 stars 15 forks source link

Query joining search index and foreign table is very slow #157

Closed pdpark closed 1 month ago

pdpark commented 3 months ago

What happens?

I'm getting wildly varying query times on the first query after creating my tables and search index. About 50% of the time it runs in 300-500ms. The other ~50% of the time it takes almost two minutes (1:45-1:50)! After the query runs slow once, the next query runs fast. In between runs I stop & remove the container, remove the images, and prune all volumes.

To Reproduce

Query:

with search_score as (
    select * from my_schema.search_idx.score_bm25(
    '
    (
             x:"green"
         OR  y:"green"
         OR  z:"green"
         OR  m:"green"
         OR  n:"green"
         OR  p:"green"
         OR  q:"green"
         OR  r:"green"
    )
    AND a:"456"
    '
    )
)
select
        x.cold,
        x.cole,
        x.colf,
        x.colg,
        x.colh,
        p.colx
from search_score as s
join my_schema.table1 as x
on s.id = x.id
join my_schema.table2 as p
on p.cola = x.cola
and p.colb = x.colb
and (
    p.colc = '123'
    or p.colc is null
)
order by score_bm25 desc
limit 250
;

OS:

Amazon Linux 2023 / PostgreSQL 16.4 (Debian 16.4-1.pgdg120+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit

ParadeDB Version:

pg_search 0.9.1 pg_analytics 0.1.0

Are you using ParadeDB Docker, Helm, or the extension(s) standalone?

ParadeDB Docker Image

Full Name:

Patrick Park

Affiliation:

Payzer

Did you include all relevant data sets for reproducing the issue?

No - I cannot share the data sets because they are confidential

Did you include the code required to reproduce the issue?

Did you include all relevant configurations (e.g., CPU architecture, PostgreSQL version, Linux distribution) to reproduce the issue?

philippemnoel commented 3 months ago

Potentially related to this issue: https://github.com/paradedb/pg_analytics/issues/65

philippemnoel commented 1 month ago

This is now unblocked and a great first issue

philippemnoel commented 1 month ago
Capture d’écran, le 2024-10-18 à 21 32 50

Reported as fixed by @pdpark