timescale / pgvectorscale

A complement to pgvector for high performance, cost efficient vector search on large workloads.
PostgreSQL License

[Bug]: Query Planner Prefers Parallel Sequential Scan Over Index Scan When enable_seqscan is `on`. #166

Open Sheharyar570 opened 1 day ago

Sheharyar570 commented 1 day ago

What happened?

I encountered a performance issue while running tests on PostgreSQL 16.4 with the pgvectorscale extension and a DiskANN index. Below are the details of my setup.

Database Configuration:

Machine specs: 8 CPUs, 32GB RAM
PostgreSQL settings:
checkpoint_timeout=5min, effective_cache_size=4GB, jit=on,
maintenance_work_mem=8GB, max_parallel_maintenance_workers=3,
max_parallel_workers=3, max_parallel_workers_per_gather=2,
max_wal_size=1GB, max_worker_processes=16, shared_buffers=8GB,
wal_compression=off, work_mem=4MB
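(Not part of the original report: a minimal sketch of how these settings could be applied, assuming superuser access. Note that `shared_buffers` requires a restart rather than a reload.)

```sql
-- Sketch: apply the reported settings via ALTER SYSTEM.
ALTER SYSTEM SET effective_cache_size = '4GB';
ALTER SYSTEM SET maintenance_work_mem = '8GB';
ALTER SYSTEM SET max_parallel_workers = 3;
ALTER SYSTEM SET max_parallel_workers_per_gather = 2;
ALTER SYSTEM SET shared_buffers = '8GB';   -- needs a server restart
ALTER SYSTEM SET work_mem = '4MB';
SELECT pg_reload_conf();  -- picks up the reloadable settings
```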

DiskANN Configuration:

"storage_layout": "plain", "num_neighbors": 32, "search_list_size": 64,
"max_alpha": 1.2, "query_search_list_size": 32
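(Added for context: a sketch of how these parameters map onto a `diskann` index build and the query-time GUC. The table and column names `pg_vectorscale_collection` / `embedding` are taken from the execution plans below; adjust to your schema.)

```sql
-- Sketch: build a DiskANN index with the parameters above.
CREATE INDEX ON pg_vectorscale_collection
USING diskann (embedding vector_cosine_ops)
WITH (storage_layout = 'plain',
      num_neighbors = 32,
      search_list_size = 64,
      max_alpha = 1.2);

-- query_search_list_size is a run-time GUC, set per session:
SET diskann.query_search_list_size = 32;
```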

When running a test with enable_seqscan set to on, I observed significantly lower query performance than with it turned off.

Observed results:

With enable_seqscan=on: QPS: 2.5511. The execution plan shows the planner favoring a Parallel Sequential Scan despite an index being available.
With enable_seqscan=off: QPS: 3130.1343. The execution plan uses the Index Scan, resulting in dramatically improved performance.

Execution plans: with enable_seqscan=on, the planner chooses a Parallel Sequential Scan and execution takes ~713 ms. Relevant details from the EXPLAIN ANALYZE output:

Limit  (cost=11783.19..11784.36 rows=10 width=16) (actual time=695.078..713.197 rows=10 loops=1)
   ->  Gather Merge  (cost=11783.19..60397.62 rows=416666 width=16) (actual time=695.076..713.193 rows=10 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=10783.17..11304.00 rows=208333 width=16) (actual time=691.886..691.887 rows=7 loops=3)
               Sort Key: ((embedding <=> '[0.0041067647,-0.0126647325,-0.0024094123,-0.011227365,-0.016559536,0.014572391,0.009068001,-0.020348357,-0.01203547,0.007226581,-0.009213725,0.0072994432,-0.017698832,-0.014108724,-0.021302186,0.009167358,0.024362387,.......
-0.015857412,0.0011955984,-0.020507328,0.016374068,-0.011995727,-0.024123931,0.00021185855,0.01373779,-0.00064789184,-0.004537313,0.0036099786,0.00507053,-0.018864622,0.006733107,0.045200907,-0.020017166,-0.021686368,-0.0018463882,-0.010313278,0.0035205572,-0.004802265,-0.009438935]'::vector))
               Sort Method: top-N heapsort  Memory: 25kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 25kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 25kB
               ->  Parallel Seq Scan on pg_vectorscale_collection  (cost=0.00..6281.17 rows=208333 width=16) (actual time=0.082..673.951 rows=166667 loops=3)
 Planning Time: 0.068 ms
 Execution Time: 713.217 ms
(12 rows)

With enable_seqscan=off, the planner uses the index scan and the same query returns in well under a millisecond.

This behavior suggests that enabling enable_seqscan causes the query planner to prefer a Parallel Sequential Scan over the Index Scan, even though the latter performs orders of magnitude better here (~3130 QPS vs. ~2.55 QPS). The preference for a Parallel Sequential Scan in this case seems suboptimal. Could this be a bug, or is there a configuration adjustment that I'm missing? Any guidance or insight into this behavior would be greatly appreciated.

pgvectorscale extension affected

0.4.0

PostgreSQL version used

16.4

What operating system did you use?

22.04

What installation method did you use?

Source

What platform did you run on?

Microsoft Azure Cloud

Relevant log output and stack trace


How can we reproduce the bug?

To reproduce, use the following configuration:

Machine specs: 8 CPUs, 32GB RAM
Postgres version: 16.4
PostgreSQL settings:
checkpoint_timeout=5min, effective_cache_size=4GB, jit=on,
maintenance_work_mem=8GB, max_parallel_maintenance_workers=3,
max_parallel_workers=3, max_parallel_workers_per_gather=2,
max_wal_size=1GB, max_worker_processes=16, shared_buffers=8GB,
wal_compression=off, work_mem=4MB
enable_seqscan=off

DiskANN Configuration:

"storage_layout": "plain", 
"num_neighbors": 32, 
"search_list_size": 64,
"max_alpha": 1.2, 
"query_search_list_size": 32

Then set enable_seqscan to on and rerun the query to observe the planner choosing a Parallel Sequential Scan.
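The comparison can be sketched per session along these lines. The query vector is elided ('[...]'), and the selected `id` column is an assumption; any projection over the table works as long as the `ORDER BY embedding <=> ... LIMIT` shape is kept so the planner can consider the index:

```sql
-- Baseline: planner uses the DiskANN index scan
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT id FROM pg_vectorscale_collection
ORDER BY embedding <=> '[...]'::vector
LIMIT 10;

-- Repro: planner falls back to a Parallel Seq Scan
SET enable_seqscan = on;
EXPLAIN ANALYZE
SELECT id FROM pg_vectorscale_collection
ORDER BY embedding <=> '[...]'::vector
LIMIT 10;
```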



Are you going to work on the bugfix?

🆘 No, could someone else please work on the bugfix?
cevian commented 18 hours ago

@Sheharyar570 thanks for the report. Can you please provide the settings for all the *_cost function GUCs: https://www.postgresql.org/docs/current/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS

Have you run ANALYZE on the table? Does that help?

Also, running timescaledb-tune might help. Can you try that too?
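(For reference, a sketch of the diagnostics suggested above: listing the planner cost-constant GUCs in one query, and refreshing the statistics the planner costs against. The table name is taken from the execution plan in the report.)

```sql
-- Inspect the planner cost-related GUCs
SELECT name, setting
FROM pg_settings
WHERE name LIKE '%cost%'
ORDER BY name;

-- Refresh planner statistics for the test table
ANALYZE pg_vectorscale_collection;
```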