timescale / pgvectorscale

A complement to pgvector for high performance, cost efficient vector search on large workloads.
PostgreSQL License

[Bug]: Query Planner Prefers Parallel Sequential Scan Over Index Scan When enable_seqscan is `on`. #166

Open Sheharyar570 opened 1 day ago

Sheharyar570 commented 1 day ago

What happened?

I encountered a performance issue while running tests on PostgreSQL 16.4 with the pgvectorscale extension and a DiskANN index. Below are the details of my setup.

Database Configuration:

Machine specs: 8 CPUs, 32GB RAM
PostgreSQL settings:
checkpoint_timeout=5min, effective_cache_size=4GB, jit=on,
maintenance_work_mem=8GB, max_parallel_maintenance_workers=3,
max_parallel_workers=3, max_parallel_workers_per_gather=2,
max_wal_size=1GB, max_worker_processes=16, shared_buffers=8GB,
wal_compression=off, work_mem=4MB
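(Not part of the original report: a minimal sketch of how these settings could be applied, assuming superuser access. Note that `shared_buffers` requires a restart rather than a reload.)

```sql
-- Sketch: apply the reported settings via ALTER SYSTEM.
ALTER SYSTEM SET effective_cache_size = '4GB';
ALTER SYSTEM SET maintenance_work_mem = '8GB';
ALTER SYSTEM SET max_parallel_workers = 3;
ALTER SYSTEM SET max_parallel_workers_per_gather = 2;
ALTER SYSTEM SET shared_buffers = '8GB';   -- needs a server restart
ALTER SYSTEM SET work_mem = '4MB';
SELECT pg_reload_conf();  -- picks up the reloadable settings
```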

DiskANN Configuration:

"storage_layout": "plain", "num_neighbors": 32, "search_list_size": 64,
"max_alpha": 1.2, "query_search_list_size": 32
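(Added for context: a sketch of how these parameters map onto a `diskann` index build and the query-time GUC. The table and column names `pg_vectorscale_collection` / `embedding` are taken from the execution plans below; adjust to your schema.)

```sql
-- Sketch: build a DiskANN index with the parameters above.
CREATE INDEX ON pg_vectorscale_collection
USING diskann (embedding vector_cosine_ops)
WITH (storage_layout = 'plain',
      num_neighbors = 32,
      search_list_size = 64,
      max_alpha = 1.2);

-- query_search_list_size is a run-time GUC, set per session:
SET diskann.query_search_list_size = 32;
```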

When running a test with enable_seqscan set to on, I observed significantly lower query performance than with it turned off.

Observed results:

With enable_seqscan=on: QPS: 2.5511. The execution plan shows the planner favoring a Parallel Sequential Scan despite an index being available.
With enable_seqscan=off: QPS: 3130.1343. The execution plan uses the Index Scan, resulting in dramatically improved performance.

Execution plans: with enable_seqscan=on, the planner chooses a Parallel Sequential Scan and execution takes ~713 ms. Relevant details from the EXPLAIN ANALYZE output:

Limit  (cost=11783.19..11784.36 rows=10 width=16) (actual time=695.078..713.197 rows=10 loops=1)
   ->  Gather Merge  (cost=11783.19..60397.62 rows=416666 width=16) (actual time=695.076..713.193 rows=10 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=10783.17..11304.00 rows=208333 width=16) (actual time=691.886..691.887 rows=7 loops=3)
               Sort Key: ((embedding <=> '[0.0041067647,-0.0126647325,-0.0024094123,-0.011227365,-0.016559536,0.014572391,0.009068001,-0.020348357,-0.01203547,0.007226581,-0.009213725,0.0072994432,-0.017698832,-0.014108724,-0.021302186,0.009167358,0.024362387,.......
-0.015857412,0.0011955984,-0.020507328,0.016374068,-0.011995727,-0.024123931,0.00021185855,0.01373779,-0.00064789184,-0.004537313,0.0036099786,0.00507053,-0.018864622,0.006733107,0.045200907,-0.020017166,-0.021686368,-0.0018463882,-0.010313278,0.0035205572,-0.004802265,-0.009438935]'::vector))
               Sort Method: top-N heapsort  Memory: 25kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 25kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 25kB
               ->  Parallel Seq Scan on pg_vectorscale_collection  (cost=0.00..6281.17 rows=208333 width=16) (actual time=0.082..673.951 rows=166667 loops=3)
 Planning Time: 0.068 ms
 Execution Time: 713.217 ms
(12 rows)

With enable_seqscan=off, the planner uses the index scan and the same query returns in well under a millisecond.

This behavior suggests that enabling enable_seqscan causes the query planner to prefer a Parallel Sequential Scan over the Index Scan, even though the latter performs orders of magnitude better here (~3130 QPS vs. ~2.55 QPS). The preference for a Parallel Sequential Scan in this case seems suboptimal. Could this be a bug, or is there a configuration adjustment that I'm missing? Any guidance or insight into this behavior would be greatly appreciated.

pgvectorscale extension affected

0.4.0

PostgreSQL version used

16.4

What operating system did you use?

22.04

What installation method did you use?

Source

What platform did you run on?

Microsoft Azure Cloud

Relevant log output and stack trace


How can we reproduce the bug?

To reproduce, use the following configuration:

Machine specs: 8 CPUs, 32GB RAM
Postgres version: 16.4
PostgreSQL settings:
checkpoint_timeout=5min, effective_cache_size=4GB, jit=on,
maintenance_work_mem=8GB, max_parallel_maintenance_workers=3,
max_parallel_workers=3, max_parallel_workers_per_gather=2,
max_wal_size=1GB, max_worker_processes=16, shared_buffers=8GB,
wal_compression=off, work_mem=4MB
enable_seqscan=off

DiskANN Configuration:

"storage_layout": "plain", 
"num_neighbors": 32, 
"search_list_size": 64,
"max_alpha": 1.2, 
"query_search_list_size": 32

Then set enable_seqscan to on and rerun the query to observe the planner choosing a Parallel Sequential Scan.
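The comparison can be sketched per session along these lines. The query vector is elided ('[...]'), and the selected `id` column is an assumption; any projection over the table works as long as the `ORDER BY embedding <=> ... LIMIT` shape is kept so the planner can consider the index:

```sql
-- Baseline: planner uses the DiskANN index scan
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT id FROM pg_vectorscale_collection
ORDER BY embedding <=> '[...]'::vector
LIMIT 10;

-- Repro: planner falls back to a Parallel Seq Scan
SET enable_seqscan = on;
EXPLAIN ANALYZE
SELECT id FROM pg_vectorscale_collection
ORDER BY embedding <=> '[...]'::vector
LIMIT 10;
```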



Are you going to work on the bugfix?

🆘 No, could someone else please work on the bugfix?
cevian commented 18 hours ago

@Sheharyar570 thanks for the report. Can you please provide the settings for all the *_cost function GUCs: https://www.postgresql.org/docs/current/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS

Have you run ANALYZE on the table? Does that help?

Also, running timescaledb-tune might help. Can you try that too?
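(For reference, a sketch of the diagnostics suggested above: listing the planner cost-constant GUCs in one query, and refreshing the statistics the planner costs against. The table name is taken from the execution plan in the report.)

```sql
-- Inspect the planner cost-related GUCs
SELECT name, setting
FROM pg_settings
WHERE name LIKE '%cost%'
ORDER BY name;

-- Refresh planner statistics for the test table
ANALYZE pg_vectorscale_collection;
```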