microsoft / semantic-kernel

Integrate cutting-edge LLM technology quickly and easily into your apps
https://aka.ms/semantic-kernel
MIT License

.Net: [Microsoft.SemanticKernel.Connectors.Memory.Postgres.PostgresDbClient] - ORDER BY DESC prevents usage of ivfflat index #4131

Open Cotspheer opened 7 months ago

Cotspheer commented 7 months ago

Describe the bug

The current command prevents Postgres from using an index optimized for vector search. As someone mentioned, the issue is that Postgres (pgvector) only supports ascending-order index scans on the raw distance operator, so ordering by the derived similarity expression in descending order falls back to a sequential scan. I can't find definitive documentation for this and I'm anything but a Postgres expert, but changing the command to the second one provided below made Postgres use the created index. The performance difference is huge: from 30 s down to 41 ms for a database with 134'000 records and 1.2 GB in size (2.4 GB with the index).

Important: Running both queries without the index produces the same results, which is why I'm confident that my adjustment is correct. The difference after creating the index is the result of the trade-off between recall and performance, which is intended behavior for ivfflat.

Any chance an expert can look into this? Maybe the conclusion is that we have to use our own custom Postgres client implementation, but if my adjustment is correct (only the similarity ordering has to be inverted), it would be a huge improvement if it could be provided out of the box.
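
To make the equivalence behind the adjustment explicit: with cosine_similarity defined as 1 - distance, similarity >= 0.7 is the same filter as distance <= 0.3, and ordering by similarity descending is the same ordering as distance ascending. A minimal side-by-side sketch (using the same '[]' placeholder for the query vector as the plans below):

    -- Similarity form (equivalent to the current command; cannot use the index):
    SELECT key, 1 - (embedding <=> '[]') AS cosine_similarity
    FROM public."bger-gpt-3-5-turbo-v1"
    WHERE 1 - (embedding <=> '[]') >= 0.7
    ORDER BY 1 - (embedding <=> '[]') DESC
    LIMIT 5;

    -- Distance form (same rows, same order, but matches the ivfflat index scan):
    SELECT key, 1 - (embedding <=> '[]') AS cosine_similarity
    FROM public."bger-gpt-3-5-turbo-v1"
    WHERE (embedding <=> '[]') <= (1 - 0.7)
    ORDER BY embedding <=> '[]' ASC
    LIMIT 5;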

Source:

    cmd.CommandText = @$"
        SELECT * FROM (SELECT {queryColumns}, 1 - (embedding <=> @embedding) AS cosine_similarity FROM {this.GetFullTableName(tableName)}
        ) AS sk_memory_cosine_similarity_table
        WHERE cosine_similarity >= @min_relevance_score
        ORDER BY cosine_similarity DESC
        Limit @limit";

Get the query plan used by Postgres for the source query:

    EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
    SELECT * FROM
        (SELECT key, metadata, timestamp, 1 - (embedding <=> '[]') AS cosine_similarity
         FROM public."bger-gpt-3-5-turbo-v1"
        ) AS sk_memory_cosine_similarity_table
    WHERE cosine_similarity >= 0.7
    ORDER BY cosine_similarity DESC
    LIMIT 5;

Query plan for the source query (uses Seq. Scan):

Limit  (cost=7013.81..7014.39 rows=5 width=249) (actual time=29643.922..29669.955 rows=5 loops=1)
  Output: "bger-gpt-3-5-turbo-v1".key, "bger-gpt-3-5-turbo-v1".metadata, "bger-gpt-3-5-turbo-v1"."timestamp", (('1'::double precision - ("bger-gpt-3-5-turbo-v1".embedding <=> '[]'::vector)))
  Buffers: shared hit=1054213 read=179346
  ->  Gather Merge  (cost=7013.81..11330.07 rows=36994 width=249) (actual time=29643.920..29669.950 rows=5 loops=1)
        Output: "bger-gpt-3-5-turbo-v1".key, "bger-gpt-3-5-turbo-v1".metadata, "bger-gpt-3-5-turbo-v1"."timestamp", (('1'::double precision - ("bger-gpt-3-5-turbo-v1".embedding <=> '[]'::vector)))
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=1054213 read=179346
        ->  Sort  (cost=6013.78..6060.02 rows=18497 width=249) (actual time=29612.273..29612.275 rows=4 loops=3)
              Output: "bger-gpt-3-5-turbo-v1".key, "bger-gpt-3-5-turbo-v1".metadata, "bger-gpt-3-5-turbo-v1"."timestamp", (('1'::double precision - ("bger-gpt-3-5-turbo-v1".embedding <=> '[]'::vector)))
              Sort Key: (('1'::double precision - ("bger-gpt-3-5-turbo-v1".embedding <=> '[]'::vector))) DESC
              Sort Method: top-N heapsort  Memory: 28kB
              Buffers: shared hit=1054213 read=179346
              Worker 0:  actual time=29599.614..29599.617 rows=5 loops=1
                Sort Method: top-N heapsort  Memory: 25kB
                Buffers: shared hit=343340 read=58498
              Worker 1:  actual time=29602.116..29602.118 rows=5 loops=1
                Sort Method: top-N heapsort  Memory: 29kB
                Buffers: shared hit=344656 read=58594
              ->  Parallel Seq Scan on public."bger-gpt-3-5-turbo-v1"  (cost=0.00..5706.55 rows=18497 width=249) (actual time=8.585..29585.711 rows=43412 loops=3)
                    Output: "bger-gpt-3-5-turbo-v1".key, "bger-gpt-3-5-turbo-v1".metadata, "bger-gpt-3-5-turbo-v1"."timestamp", ('1'::double precision - ("bger-gpt-3-5-turbo-v1".embedding <=> '[]'::vector))
                    Filter: (('1'::double precision - ("bger-gpt-3-5-turbo-v1".embedding <=> '[]'::vector)) >= '0.7'::double precision)
                    Rows Removed by Filter: 979
                    Buffers: shared hit=1054102 read=179345
                    Worker 0:  actual time=2.433..29569.297 rows=42423 loops=1
                      Buffers: shared hit=343284 read=58498
                    Worker 1:  actual time=2.923..29576.258 rows=42602 loops=1
                      Buffers: shared hit=344601 read=58593
Query Identifier: -3047525361828273587
Planning Time: 0.125 ms
Execution Time: 29670.000 ms

Adjustment:

    cmd.CommandText = @$"
        SELECT 
                {queryColumns}, 
                1 - (embedding <=> @embedding) AS cosine_similarity 
        FROM {this.GetFullTableName(tableName)}
        WHERE (embedding <=> @embedding) <= (1 - @min_relevance_score)
        ORDER BY (embedding <=> @embedding) ASC
        Limit @limit";
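
Note on the trade-off: an ivfflat scan only visits the closest lists (ivfflat.probes of them), so the WHERE filter runs over the approximate candidates the index returns rather than over every row. That is the recall/performance trade-off mentioned above, and it is inherent to the index, not to this particular rewrite.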

Get the query plan used by Postgres for the adjusted query:


    EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
    SELECT
        metadata,
        1 - (embedding <=> '[]')
    FROM public."bger-gpt-3-5-turbo-v1"
    WHERE (embedding <=> '[]') <= (1 - 0.7)
    ORDER BY embedding <=> '[]'
    LIMIT 5;

Query plan for the adjusted query (uses the index):

Limit  (cost=1052.20..1052.35 rows=5 width=202) (actual time=2.147..2.248 rows=5 loops=1)
  Output: metadata, (('1'::double precision - (embedding <=> '[]'::vector)))
  Buffers: shared hit=311 read=313
  ->  Index Scan using "bger-gpt-3-5-turbo-v1_embedding_idx" on public."bger-gpt-3-5-turbo-v1"  (cost=1052.20..2387.85 rows=44392 width=202) (actual time=2.145..2.245 rows=5 loops=1)
        Output: metadata, ('1'::double precision - (embedding <=> '[]'::vector)), (embedding <=> '[]'::vector)
        Order By: ("bger-gpt-3-5-turbo-v1".embedding <=> '[]'::vector)
        Filter: (("bger-gpt-3-5-turbo-v1".embedding <=> '[]'::vector) <= '0.3'::double precision)
        Buffers: shared hit=311 read=313
Query Identifier: 8283906244086523602
Planning:
  Buffers: shared hit=1
Planning Time: 0.136 ms
Execution Time: 2.289 ms

Index used:

CREATE INDEX ON public."bger-gpt-3-5-turbo-v1" USING ivfflat ("embedding" vector_cosine_ops)
WITH (lists = 134);
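
A side note on tuning, based on the pgvector README rather than anything the connector does: lists is typically chosen as roughly rows / 1000 (here 134'000 / 1000 ≈ 134, matching the index above), and recall can be traded against speed at query time by raising the number of probed lists:

    -- More probes = better recall but slower queries (pgvector default is 1).
    SET ivfflat.probes = 10;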

To Reproduce

Steps to reproduce the behavior:

  1. Create a sample database with 1'000 to 100'000 dummy records (ivfflat only pays off when the dataset is large enough); see the sketch after this list
  2. Do a memory recall => query embedding
  3. Wait 30s
  4. Create an ivfflat index like the one above for the corresponding vector operator (cosine similarity in this case)
  5. Do a memory recall => query embedding
  6. Wait 30s again
  7. Check the plan using EXPLAIN => recognize that the index is not used
  8. Adjust the query as provided above (the second one)
  9. Do a memory recall => query embedding
  10. Wait 40ms
  11. Check the plan using EXPLAIN => see that the index is now used
  12. Drop the index
  13. Do a memory recall with the adjusted query as provided above => query embedding
  14. Wait 30s
  15. Check the plan using EXPLAIN => see that the Seq Scan is the issue: after dropping the index, the adjusted query takes the same amount of time as the source query
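
For step 1, a rough sketch of generating dummy records (the table name dummy_memory and the embedding dimension of 1536 are assumptions; adjust both to your setup):

    CREATE EXTENSION IF NOT EXISTS vector;

    CREATE TABLE public.dummy_memory (
        key         text PRIMARY KEY,
        metadata    text,
        "timestamp" timestamptz,
        embedding   vector(1536)
    );

    -- The "WHERE i = i" correlation forces a fresh random vector per row;
    -- an uncorrelated subquery would be evaluated only once.
    INSERT INTO public.dummy_memory
    SELECT i::text, '{}', now(),
           (SELECT array_agg(random())
            FROM generate_series(1, 1536)
            WHERE i = i)::vector(1536)
    FROM generate_series(1, 100000) AS i;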

Expected behavior

Either a hint in the documentation that the query has to be tweaked to one's needs, or a default implementation that can make use of an ivfflat index.

Platform

Additional context

I have removed the embedding vectors (shown as '[]') as they would just clutter the whole description. If access is required, hit me up in a PM and I'll try my best to provide access to the database in question.

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 90 days with no activity.

Cotspheer commented 1 month ago

It's still an issue. Currently I work around it with a custom implementation.
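
For anyone hitting the same wall, a rough sketch of what such a workaround can look like. This assumes the connector's IPostgresDbClient abstraction and the constructor shapes of the version I used; names may differ between Semantic Kernel versions, and MyPostgresDbClient is a hypothetical copy of PostgresDbClient whose nearest-match query uses the distance form shown above:

    using Microsoft.SemanticKernel.Connectors.Memory.Postgres;
    using Npgsql;
    using Pgvector.Npgsql;

    // Illustrative connection string only.
    var connectionString = "Host=localhost;Username=postgres;Password=postgres;Database=sk";

    var dataSourceBuilder = new NpgsqlDataSourceBuilder(connectionString);
    dataSourceBuilder.UseVector(); // pgvector type mapping from the Pgvector.Npgsql package
    await using var dataSource = dataSourceBuilder.Build();

    // MyPostgresDbClient: hypothetical clone of PostgresDbClient with the
    // adjusted ORDER BY (embedding <=> @embedding) ASC query.
    IPostgresDbClient client = new MyPostgresDbClient(dataSource, schema: "public", vectorSize: 1536);
    var memoryStore = new PostgresMemoryStore(client);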