timescale / pgvectorscale

A complement to pgvector for high performance, cost efficient vector search on large workloads.
PostgreSQL License

Add product quantization #9

Closed sgichohi closed 10 months ago

sgichohi commented 11 months ago

This PR introduces product quantization, a lossy compression technique that reduces the size of high-dimensional vectors. In our algorithm, we use it to shrink the vectors used in distance calculations.
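For readers unfamiliar with the technique, here is a minimal, generic sketch of product quantization in plain Python. It is not the code from this PR: the function names (`split`, `kmeans`, `train`, `encode`, `decode`) and the toy sizes are made up for illustration, and the training step uses a basic k-means, which is only an assumption about how the codebooks could be built. Each vector is cut into `m` subvectors, and each subvector is replaced by the index of its nearest centroid.

```python
import random

def split(vec, m):
    """Split vec into m contiguous subvectors of equal length."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10, seed=0):
    """Basic k-means; returns k centroids for one subspace."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            buckets[nearest].append(p)
        for c, bucket in enumerate(buckets):
            if bucket:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def train(vectors, m, k):
    """Train one codebook of k centroids per subspace."""
    per_subspace = zip(*(split(v, m) for v in vectors))
    return [kmeans(list(subs), k, seed=i) for i, subs in enumerate(per_subspace)]

def encode(vec, codebooks):
    """Replace each subvector with the index of its nearest centroid."""
    return [min(range(len(cb)), key=lambda c: sq_dist(sub, cb[c]))
            for sub, cb in zip(split(vec, len(codebooks)), codebooks)]

def decode(codes, codebooks):
    """Rebuild an approximate vector from its codes (lossy)."""
    out = []
    for code, cb in zip(codes, codebooks):
        out.extend(cb[code])
    return out

# Toy data: 200 random 8-dimensional vectors, 4 subspaces, 16 centroids each.
rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(200)]
codebooks = train(data, m=4, k=16)
codes = encode(data[0], codebooks)
approx = decode(codes, codebooks)
print(len(data[0]), "floats ->", len(codes), "codes")
```

The compressed form stores only the small integer codes plus the shared codebooks, and distances can then be computed against the decoded approximation instead of the full vector, which is where the accuracy loss comes from.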

Tuning product quantization is a tradeoff: it buys faster queries and, later, smaller indexes at the cost of accuracy.

This is an opt-in feature. Enable it by setting use_pq=true when creating the index:

timescaledb_vector=# create extension timescaledb_vector cascade;
CREATE EXTENSION
timescaledb_vector=# \d+ test
                                               Table "public.test"
  Column   |     Type     | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
-----------+--------------+-----------+----------+---------+----------+-------------+--------------+-------------
 embedding | vector(1536) |           |          |         | extended |             |              | 
Access method: heap

timescaledb_vector=# CREATE INDEX idx_tsv ON test USING tsv (embedding) WITH (num_neighbors=64, search_list_size=125, max_alpha=1.0, use_pq=true, pq_vector_length=64);
NOTICE:  Starting index build. num_neighbors=64 search_list_size=125, max_alpha=1, use_pq=true, pq_vector_length=64
INFO:  Processed 1000 tuples in 50.557447667s which is 0.050557448s/tuple. Dist/tuple: Prune: 1281 search: 492. Stats: InsertStats { prune_neighbor_stats: PruneNeighborStats { calls: 4392, distance_comparisons: 1281856, node_reads: 1355834 }, greedy_search_stats: GreedySearchStats { calls: 998, distance_comparisons: 492479, node_reads: 492479, pq_distance_comparisons: 0 } }
NOTICE:  Training Product Quantization with 1000 vectors
INFO:  Writing took 21.913690083s or 0.021913690083s/tuple.  Avg neighbors: 24
WARNING:  Indexed 1000 tuples
CREATE INDEX

Note: Increasing pq_vector_length improves the accuracy of the results but makes queries slower.
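To make the tradeoff concrete, here is a rough back-of-the-envelope sketch in Python. It rests on assumptions that this PR does not confirm: that pq_vector_length is the number of codes stored per vector, that each code fits in one byte (256 centroids per subspace), and that the dimension count divides evenly by pq_vector_length. The one solid input is that pgvector stores each vector component as a 4-byte float. The helper name `pq_layout` is hypothetical.

```python
def pq_layout(dims, pq_vector_length, bytes_per_float=4, bytes_per_code=1):
    """Estimate per-vector sizes under the assumptions stated above."""
    if dims % pq_vector_length != 0:
        raise ValueError("dims must be divisible by pq_vector_length")
    return {
        # How many original dimensions each code has to summarize.
        "dims_per_code": dims // pq_vector_length,
        # Size of the uncompressed vector payload.
        "raw_bytes": dims * bytes_per_float,
        # Size of the compressed vector (one code per subspace).
        "pq_bytes": pq_vector_length * bytes_per_code,
    }

# Compare a few settings for the vector(1536) column used in the example.
for length in (32, 64, 128):
    print(length, pq_layout(1536, length))
```

Under these assumptions, the 1536-dimensional example goes from 6144 bytes to 64 bytes per vector at pq_vector_length=64, with each code standing in for 24 dimensions. A larger value leaves fewer dimensions per code, so less detail is lost, but each distance calculation has more codes to process.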