timescale / pgvectorscale

A complement to pgvector for high performance, cost efficient vector search on large workloads.
PostgreSQL License

Support for streaming queries over the index. #6

Closed cevian closed 1 year ago

cevian commented 1 year ago

The main commit in this PR converts the greedy-search algorithm to be "streaming". The original algorithm assumed the caller knew how many results the search needed to generate (through the search list size parameter, which gave an upper bound on the number of results you could get from a search). However, the Postgres interface does not know how many results will be needed. The API has a "get next" function that the index must provide, and the index is responsible for returning as many results as there are calls to that function. Theoretically, for correctness, this can go through the entire table; semantically, the index can't just say "I've given you enough".

Thankfully, by using the ListSearchResults as the "search state", the algorithm can be adapted so that the search restarts where it left off and continues an arbitrary number of times. That's what this PR implements.
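
To illustrate the idea of a resumable search, here is a minimal Python sketch. It is not the code in this PR: the class `StreamingSearch`, its fields, and the toy graph are all hypothetical, and the real implementation keeps its state in ListSearchResults. The sketch only shows how keeping the candidate sets between calls lets a "get next" function pick up where the previous call stopped, with no result limit chosen up front.

```python
import heapq
from typing import Dict, List, Optional, Sequence, Set, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class StreamingSearch:
    """Hypothetical resumable greedy graph search (illustrative only).

    All search state lives on the object, so each get_next() call
    continues from where the previous call left off.
    """

    def __init__(
        self,
        vectors: Dict[int, Sequence[float]],
        neighbors: Dict[int, List[int]],
        entry: int,
        query: Sequence[float],
    ) -> None:
        self.vectors = vectors
        self.neighbors = neighbors
        self.query = query
        self.seen: Set[int] = {entry}
        # Nodes discovered but not yet expanded, ordered by distance to the query.
        self.frontier: List[Tuple[float, int]] = [
            (squared_distance(vectors[entry], query), entry)
        ]
        # Nodes already expanded but not yet returned to the caller.
        self.ready: List[Tuple[float, int]] = []

    def get_next(self) -> Optional[int]:
        # Keep expanding while some unexpanded candidate is closer than
        # the best expanded node we could return.
        while self.frontier and (not self.ready or self.frontier[0] < self.ready[0]):
            dist, node = heapq.heappop(self.frontier)
            for nb in self.neighbors[node]:
                if nb not in self.seen:
                    self.seen.add(nb)
                    d = squared_distance(self.vectors[nb], self.query)
                    heapq.heappush(self.frontier, (d, nb))
            heapq.heappush(self.ready, (dist, node))
        if not self.ready:
            return None  # every reachable node has been returned
        return heapq.heappop(self.ready)[1]


# Toy data: five 1-D points connected in a chain 0-1-2-3-4.
vectors = {i: [float(i)] for i in range(5)}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

search = StreamingSearch(vectors, neighbors, entry=0, query=[3.1])
results = []
while True:
    node = search.get_next()
    if node is None:
        break
    results.append(node)
```

The caller decides when to stop simply by no longer calling `get_next()`. If it never stops, the sketch eventually returns every node reachable from the entry point, which mirrors the requirement that the index cannot decide on its own that it has returned enough.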