hlinnaka opened 3 weeks ago
Unlike the current HNSW implementation, StreamingDiskANN has no recall degradation with post-filtering (that's actually the "Streaming" part of the algorithm). You can read more here: https://www.timescale.com/blog/how-we-made-postgresql-as-fast-as-pinecone-for-vector-data/ (section "Support for streaming retrieval for accurate metadata filtering").
The same streaming method is used with and without filters, so there is no performance degradation per se, although more selective queries obviously have to traverse more of the graph.
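To make the difference concrete, here is a minimal sketch contrasting the two post-filtering styles. It does not model the actual StreamingDiskANN or HNSW graph search. As a stand-in, `stream_neighbors` simply yields points in ascending distance order from a heap over all points. `fixed_then_filter` takes a fixed-size candidate list (`ef`, loosely analogous to pgvector's `hnsw.ef_search`) and then filters it, so it can come back short. `streaming_filter` keeps pulling candidates until it has `k` matches, so recall is preserved, but the number of candidates consumed grows as the filter gets more selective. All function names here are hypothetical and exist only for illustration.

```python
import heapq
import math
from typing import Callable, Iterator, List, Sequence, Tuple

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def stream_neighbors(points: List[Vector], query: Vector) -> Iterator[Tuple[float, int]]:
    """Hypothetical stand-in for an index scan: yields (distance, id) pairs
    in ascending distance order, one at a time, for as long as the caller asks.
    A real index would traverse a graph instead of heapifying every point."""
    heap = [(l2(p, query), i) for i, p in enumerate(points)]
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap)


def fixed_then_filter(points: List[Vector], query: Vector,
                      pred: Callable[[int], bool], k: int, ef: int) -> List[int]:
    """Fetch a fixed list of the ef nearest candidates, then apply the filter.
    If fewer than k of those ef candidates pass, the result is short (recall loss)."""
    candidates = heapq.nsmallest(ef, ((l2(p, query), i) for i, p in enumerate(points)))
    return [i for _, i in candidates if pred(i)][:k]


def streaming_filter(points: List[Vector], query: Vector,
                     pred: Callable[[int], bool], k: int) -> Tuple[List[int], int]:
    """Keep pulling from the stream until k matches are found (or it runs dry).
    Also returns how many candidates had to be consumed to get there."""
    results: List[int] = []
    consumed = 0
    for _, i in stream_neighbors(points, query):
        consumed += 1
        if pred(i):
            results.append(i)
            if len(results) == k:
                break
    return results, consumed


if __name__ == "__main__":
    pts = [[float(x)] for x in range(100)]  # 100 points on a line
    q = [0.0]
    # 10% of rows match: the fixed list of 20 only contains 2 matches.
    print(fixed_then_filter(pts, q, lambda i: i % 10 == 0, k=5, ef=20))
    # Streaming finds all 5 matches, consuming 41 candidates.
    print(streaming_filter(pts, q, lambda i: i % 10 == 0, k=5))
    # 5% of rows match: still 5 results, but 81 candidates consumed.
    print(streaming_filter(pts, q, lambda i: i % 20 == 0, k=5))
```

With a 10% selective filter, the fixed list of 20 candidates yields only `[0, 10]`, while the streaming version returns `[0, 10, 20, 30, 40]` after consuming 41 candidates. Halving the selectivity to 5% roughly doubles the work (81 candidates) without losing any results, which is the "no recall loss, but more of the graph traversed" trade-off described above.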
Honestly, I'm not sure how well the streaming approach translates to HNSW, but I'm skeptical it's easy because of the complications introduced by the multi-level structure.
(Edited the link in the original question to point to the correct PR.)
How does the post-filtering perform compared to https://github.com/pgvector/pgvector/pull/282 and https://github.com/pgvector/pgvector/pull/524? Recall? Speed?