lanterndata / lantern

PostgreSQL vector database extension for building AI applications
https://lantern.dev
Other
756 stars 53 forks source link

how does filtering work in lantern #321

Open lukebuehler opened 3 months ago

lukebuehler commented 3 months ago

I'm evaluating lantern and would love to see a small example and documentation on how filtering works in lantern.

Could be simply something like this: https://github.com/pgvector/pgvector/?tab=readme-ov-file#filtering

greptile-apps[bot] commented 3 months ago

Filtering in Lantern can be done using standard SQL WHERE clauses. Here is a small example:

CREATE TABLE small_world (id integer, vector real[3]);
INSERT INTO small_world (id, vector) VALUES (0, '{0,0,0}'), (1, '{0,0,1}');

-- Create an index on the vector column
CREATE INDEX ON small_world USING lantern_hnsw (vector);

-- Filter with a WHERE clause
SELECT id, vector FROM small_world WHERE id = 1;

For more details, refer to the README.md file.

References

/sql /src/hooks /test/sql /README.md

Ask Greptile

dqii commented 3 months ago

Hi @lukebuehler ! The example above does do filtering on id = 1. There's another example here in the docs: https://lantern.dev/docs/develop/query which does filtering on published_at < 2010 Does that help? Let me know if you were looking for something else!

Ngalstyan4 commented 3 months ago

Hi @lukebuehler,

The filtering example from the pgvector docs you linked above will work with no changes for lantern indexes as well! That is - the index will be consulted for nearest vectors and the vectors will be filtered after the fact with the given SQL predicate.

If the filter filters out a large portion of the rows, vector searches with pgvector will often return no results. Unlike pgvector, lantern will continue searching and returning results from the index until the given LIMIT is reached. Currently, this is done by recursively searching for more and more elements, but I just opened a PR (#322 ) that makes this filtered search more efficient by loading elements from the vector index in a streaming fashion.

Let me know if you have any other questions!

lukebuehler commented 3 months ago

This is helpful thanks! Good to know that you are using a recursive post-filter in lantern. I missed the query example for filtering.

In pgvector, if you have an index on the where column, the data is pre-filtered for exact search, and you can also create multiple partial indexes on a column which you can then select with a where clause. However, that only works with =, not with other comparisons. So It's nice that you do recursive search for non-equality predicates!

As for partial indexes, does lantern support it? It would be helpful for multi-tenanting.

Just my 2c: I often look for filtering documentation for various vector dbs, and they are often not very clear. I think a doc page that explains the basics conditions when it pre-filters, post-filters, uses a partial index (or whatever your feature are), etc is really helpful. LanceDB has a decent page.

Ngalstyan4 commented 3 months ago

As for partial indexes, does lantern support it? It would be helpful for multi-tenanting.

Partial indexes are supported, exactly as in btree or pgvector indexes! Here is an example from our tests

Thanks for the feedback on the docs! Will improve them before closing this issue. Let me know if you have any other questions!