supabase / vecs

Postgres/pgvector Python Client
https://supabase.github.io/vecs/latest
Apache License 2.0

Do I need to store all filters in the "metadata" column? #21

Closed Bewinxed closed 1 year ago

Bewinxed commented 1 year ago

Summary

I am trying to create a personalized chatbot. According to the vecs docs, I should store things like "user_id" in the metadata JSON, but I'm not sure how scalable that is.

Unresolved Questions

Can I, for example, add columns called "user_id" and "bot_id" to the table, and modify the query to select only the vectors where user_id = X and bot_id = Y?

Wouldn't this result in much faster queries?

What would be the best way to approach this?

Thank you!

olirice commented 1 year ago

vecs automatically indexes your metadata when you run Collection.create_index(...). Internally that metadata is stored as a jsonb column.
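As a rough sketch of what storing those ids in metadata could look like: the helper names `build_records` and `store_vectors` below are hypothetical, and `collection` is assumed to be a vecs collection obtained elsewhere (for example via `get_or_create_collection`). Only the `upsert(records=...)` call is the library's own.

```python
from typing import Any, Dict, List, Sequence, Tuple

# A vecs record is an (id, vector, metadata) tuple.
Record = Tuple[str, Sequence[float], Dict[str, Any]]


def build_records(
    items: Sequence[Tuple[str, Sequence[float]]],
    user_id: int,
    bot_id: int,
) -> List[Record]:
    """Attach the owning user and bot to each (id, vector) pair as metadata.

    Hypothetical helper, not part of vecs.
    """
    return [
        (vec_id, vector, {"user_id": user_id, "bot_id": bot_id})
        for vec_id, vector in items
    ]


def store_vectors(collection: Any, items, user_id: int, bot_id: int) -> int:
    """Upsert the records into a collection and return how many were written.

    `collection` is assumed to expose vecs' `upsert(records=...)` method.
    """
    records = build_records(items, user_id, bot_id)
    collection.upsert(records=records)
    return len(records)
```

After the initial load, a single `collection.create_index()` call would then build the index, which per the comment above also covers the metadata.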

If you query based on equality, e.g.

docs.query(
    ...
    filters={"user_id": {"$eq": 99}},
)

then the filter will be able to use the index and the performance difference will be negligible when compared to a dedicated column.
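For the two-key case from the question (user_id = X and bot_id = Y), the equality filters can be combined with vecs' `$and` operator. This is a minimal sketch, not an official recipe: `owner_filter` and `query_for_bot` are hypothetical helper names, and `collection` is assumed to be an existing vecs collection whose metadata contains both keys.

```python
from typing import Any, Dict, Sequence


def owner_filter(user_id: int, bot_id: int) -> Dict[str, Any]:
    """Build a metadata filter matching one user and one bot (hypothetical helper)."""
    return {
        "$and": [
            {"user_id": {"$eq": user_id}},
            {"bot_id": {"$eq": bot_id}},
        ]
    }


def query_for_bot(
    collection: Any,
    embedding: Sequence[float],
    user_id: int,
    bot_id: int,
    limit: int = 5,
):
    """Run a similarity search restricted to one user's bot.

    `collection` is assumed to expose vecs' `query(data=..., limit=..., filters=...)`.
    """
    return collection.query(
        data=embedding,
        limit=limit,
        filters=owner_filter(user_id, bot_id),
        include_metadata=True,
    )
```

Keeping the filter construction in its own function makes it easy to check the exact dictionary being sent before running it against a real database.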

In the future we may offer the ability to create dedicated columns to allow cleaner interop with other parts of Postgres, e.g. referential integrity with foreign keys, but performance is not likely to be a motivator in real-world workloads.