timescale / pgvectorscale

A complement to pgvector for high performance, cost efficient vector search on large workloads.
PostgreSQL License

Question about index parameters #121

Open xcu opened 3 months ago

xcu commented 3 months ago

I am managing a vector DB and I'm considering switching to pgvectorscale. However, I'm a bit lost about which index configuration parameters to use. The table in question contains over 50M embeddings of 512 dimensions, but it is partitioned with partman into tables of 100k embeddings each. So in practice it can be regarded as 500 small tables of 100k 512-dimensional embeddings each.
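For a rough sense of scale, the partition count and raw vector payload can be worked out directly from the numbers in the question. This is a back-of-the-envelope sketch, assuming the embeddings are stored as pgvector `vector` values (4-byte float32 per dimension) and ignoring per-row headers, page overhead and TOAST; the 50M figure is treated as a lower bound.

```python
# Back-of-the-envelope sizing for the workload described in the question.
# Assumption: 4-byte float32 per dimension (pgvector `vector`), with
# per-row headers, page overhead and TOAST ignored.
TOTAL_EMBEDDINGS = 50_000_000   # lower bound ("over 50M")
ROWS_PER_PARTITION = 100_000
DIMENSIONS = 512
BYTES_PER_FLOAT32 = 4


def partition_count(total: int, per_partition: int) -> int:
    # Ceiling division: a final partial partition still counts as one table.
    return -(-total // per_partition)


def raw_vector_bytes(rows: int, dims: int) -> int:
    # Raw payload of the vectors only, excluding tuple and page overhead.
    return rows * dims * BYTES_PER_FLOAT32


partitions = partition_count(TOTAL_EMBEDDINGS, ROWS_PER_PARTITION)
per_partition_mib = raw_vector_bytes(ROWS_PER_PARTITION, DIMENSIONS) / 2**20
total_gib = raw_vector_bytes(TOTAL_EMBEDDINGS, DIMENSIONS) / 2**30

print(f"partitions: {partitions}")                                  # 500
print(f"raw vectors per partition: {per_partition_mib:.1f} MiB")    # 195.3 MiB
print(f"raw vectors total: {total_gib:.1f} GiB")                    # 95.4 GiB
```

Each partition's vectors therefore amount to roughly 195 MiB of raw data, which is a modest size for a single index. The total data volume is what makes the workload large.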

Would the default build and query parameters for the DiskANN index be suitable? Or are there build/query parameters that could be tweaked for better recall or search speed?

jonatas commented 3 months ago

Hey @xcu, thanks for asking! @cevian can probably help answer this, but I also think this question would make a great conversation on our Discord! Join us and see what other devs are using too: https://discord.gg/KRdHVXAmkp

cevian commented 3 months ago

@xcu I think the defaults should suffice here.
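Following that advice, each partition simply gets a DiskANN index with no `WITH (...)` clause, so every build parameter keeps its pgvectorscale default. The sketch below generates such statements in Python. It is illustrative only. The partition names (`embeddings_p0000`, ...) and the column name `embedding` are hypothetical, because partman's real child-table names depend on how the partition set was configured. The choice of `vector_cosine_ops` is also an assumption; use the operator class that matches the distance metric you actually query with.

```python
# Sketch: emit one default-parameter DiskANN index statement per partition.
# Hypothetical names: real partman child tables should be read from the
# catalog, and identifiers are assumed to be trusted, unquoted names.

def diskann_index_sql(table: str, column: str = "embedding") -> str:
    # No WITH (...) clause, so all index build parameters keep their defaults.
    # Assumption: cosine distance, hence the vector_cosine_ops operator class.
    return (
        f"CREATE INDEX IF NOT EXISTS {table}_{column}_diskann_idx "
        f"ON {table} USING diskann ({column} vector_cosine_ops);"
    )


# Hypothetical partition names for the 500 tables of 100k rows each.
partition_tables = [f"embeddings_p{i:04d}" for i in range(500)]
statements = [diskann_index_sql(t) for t in partition_tables]

print(statements[0])
```

Generating the statements per partition, rather than hard-coding one, makes it easy to run them in batches or to apply them to new partitions as partman creates them.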