Open ksrinivs64 opened 10 months ago
Hi, thank you for your interest in DotHash. DotHash simply requires storing a vector for each document or node. You could use something like a vector database to scale this to millions of vectors.
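To make the "one vector per document" point concrete, here is a minimal sketch of an exact top-k dot-product search that streams the stored vectors in chunks, so only one chunk needs to be in memory at a time. This is not DotHash's code: `iter_chunks`, `top_k_by_dot`, the chunk size, and the random placeholder vectors are all assumptions made for illustration, and in practice the chunks would be read from disk or from a vector database rather than sliced from an in-memory list.

```python
import heapq
import random


def iter_chunks(vectors, chunk_size):
    """Yield (start_index, chunk) pairs; stands in for reading chunks from disk."""
    for start in range(0, len(vectors), chunk_size):
        yield start, vectors[start:start + chunk_size]


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def top_k_by_dot(query, chunks, k):
    """Exact top-k by dot product over a stream of (start_index, chunk) pairs."""
    heap = []  # min-heap of (score, doc_id) holding at most k entries
    for start, chunk in chunks:
        for offset, vec in enumerate(chunk):
            item = (dot(query, vec), start + offset)
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                # New item beats the current k-th best, so swap it in.
                heapq.heapreplace(heap, item)
    return sorted(heap, reverse=True)


# Placeholder data: random vectors standing in for per-document vectors.
random.seed(0)
dim, n_docs = 64, 1000
docs = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_docs)]

query = docs[42]
results = top_k_by_dot(query, iter_chunks(docs, chunk_size=100), k=5)
```

Because the scan is exact, it avoids the accuracy concerns of approximate indexes, at the cost of time linear in the number of stored vectors.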
Yes, but I was looking at ANNs, and most claim to do very poorly with high-dimensional vectors. Have you tried any particular one? If so, would you recommend it? Thanks again - very cool work.
I have not used vector databases; the experiments we ran were small enough that everything fit in memory. Could you elaborate on the following:
I was looking at ANNs and most claim to do very poorly with high dimensional vectors
It is not clear to me what performs poorly: the ANN or the vector database?
As far as I know, all vector databases scale by using space-partitioning algorithms, and the ones I looked at, like FAISS, said they become really inaccurate with high-dimensional vectors. Kavitha
Hi, thanks for a very nice paper and the code. Does the DotHash solution scale to millions of vectors (say, millions of documents for vector search)? Or is it currently limited by whatever can be computed in memory? Thanks