jdagdelen / hyperDB

A hyper-fast local vector database for use with LLM Agents. Now accepting SAFEs at $135M cap.
MIT License

hyperDB vs. FAISS? #9

Open janzheng opened 1 year ago

janzheng commented 1 year ago

In all seriousness, how does this compare to FAISS?

yurymalkov commented 1 year ago

Haha. The difference depends on the algorithm, dataset, and hardware. ann-benchmarks had this fabulous NumPy brute-force implementation: https://github.com/erikbern/ann-benchmarks/blob/main/ann_benchmarks/algorithms/bruteforce.py
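The linked baseline is an exhaustive scan: score every stored vector against the query and keep the best few. Below is a minimal NumPy sketch of that idea. It assumes cosine similarity as the metric, and `brute_force_knn` is a hypothetical name, not an API from hyperDB or ann-benchmarks.

```python
import numpy as np


def brute_force_knn(corpus: np.ndarray, query: np.ndarray, k: int = 5):
    """Exact top-k search by cosine similarity against every stored vector.

    corpus: (N, d) array of document embeddings.
    query:  (d,) embedding of a single query.
    Returns (indices, scores) of the k most similar rows, best first.
    """
    # Normalize so that a plain dot product equals cosine similarity.
    corpus_n = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)

    # One score per stored vector: N * d multiply-adds, no index at all.
    scores = corpus_n @ query_n

    # argpartition picks the k best in O(N); only those k are then sorted.
    k = min(k, scores.shape[0])
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return top, scores[top]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    docs = rng.standard_normal((1_000, 64))
    idx, sims = brute_force_knn(docs, docs[42], k=3)
    print(idx[0])  # 42: a stored vector is its own nearest neighbour
```

A real store would normalize the corpus once at insert time rather than on every query. The sketch re-normalizes only to keep the function self-contained.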

janzheng commented 1 year ago

Wow that's an incredible resource. Thanks for sharing!

I'm switching to HyperDB for the giggles. You can't benchmark hype.

jdagdelen commented 1 year ago

In all seriousness, I would totally support someone doing a real benchmark on this. HyperDB is actually a practical library for a use case many devs have: adding knowledge from fewer than 100k paragraphs to an LLM.
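A real benchmark should report both speed and accuracy. Below is a minimal harness sketch. It assumes the system under test is any callable `search_fn(query, k)` that returns corpus indices, and it treats exhaustive cosine search as ground truth for recall@k. `exact_topk` and `benchmark` are hypothetical names, not part of hyperDB or FAISS.

```python
import time

import numpy as np


def exact_topk(corpus_n: np.ndarray, query_n: np.ndarray, k: int) -> np.ndarray:
    """Ground-truth top-k indices by exhaustive dot product (assumes k <= N)."""
    scores = corpus_n @ query_n
    top = np.argpartition(-scores, k - 1)[:k]
    return top[np.argsort(-scores[top])]


def benchmark(search_fn, corpus: np.ndarray, queries: np.ndarray, k: int = 10) -> dict:
    """Time search_fn on each query and score its recall@k against exact search.

    search_fn(query, k) -> k corpus indices (the system under test).
    """
    corpus_n = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    latencies, recalls = [], []
    for q in queries:
        # Ground truth is computed outside the timed region.
        truth = set(exact_topk(corpus_n, q / np.linalg.norm(q), k).tolist())
        start = time.perf_counter()
        found = search_fn(q, k)
        latencies.append(time.perf_counter() - start)
        # Fraction of the true top-k that the system under test returned.
        recalls.append(len(truth & set(np.asarray(found).tolist())) / k)
    return {
        "mean_latency_ms": 1000 * float(np.mean(latencies)),
        "p95_latency_ms": 1000 * float(np.percentile(latencies, 95)),
        "recall_at_k": float(np.mean(recalls)),
    }
```

For the use case above, size `corpus` at roughly 100k rows of real embeddings. An exact brute-force `search_fn` scores a recall of 1.0 by construction, so the useful comparison is its latency against the latency and recall of an approximate index.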

karencfisher commented 1 year ago

OK, I came across this while looking for a lightweight, local, serverless vector DB. I caught on that something was amiss, though, from the phony link to a supposed paper by Andrej Karpathy. Besides, the code there looked like a basic brute-force search of the entire vector space. "This," I thought, "won't scale." For a minute (at least) I thought this was all cooked up via ChatGPT.
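The "won't scale" intuition can be made concrete. An exhaustive scan over $N$ stored vectors of dimension $d$ costs one dot product per vector for every query. The worked example below assumes $N = 100{,}000$ paragraphs and an illustrative embedding width of $d = 1536$, stored as float32 ($b = 4$ bytes per value):

```latex
\begin{aligned}
\text{work per query} &= N \cdot d \ \text{multiply-adds} \\
\text{memory} &= N \cdot d \cdot b \ \text{bytes}, \quad b = 4 \text{ for float32} \\[4pt]
N = 10^{5},\ d = 1536:\quad N \cdot d &= 1.536 \times 10^{8} \\
N \cdot d \cdot 4 &= 6.144 \times 10^{8}\ \text{bytes} \approx 614\ \text{MB}
\end{aligned}
```

Both quantities grow linearly in $N$, so ten times as many documents means ten times the work per query. At the ~100k scale that is still a single matrix-vector product. Much larger corpora are where approximate indexes such as those in FAISS typically take over.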

But dammit, something can be made of this. Why not make it the Dogecoin or Shiba Inu of vector DBs (both took the crypto space by storm despite starting out as jokes) and turn it into something useful? Maybe the serverless vector DB could be to vector search what SQLite is to the RDBMS world? There is a place for a small vector document store and query solution that can run offline, thus preserving data privacy.

BTW, if you can read the code for Derridaean distance, it is not Derridaean. A hallmark of Derrida (like other French postmodernists) is that hardly anyone can understand his stuff. ;)

jdagdelen commented 1 year ago

Yeah, this repo is a joke, but it is also totally functional. As for the Derridean stuff, we're adding quantum algorithms, so that should be handled.

aismlv commented 1 year ago

Haha, I stumbled upon this while working on a project inspired by similar thinking. I ran a quick benchmark, if you're interested, along with some thoughts on when this approach might be the more suitable one.

(And to preempt any questions, we are a stealth startup building in public and not accepting any funding at the moment)