Enhancement: Evaluate different search implementations for RAG

rmusser01 / tldw

tl/dw (Too Long, Didn't Watch): Your Personal Research Multi-Tool - a naive attempt at 'A Young Lady's Illustrated Primer'

https://tldwproject.com

Apache License 2.0

398 stars, 13 forks

Enhancement: Evaluate different search implementations for RAG #180

Open rmusser01 opened 3 months ago

rmusser01 commented 3 months ago

Look at different methods of search.

Implementations of: https://github.com/unum-cloud/usearch

BM25

https://en.wikipedia.org/wiki/Okapi_BM25

PageRank

https://en.wikipedia.org/wiki/PageRank

tf-idf

https://en.wikipedia.org/wiki/Tf%E2%80%93idf
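For reference while comparing the methods above, here is a minimal sketch of Okapi BM25 scoring in plain Python. It is not taken from tldw or from any of the linked libraries. The function name `bm25_scores`, the whitespace tokenization, the sample documents, and the defaults `k1=1.5` and `b=0.75` are assumptions chosen for illustration, and the IDF uses the common "+1 inside the log" variant so that scores stay non-negative.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    query: list of tokens; docs: non-empty list of token lists.
    k1 and b are illustrative defaults, not values used by tldw.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF (non-negative variant).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy corpus, tokenized by whitespace.
docs = [
    "bm25 ranks documents by term frequency".split(),
    "vector search uses embeddings".split(),
    "hybrid search combines bm25 and vector search".split(),
]
scores = bm25_scores("bm25 search".split(), docs)
best = scores.index(max(scores))  # index of the highest-scoring document
```

In this toy corpus the third document contains both query terms, so it scores highest. A real evaluation would swap in a proper tokenizer and an existing implementation such as the ones linked later in this thread.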

rmusser01 commented 2 months ago

https://github.com/meilisearch/meilisearch

Beam search
https://github.com/infiniflow/infinity
https://docs.haystack.deepset.ai/docs/inmemorybm25retriever
https://www.width.ai/post/what-is-beam-search

Search
https://jykoh.com/search-agents/paper.pdf
https://jykoh.com/search-agents
https://simonwillison.net/2024/Jun/21/search-based-rag/
https://arxiv.org/html/2404.07220v1
https://jdsemrau.substack.com/p/semantic-search-over-200k-posts
https://www.youtube.com/watch?v=kOALKZvhMgQ
https://arxiv.org/abs/2212.10496
https://medium.com/@kbdhunga/advanced-rag-multi-query-retriever-approach-ad8cd0ea0f5b
https://github.com/Rman410/hybrid-search/blob/main/hybrid-search.py
https://www.linkedin.com/pulse/googles-new-algorithms-just-made-searching-vector-faster-bamania-cyx3e/
https://huggingface.co/papers/2407.03618
https://about.xethub.com/blog/you-dont-need-a-vector-database
https://infiniflow.org/blog/best-hybrid-search-solution
https://research.google/blog/soar-new-algorithms-for-even-faster-vector-search-with-scann/
https://softwaredoug.com/blog/2024/06/25/what-ai-engineers-need-to-know-search
https://div.beehiiv.com/p/advanced-rag-series-retrieval
https://techcommunity.microsoft.com/t5/microsoft-developer-community/doing-rag-vector-search-is-not-enough/ba-p/4161073
https://arxiv.org/abs/2104.05740
https://github.com/xhluca/bm25s

https://github.com/facebookresearch/fastText
https://www.mixedbread.ai/blog/intro-bmx
https://towardsdatascience.com/building-a-sentence-embedding-index-with-fasttext-and-bm25-f07e7148d240?gi=43cce89eac18

Search 101
https://softwaredoug.com/blog/2024/06/25/what-ai-engineers-need-to-know-search
https://arxiv.org/abs/2104.05740
https://www.linkedin.com/pulse/googles-new-algorithms-just-made-searching-vector-faster-bamania-cyx3e/
https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/azure-ai-search-outperforming-vector-search-with-hybrid/ba-p/3929167

Sample Builds
https://simonwillison.net/2024/Jun/21/search-based-rag/

Beam Search
https://www.width.ai/post/what-is-beam-search

BM25
https://docs.haystack.deepset.ai/docs/inmemorybm25retriever
https://jdsemrau.substack.com/p/semantic-search-over-200k-posts
https://huggingface.co/papers/2407.03618
https://github.com/xhluca/bm25s

DBs
https://github.com/infiniflow/infinity

RAG + BM25 = better than either alone - https://about.xethub.com/blog/you-dont-need-a-vector-database

Tree Search
https://jykoh.com/search-agents/paper.pdf
https://jykoh.com/search-agents
https://arxiv.org/abs/2407.00320

Blended RAG / Hybrid Search
https://arxiv.org/html/2404.07220v1
https://www.youtube.com/watch?v=kOALKZvhMgQ
https://github.com/Rman410/hybrid-search/blob/main/hybrid-search.py
https://github.com/pgvector/pgvector-python/blob/master/examples/hybrid_search.py
https://infiniflow.org/blog/best-hybrid-search-solution

SPLADE
https://www.pinecone.io/learn/splade/

SOAR
https://research.google/blog/soar-new-algorithms-for-even-faster-vector-search-with-scann/
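Hybrid search needs some way to merge a keyword ranking (for example BM25) with a vector-similarity ranking. One widely used option is reciprocal rank fusion (RRF), sketched below. This is an illustration only, not the method used by any of the linked projects or by tldw. The function name, the document IDs, and the two input rankings are hypothetical; `k=60` is the constant commonly used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document gets sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document ID.
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


# Hypothetical outputs of a BM25 retriever and a vector retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
# fused == ["doc_a", "doc_c", "doc_b", "doc_d"]
```

RRF only looks at rank positions, so it does not need the BM25 and vector scores to be on the same scale. That simplicity is also its limitation: it ignores how confident each retriever was, which is worth keeping in mind when evaluating the hybrid approaches listed above.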

rmusser01 commented 2 months ago

https://ai.gopubby.com/search-in-the-age-of-ai-retrieval-methods-for-beginners-557621e12ded

rmusser01 commented 1 month ago

https://archive.is/ROLjD
https://archive.is/teZE1

rmusser01 commented 2 weeks ago

https://arxiv.org/abs/2411.03253

rmusser01 commented 2 weeks ago

https://eprint.iacr.org/2024/1774

rmusser01 commented 1 week ago

https://arxiv.org/abs/2410.20285

rmusser01 commented 3 days ago

https://softwaredoug.com/blog/2024/11/03/rrf-is-not-enough
https://github.com/typesense/typesense
https://typesense.org/docs/0.25.0/api/vector-search.html
https://github.com/xhluca/bm25s