langchain-ai / langchainjs


Hybrid search and reranking #2819

Closed AndyMik90 closed 3 months ago

AndyMik90 commented 9 months ago

Hi, will there be support in the JS version for Pinecone hybrid search and the Cohere Reranker? We have this running in Python, but would love to move to the Vercel AI SDK with this functionality.
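For reference, the Pinecone side of hybrid search is a single query that carries both a dense and a sparse vector. A minimal sketch with the Pinecone Node.js client, assuming you compute the sparse values yourself (e.g. with BM25 or SPLADE; unlike Python's `pinecone-text`, there is no bundled sparse encoder in JS) — the index name and alpha weighting are illustrative:

```ts
import { Pinecone } from "@pinecone-database/pinecone";
import { OpenAIEmbeddings } from "@langchain/openai";

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pinecone.index("my-hybrid-index"); // hypothetical index name
const embeddings = new OpenAIEmbeddings();

async function hybridSearch(
  query: string,
  // Sparse values (e.g. from BM25/SPLADE) must be computed by the caller.
  sparse: { indices: number[]; values: number[] },
  alpha = 0.5 // 1.0 = pure dense, 0.0 = pure sparse
) {
  const dense = await embeddings.embedQuery(query);
  // Convex combination of dense and sparse weights, as in Pinecone's hybrid examples.
  const weightedDense = dense.map((v) => v * alpha);
  const weightedSparse = {
    indices: sparse.indices,
    values: sparse.values.map((v) => v * (1 - alpha)),
  };
  return index.query({
    vector: weightedDense,
    sparseVector: weightedSparse,
    topK: 10,
    includeMetadata: true,
  });
}
```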

ali-habibzadeh commented 8 months ago

Would be great to see how to use Pinecone hybrid search with the JS framework.

ladrians commented 8 months ago

+1 to this request!

AndyMik90 commented 7 months ago

Any updates?

karol-f commented 7 months ago

You can try to implement it like this https://cookbook.openai.com/examples/question_answering_using_a_search_api
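The core of that cookbook recipe is "fetch candidates from any search API, then re-score them yourself". A rough LangChain.js equivalent of the re-scoring step, assuming OpenAI embeddings and plain cosine similarity as a stand-in until a proper reranker is available:

```ts
import { OpenAIEmbeddings } from "@langchain/openai";
import type { Document } from "@langchain/core/documents";

const embeddings = new OpenAIEmbeddings();

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Re-score candidates (from any retriever or search API) against the query
// and keep only the top N.
async function rerankBySimilarity(
  query: string,
  candidates: Document[],
  topN = 5
): Promise<Document[]> {
  const [queryVec, docVecs] = await Promise.all([
    embeddings.embedQuery(query),
    embeddings.embedDocuments(candidates.map((d) => d.pageContent)),
  ]);
  return candidates
    .map((doc, i) => ({ doc, score: cosine(queryVec, docVecs[i]) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN)
    .map((x) => x.doc);
}
```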

dosubot[bot] commented 4 months ago

Hi, @AndyMik90,

I'm helping the langchainjs team manage their backlog and am marking this issue as stale. From what I understand, you opened this issue to request support for Pinecone hybrid search and Cohere Reranker in the JavaScript version of the repository. There has been interest and support from other users, with suggestions and examples provided for potential implementation.

Could you please confirm if this issue is still relevant to the latest version of the langchainjs repository? If it is, please let the langchainjs team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to the langchainjs project.

karol-f commented 3 months ago

Hi, Cohere reranking (for Parent Retriever) is in PR - https://github.com/langchain-ai/langchainjs/pull/4738
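That PR targets the parent-document retriever specifically; in the meantime, roughly the same effect can be had by wrapping any base retriever in a ContextualCompressionRetriever with a Cohere reranker as the compressor. A sketch along those lines — the CohereRerank options shown are assumptions based on the PR and the @langchain/cohere package, so double-check them against whatever version actually ships:

```ts
import { CohereRerank } from "@langchain/cohere";
import { ContextualCompressionRetriever } from "langchain/retrievers/contextual_compression";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";

// Any base retriever works; an in-memory store keeps the sketch self-contained.
const vectorStore = await MemoryVectorStore.fromTexts(
  ["LangChain.js supports retrievers.", "Reranking improves RAG quality."],
  [{ id: 1 }, { id: 2 }],
  new OpenAIEmbeddings()
);

const retriever = new ContextualCompressionRetriever({
  baseCompressor: new CohereRerank({
    apiKey: process.env.COHERE_API_KEY,
    model: "rerank-english-v2.0",
    topN: 3, // keep only the 3 highest-scoring documents
  }),
  baseRetriever: vectorStore.asRetriever(10), // over-fetch, then let the reranker prune
});

const docs = await retriever.getRelevantDocuments("How do I improve RAG quality?");
```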

sugarforever commented 2 months ago

> Hi, Cohere reranking (for Parent Retriever) is in PR - #4738

@karol-f Is there any fully local solution for reranking with LangChain.js? I'm working on a 100% local knowledge base, so I'm looking for a production-ready, 100% local reranking solution to improve RAG quality. Thanks.

karol-f commented 2 months ago

@sugarforever Yes! There is my PR https://github.com/langchain-ai/langchainjs/pull/4738 for that. I'm already using it in my app via the "patch-package" NPM library.

sugarforever commented 2 months ago

> @sugarforever Yes! There is my PR #4738 for that. I'm already using it in my app via the "patch-package" NPM library.

Cool. Which implementation of BaseDocumentCompressor do you use?

karol-f commented 2 months ago

Cohere (with the v2 reranking model; I haven't checked v3), as it is cheap, fast, and gives good results. My chain - https://www.reddit.com/r/LangChain/s/xWiBObjS1z

My RAG works quite well with this setup:

For my data it works like a charm with GPT-4 Turbo or Claude Sonnet. Sometimes only a few of the best docs are left after reranking. Of course, for generating the additional question I use a faster and cheaper model like Haiku or GPT-3.5.

So my parent retriever chunks are:

Reranking comes after that:

The LLM usually gets a 5,000-15,000 token question, so it's roughly 1-3 cents per call with Claude Sonnet. In my case that's fine.
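For anyone who doesn't want to patch the package, the compressor described above can also be hand-rolled: subclass BaseDocumentCompressor and call Cohere's rerank endpoint from compressDocuments. A sketch assuming the cohere-ai v7 SDK; note that the import path for BaseDocumentCompressor has moved between langchain versions, so adjust it to your install:

```ts
import { CohereClient } from "cohere-ai";
import { BaseDocumentCompressor } from "langchain/retrievers/document_compressors";
import { Document } from "@langchain/core/documents";

class CohereRerankCompressor extends BaseDocumentCompressor {
  private cohere = new CohereClient({ token: process.env.COHERE_API_KEY! });

  constructor(private topN = 5, private model = "rerank-english-v2.0") {
    super();
  }

  async compressDocuments(documents: Document[], query: string): Promise<Document[]> {
    if (documents.length === 0) return documents;
    const response = await this.cohere.rerank({
      model: this.model,
      query,
      documents: documents.map((d) => ({ text: d.pageContent })),
      topN: this.topN,
    });
    // Keep only the top-scoring documents, in relevance order,
    // attaching the score to the metadata for debugging.
    return response.results.map(
      (r) =>
        new Document({
          pageContent: documents[r.index].pageContent,
          metadata: { ...documents[r.index].metadata, relevanceScore: r.relevanceScore },
        })
    );
  }
}
```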

sugarforever commented 2 months ago

> Cohere (with the v2 reranking model; I haven't checked v3), as it is cheap, fast, and gives good results. My chain - https://www.reddit.com/r/LangChain/s/xWiBObjS1z

I see. If Cohere is used, the data goes out to Cohere via its API, if I understand correctly. Is there any alternative that can run locally, for example an open-source one?

karol-f commented 2 months ago

There are for sure open-source reranking models (BERT-based cross-encoders, for example), but I haven't checked them.

sugarforever commented 2 months ago

> There are for sure open-source reranking models (BERT-based cross-encoders, for example), but I haven't checked them.

Thanks. I had a quick search on HF. It looks like https://huggingface.co/amberoad/bert-multilingual-passage-reranking-msmarco could do the job. Any example of how to integrate it with LangChain.js?
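There is no official LangChain.js integration for that model as far as this thread goes, but a fully local setup is still just a BaseDocumentCompressor whose compressDocuments scores (query, passage) pairs with a local cross-encoder. One hypothetical way is to serve the model behind a local rerank HTTP endpoint (for example Hugging Face's text-embeddings-inference exposes a /rerank route) and call it from the compressor. The URL, request shape, and model choice below are assumptions to illustrate the wiring, not a documented API:

```ts
import { BaseDocumentCompressor } from "langchain/retrievers/document_compressors";
import { Document } from "@langchain/core/documents";

// Hypothetical local reranker endpoint, e.g. text-embeddings-inference serving a
// cross-encoder such as amberoad/bert-multilingual-passage-reranking-msmarco or
// BAAI/bge-reranker-base. The request/response shape below follows TEI's /rerank
// route ({ query, texts } in, [{ index, score }] out) — verify it against whatever
// server you actually run.
const RERANK_URL = "http://localhost:8080/rerank";

class LocalCrossEncoderCompressor extends BaseDocumentCompressor {
  constructor(private topN = 5) {
    super();
  }

  async compressDocuments(documents: Document[], query: string): Promise<Document[]> {
    if (documents.length === 0) return documents;
    const res = await fetch(RERANK_URL, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ query, texts: documents.map((d) => d.pageContent) }),
    });
    const scores: { index: number; score: number }[] = await res.json();
    // Sort by the cross-encoder score and keep the top N documents.
    return scores
      .sort((a, b) => b.score - a.score)
      .slice(0, this.topN)
      .map(
        (s) =>
          new Document({
            pageContent: documents[s.index].pageContent,
            metadata: { ...documents[s.index].metadata, relevanceScore: s.score },
          })
      );
  }
}
```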