langchain4j / langchain4j

Java version of LangChain
https://docs.langchain4j.dev
Apache License 2.0

[FEATURE] ElasticsearchEmbeddingStore supports hybrid/mixed search #1666

Status: Open. fmx0717 opened this issue 2 months ago.

fmx0717 commented 2 months ago

ElasticsearchEmbeddingStore should support hybrid search, i.e. a search that mixes semantic (vector) matching and keyword matching.
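For readers unfamiliar with the term: one common way to mix semantic and keyword relevance is a weighted sum of the two scores. The Java sketch below is a toy and is not how Elasticsearch or LangChain4j actually score documents. The class HybridScorer and its methods are hypothetical, the keyword score is a crude term-overlap fraction standing in for a real full-text scorer such as BM25, and the weight alpha is an assumed tuning parameter.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// HYPOTHETICAL helper for illustration only; not part of LangChain4j or Elasticsearch.
final class HybridScorer {

    // Cosine similarity is in [-1, 1]; map it to [0, 1] so it is on the same scale as the keyword score.
    static double semanticScore(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        if (na == 0 || nb == 0) {
            return 0;
        }
        double cosine = dot / (Math.sqrt(na) * Math.sqrt(nb));
        return (cosine + 1) / 2;
    }

    // Fraction of distinct query terms that occur in the text: a crude stand-in for BM25.
    static double keywordScore(String query, String text) {
        Set<String> queryTerms = terms(query);
        if (queryTerms.isEmpty()) {
            return 0;
        }
        Set<String> textTerms = terms(text);
        long hits = queryTerms.stream().filter(textTerms::contains).count();
        return (double) hits / queryTerms.size();
    }

    // alpha = 1.0 gives pure semantic search, alpha = 0.0 gives pure keyword search.
    static double hybridScore(double semantic, double keyword, double alpha) {
        return alpha * semantic + (1 - alpha) * keyword;
    }

    // Lower-cased word tokens, split on runs of non-word characters.
    private static Set<String> terms(String s) {
        Set<String> out = new HashSet<>();
        for (String t : s.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }
}
```

With alpha = 0.5 both signals count equally. Whatever the real scorers are, the two scores need to be on comparable scales before they are mixed, which is why the cosine similarity is rescaled here.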

langchain4j-github-bot[bot] commented 2 months ago

/cc @dadoonet (elasticsearch)

zzutligang commented 2 months ago

me too!

fmx0717 commented 1 month ago

@langchain4j Will this feature be released in version 0.36? Is there an estimated timeline for the rollout?

dadoonet commented 1 month ago

@langchain4j I'm unsure what we want to do here. Is it meant to be generic (for all embedding stores)? Something like ElasticsearchEmbeddingStore.search(EmbeddingSearchRequest searchRequest, IsTextMatch keywordSearchRequest)...

What do you have in mind?

langchain4j commented 1 month ago

@dadoonet I am not completely sure, there are multiple options:

  1. Add a String query field to EmbeddingSearchRequest. If an EmbeddingStore implementation supports hybrid search, it can use query together with queryEmbedding to perform the search. I guess all specifics of full-text search (if any) can be configured when ElasticsearchEmbeddingStore is created. But then the name EmbeddingStore becomes too narrow and should be renamed to SearchEngine or something similar. Now is perhaps not the time to go this way (we are working on releasing a stable 1.0 at the end of the year), so I would keep it for 2025.

  2. ElasticsearchEmbeddingStore.search(EmbeddingSearchRequest searchRequest, IsTextMatch keywordSearchRequest) or something similar can certainly be implemented, but: 1) it has the same naming problem as above, and 2) there will be no way to plug it into DefaultRetrievalAugmentor, so users will have to implement RAG manually with quite a bit of boilerplate.

  3. The more generic ContentRetriever interface can be implemented instead (e.g. an ElasticsearchContentRetriever, similar to AzureAiSearchContentRetriever). I assume most of the vector search logic from ElasticsearchEmbeddingStore can be reused, so it should be OK.
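Option 1 above can be sketched as a request that carries an optional text query next to the query embedding, with each store deciding whether it can use it. Everything in this sketch is hypothetical: HybridSearchRequest, Doc, Match and InMemoryStore are illustrative stand-ins, not the real LangChain4j EmbeddingSearchRequest or EmbeddingStore, and the keyword boost is deliberately naive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// HYPOTHETICAL types mirroring option 1: a text query carried alongside the query embedding.
// None of these are the real LangChain4j API.
record HybridSearchRequest(float[] queryEmbedding, String query, int maxResults) {}

record Doc(String id, String text, float[] embedding) {}

record Match(String id, double score) {}

final class InMemoryStore {
    private final List<Doc> docs = new ArrayList<>();
    private final boolean supportsFullText;

    InMemoryStore(boolean supportsFullText) {
        this.supportsFullText = supportsFullText;
    }

    void add(Doc doc) {
        docs.add(doc);
    }

    List<Match> search(HybridSearchRequest req) {
        // Use the text query only if this store supports full-text search and a query was given.
        boolean hybrid = supportsFullText && req.query() != null && !req.query().isBlank();
        List<Match> matches = new ArrayList<>();
        for (Doc d : docs) {
            double score = dot(req.queryEmbedding(), d.embedding());
            if (hybrid && d.text().toLowerCase(Locale.ROOT).contains(req.query().toLowerCase(Locale.ROOT))) {
                score += 1.0; // naive keyword boost, for illustration only
            }
            matches.add(new Match(d.id(), score));
        }
        // Highest score first.
        matches.sort((a, b) -> Double.compare(b.score(), a.score()));
        return matches.subList(0, Math.min(req.maxResults(), matches.size()));
    }

    private static double dot(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```

The design point is the fallback: a store without full-text support simply ignores query and behaves like a pure vector search, so existing implementations would keep working unchanged.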
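The boilerplate mentioned in option 2 looks roughly like this: without a retriever abstraction, the caller has to run the store-specific search and splice the results into the prompt by hand, which is work DefaultRetrievalAugmentor otherwise does. In this sketch HybridSearcher is a hypothetical stand-in for a store-specific method such as search(EmbeddingSearchRequest, IsTextMatch), and the prompt template is an assumption. Embedding the question and calling the chat model would add more code on top.

```java
import java.util.List;
import java.util.stream.Collectors;

// HYPOTHETICAL stand-in for a store-specific hybrid search method (option 2).
@FunctionalInterface
interface HybridSearcher {
    List<String> search(String userQuestion, int maxResults);
}

final class ManualRag {
    // Without a retriever abstraction, retrieval has to be glued into the prompt by hand.
    static String augment(String question, HybridSearcher searcher, int maxResults) {
        List<String> segments = searcher.search(question, maxResults);
        String context = segments.stream()
                .map(s -> "- " + s)
                .collect(Collectors.joining("\n"));
        // Assumed prompt template; a real application would choose its own.
        return question + "\n\nAnswer using the following information:\n" + context;
    }
}
```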
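Option 3 could look roughly like the following: a retriever that runs a vector search and a keyword search and fuses the two ranked lists with Reciprocal Rank Fusion (RRF), where each document scores the sum of 1 / (k + rank) over the lists it appears in. Retriever here is a hypothetical simplification of LangChain4j's ContentRetriever (the real interface takes a Query and returns List<Content>), and RRF is just one possible fusion strategy, chosen for illustration rather than taken from this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// HYPOTHETICAL simplification of LangChain4j's ContentRetriever shape
// (the real interface is List<Content> retrieve(Query query)).
interface Retriever {
    List<String> retrieve(String query);
}

// Fuses a vector retriever and a keyword retriever with Reciprocal Rank Fusion.
final class HybridRetriever implements Retriever {
    private final Retriever vector;
    private final Retriever keyword;
    private final int k;          // RRF smoothing constant; 60 is a commonly used value
    private final int maxResults;

    HybridRetriever(Retriever vector, Retriever keyword, int k, int maxResults) {
        this.vector = vector;
        this.keyword = keyword;
        this.k = k;
        this.maxResults = maxResults;
    }

    @Override
    public List<String> retrieve(String query) {
        Map<String, Double> scores = new LinkedHashMap<>();
        addRanks(scores, vector.retrieve(query));
        addRanks(scores, keyword.retrieve(query));
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        // Highest fused score first; the stable sort keeps first-seen order on ties.
        entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(maxResults, entries.size()); i++) {
            out.add(entries.get(i).getKey());
        }
        return out;
    }

    private void addRanks(Map<String, Double> scores, List<String> ranked) {
        for (int i = 0; i < ranked.size(); i++) {
            // Ranks are 1-based in the RRF formula: 1 / (k + rank).
            scores.merge(ranked.get(i), 1.0 / (k + i + 1), Double::sum);
        }
    }
}
```

Because RRF uses only ranks, it sidesteps the problem of putting vector and keyword scores on the same scale, at the cost of ignoring how large the score gaps are.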

WDYT?

langchain4j commented 1 month ago

Hi @dadoonet, any thoughts regarding my previous comment? I am sure this is an important feature to include in LC4j 1.0

dadoonet commented 1 month ago

Hey.

I started a branch 12 days ago but haven't found time to finish it yet. I'm resuming work on it today. Let's see what the outcome will be.

FYI, I went the ContentRetriever route. Let's see how it plays out.

langchain4j commented 3 weeks ago

@dadoonet short update: I plan a release next week, so if you have something ready, please open a PR 🙏

dadoonet commented 3 weeks ago

No. It's in progress, but this week has been crazy. I might find some time in the middle of next week, but I doubt a merge will be possible before your release 😔