langchain-ai / langchain-elastic

Elasticsearch integration into LangChain
MIT License

ElasticsearchWarning: text_expansion is deprecated. Use sparse_vector instead. #42

Open kazdam opened 2 months ago

kazdam commented 2 months ago

I'm using Elasticsearch 8.15 and ElasticsearchStore, which generates the following warning. Can you suggest how to mitigate this warning other than ignoring it? Is it safe to ignore?

venv/lib/python3.11/site-packages/langchain_elasticsearch/vectorstores.py:883: ElasticsearchWarning: text_expansion is deprecated. Use sparse_vector instead.
  hits = self._store.search(

This appears to come from this line of my code:

vector_store.similarity_search_with_score(
    query=question,
    doc_builder=custom_doc_builder,
    filter=filter,
    k=n_results,
)

The following are the current package levels.

% pip freeze |grep -E '(langchain|elastic)'
elastic-transport==8.15.0
elasticsearch==8.15.0
langchain==0.2.14
langchain-community==0.2.12
langchain-core==0.2.34
langchain-elasticsearch==0.2.2
langchain-huggingface==0.0.3
langchain-text-splitters==0.2.2

miguelgrinberg commented 2 months ago

First of all, you should be using the SparseVectorStrategy class. From your report my guess is that you are using the older SparseRetrievalStrategy.

Aside from that, the text_expansion query is now deprecated. It continues to be available and working, but you'll see the warning. We have not yet updated this package to use the sparse_vector query, but we will, and at that point the warning will go away.
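
For readers unfamiliar with the two query types, here is a rough sketch of how the request bodies differ, based on the Elasticsearch query DSL. The helper names and the field name `vector.tokens` are assumptions for illustration only; they are not what this package generates internally, and the model ID is simply the one mentioned in this thread.

```python
# Illustrative only: builds the two query bodies as plain dicts.
# Helper names and the field name used below are hypothetical.

def text_expansion_query(field: str, model_id: str, text: str) -> dict:
    """Deprecated form: the field name is the key under text_expansion."""
    return {
        "text_expansion": {
            field: {
                "model_id": model_id,
                "model_text": text,
            }
        }
    }


def sparse_vector_query(field: str, inference_id: str, text: str) -> dict:
    """Newer form: the field is passed as an explicit parameter."""
    return {
        "sparse_vector": {
            "field": field,
            "inference_id": inference_id,
            "query": text,
        }
    }


old_body = text_expansion_query("vector.tokens", ".elser_model_2_linux-x86_64", "my question")
new_body = sparse_vector_query("vector.tokens", ".elser_model_2_linux-x86_64", "my question")
```

Either dict would go under the `query` key of a search request. Whether an existing model ID can be reused directly as `inference_id` depends on how the model was deployed, so treat that part as an assumption to verify against your cluster.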

kazdam commented 2 months ago

Thanks for your response. Yes, I am already using the SparseVectorStrategy. I should have pasted that earlier for completeness.

To ingest I am doing:

strategy = SparseVectorStrategy(model_id='.elser_model_2_linux-x86_64')
vector_store = ElasticsearchStore.from_documents(
    documents=documents,
    index_name=collection_name,
    es_connection=elastic_client,
    strategy=strategy,
    bulk_kwargs={'request_timeout': 50000}
)

and during search, I do the following:

vector_store = ElasticsearchStore(
    index_name=collection_name,
    es_connection=elastic_client,
    strategy=self.strategy
)

Would you suggest a different API? I started to try the lower-level ones that Elastic publishes in its tutorials, but being new at this, I appear to be making mistakes. I was going to suppress the warning, but then I would not know if it becomes a problem.
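
A possible middle ground, sketched here with Python's standard `warnings` module, is to silence only this one message so that any other Elasticsearch warning still shows up. The function name is hypothetical, and the fallback class exists only so the snippet runs without the client installed; this is a sketch under those assumptions, not an official recommendation from the package.

```python
import warnings

try:
    # Warning category raised by the Elasticsearch Python client.
    from elasticsearch import ElasticsearchWarning
except ImportError:
    # Stand-in so this sketch still runs without the client installed.
    class ElasticsearchWarning(Warning):
        pass


def silence_text_expansion_warning() -> None:
    """Ignore only the text_expansion deprecation; keep all other warnings."""
    warnings.filterwarnings(
        "ignore",
        # Regex matched against the start of the warning message.
        message=r"text_expansion is deprecated",
        category=ElasticsearchWarning,
    )
```

Calling `silence_text_expansion_warning()` once at startup, before the first search, should be enough. Because the filter matches on the message text, a different deprecation or a changed message would still be reported.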

sh0umik commented 2 months ago

Following the latest docs to date, I am getting the same error with the latest SDK:

from llama_index.vector_stores.elasticsearch import (
    AsyncSparseVectorStrategy,
    ElasticsearchStore,
)

sparse_vector_store = ElasticsearchStore(
    es_url="http://localhost:9200",  # for Elastic Cloud authentication see above
    index_name="movies_sparse",
    retrieval_strategy=AsyncSparseVectorStrategy(model_id=".elser_model_2"),
)

miguelgrinberg commented 2 months ago

The langchain and llamaindex integrations for Elasticsearch still use text_expansion, which is deprecated (but continues to be available). The update to the newer sparse_vector is planned for a future release. See https://github.com/elastic/elasticsearch-py/pull/2657 for more details.