run-llama / llama_index

[Bug]: Elasticsearch Query Parameter Configuration Error: window_size Less Than size #16293

Open playarikama opened 1 month ago

playarikama commented 1 month ago

Bug Description

Executing an Elasticsearch query through a retriever raises a BadRequestError. The error message indicates that the window_size parameter value (10) is less than the size parameter value (100), which violates Elasticsearch's validation rules.

#12695 addressed a similar issue, but the same problem still occurs when using index.as_retriever.

Version

llama-index 0.11.14

Steps to Reproduce

import os

from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.elasticsearch import (
    AsyncDenseVectorStrategy,
    ElasticsearchStore,
)

# Excerpts from a class; the self.* attributes are set elsewhere.
def _get_vector_store(self):
    return ElasticsearchStore(
        index_name=self.index_name,
        es_cloud_id=os.environ["ELASTICSEARCH_CLOUD_ID"],
        es_api_key=os.environ["ELASTICSEARCH_API_KEY"],
        retrieval_strategy=AsyncDenseVectorStrategy(hybrid=True),
    )

index = VectorStoreIndex.from_vector_store(
    vector_store=self.vector_store,
    storage_context=self.storage_context,
)

retriever = index.as_retriever(
    similarity_top_k=100,
    embed_model=self.embedder,
)

nodes = retriever.retrieve(query)

Relevant Logs/Tracebacks

elasticsearch.BadRequestError: BadRequestError(400, 'action_request_validation_exception', 'Validation Failed: 1: [rank] requires [window_size: 10] be greater than or equal to [size: 100];')
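
The message attributes window_size: 10 to the [rank] clause of the search request, and size: 100 matches similarity_top_k=100 in the reproduction. As a minimal sketch of that rule (not the library's actual code), the snippet below builds a rank clause whose window is at least as large as size and mirrors the check locally. The helper names build_rrf_rank_clause and check_rank_window are hypothetical, the rrf/window_size layout is an assumption based on the field names in the error, and the fallback of 10 is simply the value reported in the message.

```python
from typing import Any, Dict

# Value reported in the error message; assumed here as the fallback window.
REPORTED_WINDOW_SIZE = 10


def build_rrf_rank_clause(
    size: int, window_size: int = REPORTED_WINDOW_SIZE
) -> Dict[str, Any]:
    """Return a rank clause whose window_size is never smaller than size."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    # Widen the window to at least the number of hits being requested.
    return {"rrf": {"window_size": max(window_size, size)}}


def check_rank_window(size: int, rank: Dict[str, Any]) -> None:
    """Hypothetical local mirror of the rule quoted in the error message."""
    window_size = rank["rrf"].get("window_size", REPORTED_WINDOW_SIZE)
    if window_size < size:
        raise ValueError(
            f"[rank] requires [window_size: {window_size}] "
            f"be greater than or equal to [size: {size}]"
        )


rank = build_rrf_rank_clause(size=100)
check_rank_window(100, rank)  # passes: the window was widened to 100
```

Whether the vector store lets you supply such a clause (for example through an option on the retrieval strategy) depends on the installed versions and should be verified before relying on it.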

dosubot[bot] commented 1 month ago

Hey @playarikama! I'm here to help you with any bugs, questions, or contributions you have. Let's tackle this Elasticsearch query parameter issue together!

To resolve this BadRequestError when using index.as_retriever in llama-index 0.11.14, set the window_size parameter to a value greater than or equal to the size parameter.

In the SentenceWindowNodeParser class, the window_size parameter is defined with a default value of 3. You can adjust this value when creating an instance of SentenceWindowNodeParser to ensure it meets the required condition.

Here is an example of how to set the window_size parameter correctly:

from llama_index.core.node_parser.text.sentence_window import SentenceWindowNodeParser

# Ensure window_size is greater than or equal to size
window_size = 100  # Example value, should be >= size parameter used in index.as_retriever

parser = SentenceWindowNodeParser(window_size=window_size)

# Use the parser as needed

Replace 100 with a value that suits your use case and is greater than or equal to the size parameter you are using with index.as_retriever [1].

Additionally, ensure that the size parameter in your ElasticsearchStore class is set correctly. The size parameter is set or modified through the batch_size attribute, which defaults to 200 but can be adjusted during initialization:

class ElasticsearchStore(BasePydanticVectorStore):
    ...
    batch_size: int = 200
    ...

    def __init__(
        self,
        ...
        batch_size: int = 200,
        ...
    ) -> None:
        ...
        self.batch_size = batch_size
        ...

You can modify the batch_size during the instantiation of the ElasticsearchStore class to ensure it aligns with your window_size parameter [2].
