run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: Can't initialize OpensearchVectorClient through AWS endpoint in async mode #16746

Open Aydin-ab opened 2 weeks ago

Aydin-ab commented 2 weeks ago

Bug Description

Hello,

If you try to initialize the OpensearchVectorClient using an AWS endpoint -and- an async connection (connection_class=AsyncHttpConnection), you will get an error.

After debugging, the source of the issue is this recent commit: https://github.com/run-llama/llama_index/commit/c872b5857e059ba29318693785d3c053d01cdfb5

class OpensearchVectorClient:
    def __init__(...):
        ...
        self._os_client = os_client or self._get_opensearch_client(
            self._endpoint, **kwargs
        )
        self._os_async_client = self._get_async_opensearch_client(
            self._endpoint, **kwargs
        )
        self._os_version = self._get_opensearch_version()
    ...

    def _get_opensearch_version(self):
        info = self._os_client.info()
        return info["version"]["number"]

Because we set connection_class=AsyncHttpConnection, only the async client self._os_async_client is valid to use. But the version is checked through the sync client via self._os_client.info(), which raises the error.

At the moment the workaround is to revert to llama-index-vector-stores-opensearch==0.2.2

To fix it, we could check the connection_class in the init kwargs and use the matching client when querying the version:

info = self._os_client.info()  # if not async; use self._os_async_client if connection_class == AsyncHttpConnection

Version

0.11.20

Steps to Reproduce

Initialize the OpensearchVectorClient using an AWS endpoint -and- an async connection (connection_class=AsyncHttpConnection). The trigger is probably the async parameter rather than the AWS part.

OpensearchVectorClient(
    endpoint=[{"host": ..., "port": 443}],
    index=...,
    dim=...,
    http_auth=...,
    embedding_field="embedding",
    text_field="content",
    search_pipeline="hybrid-search-pipeline",
    use_ssl=True,
    verify_certs=True,
    connection_class=AsyncHttpConnection,
)

Relevant Logs/Tracebacks

File "/var/task/src/vectorstore.py", line 24, in load_index
    client = OpensearchVectorClient(
  File "/var/task/llama_index/vector_stores/opensearch/base.py", line 123, in __init__
    self._os_version = self._get_opensearch_version()
  File "/var/task/llama_index/vector_stores/opensearch/base.py", line 201, in _get_opensearch_version
    info = self._os_client.info()
  File "/var/task/opensearchpy/client/utils.py", line 176, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "/var/task/opensearchpy/client/__init__.py", line 308, in info
    return self.transport.perform_request(
  File "/var/task/opensearchpy/transport.py", line 416, in perform_request
    status, headers_response, data = connection.perform_request(
dosubot[bot] commented 2 weeks ago

Hello, @Aydin-ab! I'm here to help you with any bugs, questions, or contributions. Let's take a look at the issue you've shared and see how we can resolve it.

To fix the issue with initializing the OpensearchVectorClient using an AWS endpoint and an async connection with connection_class=AsyncHttpConnection, you need to modify the _get_opensearch_version method to use the asynchronous client instead of the synchronous one. Here's how you can do it:

async def _get_opensearch_version(self) -> str:
    info = await self._os_async_client.info()
    return info["version"]["number"]

Make sure to call this method in an asynchronous context since it now uses the async client. This change ensures that the version check is performed using the asynchronous client, which should resolve the issue [1].
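One wrinkle with making _get_opensearch_version async: __init__ is synchronous and cannot await it directly. A hedged sketch of one option (illustrative names, and it only works when no event loop is already running in the calling thread) is to drive the coroutine with asyncio.run:

```python
import asyncio


class VersionedClient:
    """Toy stand-in for OpensearchVectorClient to show the sync/async bridge."""

    async def _aget_version(self) -> str:
        # Placeholder for `info = await self._os_async_client.info()`;
        # returns a hard-coded version string in this sketch.
        return "2.11.0"

    def __init__(self) -> None:
        # asyncio.run only works when this thread has no running event loop;
        # inside a running loop you would need to await the coroutine instead.
        self._os_version = asyncio.run(self._aget_version())


assert VersionedClient()._os_version == "2.11.0"
```

In environments that already run an event loop (e.g. Jupyter, some Lambda handlers), asyncio.run raises a RuntimeError, so deferring the version check until the first async call may be the safer design.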

To continue talking to Dosu, mention @dosu.

logan-markewich commented 2 weeks ago

@Aydin-ab I encourage you to make a PR for this 🙏🏻