run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: llama-index-vector-stores-opensearch -> TypeError: object tuple can't be used in 'await' expression #11826

Closed ecatkins closed 5 months ago

ecatkins commented 8 months ago

Bug Description

The introduction of async behavior into the OpenSearch integration has caused errors in previously working code.

Version

llama-index-vector-stores-opensearch==0.1.7

Steps to Reproduce

pip install opensearch-py==2.4.2
pip install llama-index-vector-stores-opensearch==0.1.7

The code below previously worked...

import os
import boto3
from opensearchpy import RequestsHttpConnection, AWSV4SignerAuth

from llama_index.vector_stores.opensearch import OpensearchVectorClient

text_field = "content"
# OpensearchVectorClient stores embeddings in this field by default
embedding_field = "embedding"

endpoint = "my-endpoint"
idx = "my-index"

credentials = boto3.Session().get_credentials()
region = "us-east-1"
service = "aoss"
auth = AWSV4SignerAuth(credentials, region, service=service)

client = OpensearchVectorClient(
    endpoint,
    idx,
    1536,
    embedding_field=embedding_field,
    text_field=text_field,
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)

But it now raises the following error:

   client = OpensearchVectorClient(
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/eatkins/.pyenv/versions/3.11.2/lib/python3.11/site-packages/llama_index/vector_stores/opensearch/base.py", line 324, in __init__
    event_loop.run_until_complete(
  File "/Users/eatkins/.pyenv/versions/3.11.2/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/eatkins/.pyenv/versions/3.11.2/lib/python3.11/site-packages/opensearchpy/_async/client/indices.py", line 255, in get
    return await self.transport.perform_request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/eatkins/.pyenv/versions/3.11.2/lib/python3.11/site-packages/opensearchpy/_async/transport.py", line 390, in perform_request
    status, headers_response, data = await connection.perform_request(
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: object tuple can't be used in 'await' expression

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 8 months ago

Hey @ecatkins, nice to see you back in the world of code conundrums! 🚀

Thank you for providing the detailed information. The error you're encountering, TypeError: object tuple can't be used in 'await' expression, is likely due to an await being applied to a function or operation that returns a tuple, not a coroutine or future.

In the provided code, the initialization of OpensearchVectorClient involves several asynchronous operations, particularly with calls to self._os_client.indices.get, self._os_client.indices.create, and self._os_client.indices.refresh within an asyncio event loop. These operations are correctly awaited within the asynchronous context.

However, the error message suggests that somewhere in the process, await is being applied to something that returns a tuple rather than a coroutine. The direct cause is not visible in the snippets you've shared: the initialization of OpensearchVectorClient awaits its asynchronous calls correctly, so the mismatch likely comes from how the asyncio event loop is being used or from code outside the provided context, such as a callback or utility function that returns a tuple instead of an awaitable object and is mistakenly awaited.

To resolve this issue, check the following:

  1. Ensure all functions that are awaited are indeed coroutine functions (defined with async def) or return an awaitable object.
  2. If the error occurs during the initialization or usage of OpensearchVectorClient, trace back the asynchronous calls to see if there's a mismatch in what is being awaited.
  3. Review any custom callbacks or utility functions used in the asynchronous flow for incorrect return types.

Without more specific details on where the error originates in your own code, it's difficult to pinpoint the exact cause. However, reviewing each await against the return type of what it awaits in the asynchronous flow should help identify and resolve the issue.
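The mechanism behind this error can be shown with a minimal, self-contained sketch (hypothetical function names; this only mimics the shape of the failure, not the actual opensearch-py code): a synchronous connection class returns its result directly as a tuple, while the async transport awaits whatever the connection returns, producing exactly the TypeError from the traceback.

```python
import asyncio

# A synchronous connection returns its result directly as a
# (status, headers, body) tuple -- it is not a coroutine.
def sync_perform_request():
    return (200, {}, "{}")

# The async transport awaits whatever the connection returns.
async def transport():
    # Awaiting a plain tuple raises:
    # TypeError: object tuple can't be used in 'await' expression
    return await sync_perform_request()

try:
    asyncio.run(transport())
except TypeError as e:
    print(e)
```

This is why mixing a synchronous connection_class into an async transport fails at the first request the client makes.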

As for the changes in the latest version of llama-index-vector-stores-opensearch that introduced async behavior into the OpenSearch integration, they primarily involve:

  - the use of AsyncOpenSearch from the opensearchpy library for asynchronous communication with OpenSearch,
  - asynchronous bulk ingestion of embeddings into an OpenSearch index,
  - asynchronous methods for indexing, deletion, and querying of the OpenSearch index, and
  - error handling and configuration for the index.

These changes collectively introduce asynchronous behavior to the OpenSearch integration, enabling more efficient handling of I/O-bound operations, such as network requests to OpenSearch, by not blocking the execution thread. This is particularly beneficial for applications that require high throughput and scalability when interacting with OpenSearch for vector storage and retrieval operations.

I hope this helps! If you have any further questions or need more clarification, please don't hesitate to ask.

logan-markewich commented 8 months ago

@ecatkins seems like an issue in the opensearch client package? The code isn't doing anything obvious that would cause this. Maybe try updating? Or downgrading?

It's failing on this line https://github.com/run-llama/llama_index/blob/3e5d0a146fcda01a984818d381f31a19287aead8/llama-index-integrations/vector_stores/llama-index-vector-stores-opensearch/llama_index/vector_stores/opensearch/base.py#L325

ecatkins commented 8 months ago

@logan-markewich I am using the latest version of opensearch-py and downgrading that does not seem to have an effect.

However, downgrading to llama-index-vector-stores-opensearch==0.1.4 does work.

brycecf commented 8 months ago

Reporting the same error in 0.1.8; resolved by switching to 0.1.4.

ahs8w commented 7 months ago

Replace RequestsHttpConnection with AsyncHttpConnection for your connection_class and it should work.

Alternatively, it also works if you don't specify a connection_class at all.
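For reference, here is a sketch of the constructor call from the original report with the suggested async connection class swapped in. The endpoint, index name, and region are placeholders from the report; note also that opensearch-py pairs its async connections with AWSV4SignerAsyncAuth rather than the synchronous AWSV4SignerAuth, so that substitution is shown here as well (this is a configuration sketch, not a tested end-to-end example):

```python
import boto3
from opensearchpy import AsyncHttpConnection, AWSV4SignerAsyncAuth

from llama_index.vector_stores.opensearch import OpensearchVectorClient

# Placeholder values -- replace with your own endpoint, index, and region.
endpoint = "my-endpoint"
idx = "my-index"

credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAsyncAuth(credentials, "us-east-1", service="aoss")

client = OpensearchVectorClient(
    endpoint,
    idx,
    1536,
    embedding_field="embedding",
    text_field="content",
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    # Async connection class, so the client's awaited requests
    # receive coroutines instead of plain tuples.
    connection_class=AsyncHttpConnection,
)
```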

ecatkins commented 5 months ago

Changing to AsyncHttpConnection resolved the issue.