run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: VertexAIVectorStore fails w/ IndexError #14017

Closed: stamak closed this issue 3 weeks ago

stamak commented 4 months ago

Bug Description

Following this guide: https://docs.llamaindex.ai/en/stable/examples/vector_stores/VertexAIVectorSearchDemo/#create-a-simple-vector-store-from-plain-text-without-metadata-filters

When executing:

# setup storage
vector_store = VertexAIVectorStore(
    project_id=PROJECT_ID,
    region=REGION,
    index_id=vs_index.resource_name,
    endpoint_id=vs_endpoint.resource_name,
    gcs_bucket_name=GCS_BUCKET_NAME,
)

it fails with an IndexError: list index out of range:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[32], line 4
      2 print(vs_endpoint.resource_name)
      3 # setup storage
----> 4 vector_store = VertexAIVectorStore(
      5     project_id=PROJECT_ID,
      6     region=REGION,
      7     index_id=vs_index.resource_name,
      8     endpoint_id=vs_endpoint.resource_name,
      9     gcs_bucket_name=GCS_BUCKET_NAME,
     10 )

File ~/GIT/lll/VertexAI/venv/lib/python3.12/site-packages/llama_index/vector_stores/vertexaivectorsearch/base.py:121, in VertexAIVectorStore.__init__(self, project_id, region, index_id, endpoint_id, gcs_bucket_name, credentials_path, text_key, remove_text_from_metadata, **kwargs)
    116 _sdk_manager = VectorSearchSDKManager(
    117     project_id=project_id, region=region, credentials_path=credentials_path
    118 )
    120 # get index and endpoint resource names including metadata
--> 121 self._index = _sdk_manager.get_index(index_id=index_id)
    122 self._endpoint = _sdk_manager.get_endpoint(endpoint_id=endpoint_id)
    123 self._index_metadata = self._index.to_dict()

File ~/GIT/lll/VertexA/venv/lib/python3.12/site-packages/llama_index/vector_stores/vertexaivectorsearch/_sdk_manager.py:98, in VectorSearchSDKManager.get_index(self, index_id)
     89 """Retrieves a MatchingEngineIndex (VectorSearchIndex) by id.
     90 
     91 Args:
   (...)
     95     MatchingEngineIndex instance.
     96 """
     97 _, user_agent = get_user_agent("llama-index-vector-stores-vertexaivectorsearch")
---> 98 with telemetry.tool_context_manager(user_agent):
     99     return MatchingEngineIndex(
    100         index_name=index_id,
    101         project=self._project_id,
    102         location=self._region,
    103         credentials=self._credentials,
    104     )

File /opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/contextlib.py:137, in _GeneratorContextManager.__enter__(self)
    135 del self.args, self.kwds, self.func
    136 try:
--> 137     return next(self.gen)
    138 except StopIteration:
    139     raise RuntimeError("generator didn't yield") from None

File ~/GIT/lll/VertexAI/venv/lib/python3.12/site-packages/google/cloud/aiplatform/telemetry.py:48, in tool_context_manager(tool_name)
     27 @contextlib.contextmanager
     28 def tool_context_manager(tool_name: str) -> None:
     29     """Context manager for appending tool name to client instantiations.
     30 
     31     Most client instantiations occur at construction time. There are a few
   (...)
     46         None
     47     """
---> 48     _append_tool_name(tool_name)
     49     try:
     50         yield

File ~/GIT/lll/VertexAI/venv/lib/python3.12/site-packages/google/cloud/aiplatform/telemetry.py:56, in _append_tool_name(tool_name)
     55 def _append_tool_name(tool_name: str) -> None:
---> 56     if _tool_names_to_append[-1] != tool_name:
     57         _tool_names_to_append.append(tool_name)

IndexError: list index out of range
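
For context, the failing frame is inside google-cloud-aiplatform's telemetry helper, not in llama-index itself: _append_tool_name indexes _tool_names_to_append[-1], and the traceback suggests that module-level list is empty at that point. A minimal sketch of the failing pattern (the empty list is an assumption inferred from the traceback, not necessarily the module state in every SDK version):

# Sketch of the pattern in google/cloud/aiplatform/telemetry.py that raises.
# Assumption: _tool_names_to_append is empty when tool_context_manager is entered.
_tool_names_to_append: list = []  # stand-in for the module-level global

def _append_tool_name(tool_name: str) -> None:
    # Indexing [-1] on an empty list raises before the comparison runs.
    if _tool_names_to_append[-1] != tool_name:
        _tool_names_to_append.append(tool_name)

_append_tool_name("llama-index-vector-stores-vertexaivectorsearch")  # IndexError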

Version

0.10.43

Steps to Reproduce

Follow https://docs.llamaindex.ai/en/stable/examples/vector_stores/VertexAIVectorSearchDemo/#create-a-simple-vector-store-from-plain-text-without-metadata-filters
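
For reference, the vs_index and vs_endpoint used above come from that notebook. A rough sketch of how they are created with the Vertex AI SDK (display names and dimensions are placeholders and the exact arguments may differ from the notebook):

from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION)

# Placeholder values; dimensions must match the embedding model in use.
vs_index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name="llamaindex-demo-index",
    dimensions=768,
    approximate_neighbors_count=150,
    index_update_method="STREAM_UPDATE",
)
vs_endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
    display_name="llamaindex-demo-endpoint",
    public_endpoint_enabled=True,
)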

Relevant Logs/Tracebacks

$ pip list | grep llama-index
llama-index                                    0.10.43
llama-index-agent-openai                       0.2.7
llama-index-cli                                0.1.12
llama-index-core                               0.10.43
llama-index-embeddings-openai                  0.1.10
llama-index-embeddings-vertex                  0.1.0
llama-index-indices-managed-llama-cloud        0.1.6
llama-index-legacy                             0.9.48
llama-index-llms-openai                        0.1.22
llama-index-llms-vertex                        0.1.8
llama-index-multi-modal-llms-openai            0.1.6
llama-index-program-openai                     0.1.6
llama-index-question-gen-openai                0.1.3
llama-index-readers-file                       0.1.23
llama-index-readers-llama-parse                0.1.4
llama-index-vector-stores-vertexaivectorsearch 0.0.1

dosubot[bot] commented 4 months ago

Hey @stamak! I'm here to help you with any bugs, questions, or contributions. Let's tackle this together while we wait for a human maintainer.

To resolve the IndexError: list index out of range when initializing a VertexAIVectorStore, ensure that all required parameters are correctly provided and initialized. The error might be due to missing or incorrectly specified parameters.

Here is the corrected initialization code for VertexAIVectorStore:

from llama_index.vector_stores.vertexaivectorsearch import VertexAIVectorStore

vector_store = VertexAIVectorStore(
    project_id=PROJECT_ID,
    region=REGION,
    index_id=vs_index.resource_name,
    endpoint_id=vs_endpoint.resource_name,
    gcs_bucket_name=GCS_BUCKET_NAME,  # Optional
    credentials_path="<path_to_credentials>"  # Optional
)

Ensure that PROJECT_ID, REGION, vs_index.resource_name, and vs_endpoint.resource_name are correctly specified. If using optional parameters like gcs_bucket_name and credentials_path, make sure they are also correctly provided.
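
One quick sanity check (variable names taken from the notebook) is to print both resource names before constructing the store; they should be fully qualified resource paths:

# Hedged check: both values should be full resource names, not bare IDs.
print(vs_index.resource_name)     # e.g. projects/<number>/locations/<region>/indexes/<id>
print(vs_endpoint.resource_name)  # e.g. projects/<number>/locations/<region>/indexEndpoints/<id>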

Additionally, verify that the VertexAIVectorStore class is correctly implemented and all required attributes are initialized properly. Here is a snippet of the class definition for reference:

class VertexAIVectorStore(BasePydanticVectorStore):
    def __init__(
        self,
        project_id: Optional[str] = None,
        region: Optional[str] = None,
        index_id: Optional[str] = None,
        endpoint_id: Optional[str] = None,
        gcs_bucket_name: Optional[str] = None,
        credentials_path: Optional[str] = None,
        text_key: str = DEFAULT_TEXT_KEY,
        remove_text_from_metadata: bool = True,
        **kwargs: Any,
    ) -> None:
        super().__init__(
            project_id=project_id,
            region=region,
            index_id=index_id,
            endpoint_id=endpoint_id,
            gcs_bucket_name=gcs_bucket_name,
            credentials_path=credentials_path,
            text_key=text_key,
            remove_text_from_metadata=remove_text_from_metadata,
        )

        _sdk_manager = VectorSearchSDKManager(
            project_id=project_id, region=region, credentials_path=credentials_path
        )

        self._index = _sdk_manager.get_index(index_id=index_id)
        self._endpoint = _sdk_manager.get_endpoint(endpoint_id=endpoint_id)
        self._index_metadata = self._index.to_dict()

        self._stream_update = False
        if self._index_metadata["indexUpdateMethod"] == "STREAM_UPDATE":
            self._stream_update = True

        if self.gcs_bucket_name:
            self._staging_bucket = _sdk_manager.get_gcs_bucket(
                bucket_name=gcs_bucket_name
            )
        else:
            self._staging_bucket = None

By ensuring all required parameters are correctly provided and initialized, you can avoid the IndexError: list index out of range. Additionally, you can refer to the test cases provided in the test_vector_stores_vertexaivectorsearch.py file to ensure your setup is correct and to see examples of how to use the VertexAIVectorStore class [1][2][3].
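
Since the traceback bottoms out in google-cloud-aiplatform's telemetry module rather than in the store code above, one way to isolate the problem (a sketch, assuming the same environment and resource names) is to load the index directly with the Vertex AI SDK, bypassing the llama-index wrapper and its telemetry context manager:

from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION)

# If this succeeds while VertexAIVectorStore fails, the IndexError is likely
# coming from the telemetry helper in the installed google-cloud-aiplatform.
index = aiplatform.MatchingEngineIndex(index_name=vs_index.resource_name)
print(index.to_dict().get("indexUpdateMethod"))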

To continue talking to Dosu, mention @dosu.