run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: AttributeError: 'PineconeVectorStore' object has no attribute 'service_context' #13702

Open maheshchandrra28 opened 1 month ago

maheshchandrra28 commented 1 month ago

Bug Description

Hi LlamaIndex team,

I'm trying to connect to a Pinecone vector store using LlamaIndex. Below is the code for the same:

Source Code:

import pinecone
from dotenv import load_dotenv
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings
from llama_index.core.retrievers import VectorIndexAutoRetriever
from llama_index.core.vector_stores import MetadataInfo, VectorStoreInfo
from load_envs import *
from llama_index.vector_stores.pinecone import PineconeVectorStore

load_dotenv()

pc = pinecone.Pinecone(api_key=PINECONE_API_KEY,environment=PINECONE_ZONE)

embed_model = OpenAIEmbedding(embed_batch_size=1,openai_api_key=OPENAI_API_KEY,model=OPENAI_EMBEDDING_MODEL_NAME )

vector_store = PineconeVectorStore(
    pinecone_index=pc,
    embedding=embed_model
)

vector_store_info = VectorStoreInfo(
    content_info="Information about blogs",
    metadata_info=[
        MetadataInfo(
            name="source",
            type="str",
            description=("URL of the blog from which the content was extracted."),
        ),
        MetadataInfo(
            name="bucket",
            type="str",
            description=("The bucket to which the data belongs"),
        ),
        MetadataInfo(
            name="text",
            type="str",
            description=("Content from the blog"),
        ),
    ],
)
retriever = VectorIndexAutoRetriever(
    vector_store,
    vector_store_info=vector_store_info,
    empty_query_top_k=10,
    # this is a hack to allow for blank queries in pinecone
    default_empty_query_vector=[0] * 1536,
    verbose=True,
)

I've listed the indexes and was able to extract information about the available indexes. When I try to run the above code, I get the error below:

Error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[21], line 44
     18 vector_store = PineconeVectorStore(
     19     pinecone_index=pc,
     20     embedding=embed_model
     21 )
     24 vector_store_info = VectorStoreInfo(
     25     content_info="Information about blogs",
     26     metadata_info=[
   (...)
     42     ],
     43 )
---> 44 retriever = VectorIndexAutoRetriever(
     45     vector_store,
     46     vector_store_info=vector_store_info,
     47     empty_query_top_k=10,
     48     # this is a hack to allow for blank queries in pinecone
     49     default_empty_query_vector=[0] * 1536,
     50     verbose=True,
     51 )

File ~/anaconda3/lib/python3.11/site-packages/llama_index/core/indices/vector_store/retrievers/auto_retriever/auto_retriever.py:99, in VectorIndexAutoRetriever.__init__(self, index, vector_store_info, llm, prompt_template_str, max_top_k, similarity_top_k, empty_query_top_k, vector_store_query_mode, default_empty_query_vector, callback_manager, verbose, extra_filters, object_map, objects, service_context, **kwargs)
     96 self._vector_store_info = vector_store_info
     97 self._default_empty_query_vector = default_empty_query_vector
---> 99 service_context = service_context or self._index.service_context
    100 self._llm = llm or llm_from_settings_or_context(Settings, service_context)
    101 callback_manager = (
    102     callback_manager
    103     or callback_manager_from_settings_or_context(Settings, service_context)
    104 )

AttributeError: 'PineconeVectorStore' object has no attribute 'service_context'

Can you please let me know where it's going wrong?

Version

0.10.38

Steps to Reproduce

Run the source code above, pointing it at any Pinecone index.

pinecone=="3.2.2"

Relevant Logs/Tracebacks

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[27], line 46
     20 vector_store = PineconeVectorStore(
     21     pinecone_index=pc,
     22     embedding=embed_model
     23 )
     26 vector_store_info = VectorStoreInfo(
     27     content_info="Information about blogs",
     28     metadata_info=[
   (...)
     44     ],
     45 )
---> 46 retriever = VectorIndexAutoRetriever(
     47     vector_store,
     48     vector_store_info=vector_store_info,
     49     empty_query_top_k=10,
     50     # this is a hack to allow for blank queries in pinecone
     51     default_empty_query_vector=[0] * 1536,
     52     verbose=True,
     53 )

File ~/anaconda3/lib/python3.11/site-packages/llama_index/core/indices/vector_store/retrievers/auto_retriever/auto_retriever.py:99, in VectorIndexAutoRetriever.__init__(self, index, vector_store_info, llm, prompt_template_str, max_top_k, similarity_top_k, empty_query_top_k, vector_store_query_mode, default_empty_query_vector, callback_manager, verbose, extra_filters, object_map, objects, service_context, **kwargs)
     96 self._vector_store_info = vector_store_info
     97 self._default_empty_query_vector = default_empty_query_vector
---> 99 service_context = service_context or self._index.service_context
    100 self._llm = llm or llm_from_settings_or_context(Settings, service_context)
    101 callback_manager = (
    102     callback_manager
    103     or callback_manager_from_settings_or_context(Settings, service_context)
    104 )

AttributeError: 'PineconeVectorStore' object has no attribute 'service_context'
dosubot[bot] commented 1 month ago

The error AttributeError: 'PineconeVectorStore' object has no attribute 'service_context' occurs because the VectorIndexAutoRetriever class is trying to access a service_context attribute that doesn't exist in the PineconeVectorStore instance.

To fix this, you need to ensure that the service_context is correctly set in the VectorIndexAutoRetriever class. Here is the corrected code snippet for the VectorIndexAutoRetriever class:

def __init__(
        self,
        index: VectorStoreIndex,
        vector_store_info: VectorStoreInfo,
        llm: Optional[LLM] = None,
        prompt_template_str: Optional[str] = None,
        max_top_k: int = 10,
        similarity_top_k: int = DEFAULT_SIMILARITY_TOP_K,
        empty_query_top_k: Optional[int] = 10,
        vector_store_query_mode: VectorStoreQueryMode = VectorStoreQueryMode.DEFAULT,
        default_empty_query_vector: Optional[List[float]] = None,
        callback_manager: Optional[CallbackManager] = None,
        verbose: bool = False,
        extra_filters: Optional[MetadataFilters] = None,
        object_map: Optional[dict] = None,
        objects: Optional[List[IndexNode]] = None,
        service_context: Optional[ServiceContext] = None,
        **kwargs: Any,
    ) -> None:
        self._index = index
        self._vector_store_info = vector_store_info
        self._default_empty_query_vector = default_empty_query_vector

        service_context = service_context or self._index.service_context
        self._llm = llm or llm_from_settings_or_context(self._index.settings, service_context)
        callback_manager = (
            callback_manager
            or callback_manager_from_settings_or_context(self._index.settings, service_context)
        )

        prompt_template_str = (
            prompt_template_str or DEFAULT_VECTOR_STORE_QUERY_PROMPT_TMPL
        )
        self._output_parser = VectorStoreQueryOutputParser()
        self._prompt = PromptTemplate(template=prompt_template_str)
        self._max_top_k = max_top_k
        self._similarity_top_k = similarity_top_k
        self._empty_query_top_k = empty_query_top_k
        self._vector_store_query_mode = vector_store_query_mode
        if extra_filters is not None and extra_filters.condition == FilterCondition.OR:
            raise ValueError("extra_filters cannot be OR condition")
        self._extra_filters = extra_filters or MetadataFilters(filters=[])
        self._kwargs = kwargs
        super().__init__(
            callback_manager=callback_manager,
            object_map=object_map or self._index._object_map,
            objects=objects,
            verbose=verbose,
        )

This change ensures that the Settings used are specific to the index instance, which should resolve the issue.

Additionally, ensure that your PineconeVectorStore instance is correctly initialized and includes all the attributes and methods the retriever expects, to avoid any further issues.

Sources:

- [Pinecone retriever [AttributeError: 'Pinecone' object has no attribute 'query']](https://github.com/run-llama/llama_index/discussions/13362)
- [[Bug]: AttributeError: 'str' object has no attribute 'upsert'](https://github.com/run-llama/llama_index/issues/9506)
- [AttributeError: 'LLMPredictor' object has no attribute 'get_text_from_nodes'](https://github.com/run-llama/llama_index/issues/3353)
- [llama-index-core/llama_index/core/indices/vector_store/retrievers/auto_retriever/auto_retriever.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/indices/vector_store/retrievers/auto_retriever/auto_retriever.py)
- [llama-index-integrations/vector_stores/llama-index-vector-stores-pinecone/pyproject.toml](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/vector_stores/llama-index-vector-stores-pinecone/pyproject.toml)
- [llama-index-legacy/llama_index/legacy/vector_stores/pinecone_utils.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/vector_stores/pinecone_utils.py)
- [llama-index-integrations/vector_stores/llama-index-vector-stores-pinecone/llama_index/vector_stores/pinecone/utils.py](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/vector_stores/llama-index-vector-stores-pinecone/llama_index/vector_stores/pinecone/utils.py)
- [llama-index-integrations/vector_stores/llama-index-vector-stores-pinecone/llama_index/vector_stores/pinecone/base.py](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/vector_stores/llama-index-vector-stores-pinecone/llama_index/vector_stores/pinecone/base.py)
- [llama-index-legacy/llama_index/legacy/vector_stores/pinecone.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/vector_stores/pinecone.py)
**About Dosu** This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

maheshchandrra28 commented 1 month ago

But service_context is an optional parameter as per the class definition, so it shouldn't throw any error, right?

@dosu

dosubot[bot] commented 1 month ago

The error occurs because the PineconeVectorStore class does not have an attribute service_context, which is being accessed in the VectorIndexAutoRetriever class. Although service_context is optional, the code still attempts to access it, leading to the AttributeError.
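Even though the parameter is optional, the fallback path still dereferences whatever was passed as the first positional argument. A minimal stand-in sketch (FakeIndex and FakeVectorStore are hypothetical names, not LlamaIndex classes):

```python
# Stand-in classes mimicking the attribute access that fails.
class FakeIndex:
    service_context = None  # a VectorStoreIndex carries this attribute

class FakeVectorStore:
    pass  # a bare vector store does not

def resolve_service_context(index, service_context=None):
    # Mirrors the retriever's fallback:
    #   service_context = service_context or self._index.service_context
    return service_context or index.service_context

resolve_service_context(FakeIndex())  # fine: falls back to the index attribute
try:
    resolve_service_context(FakeVectorStore())
except AttributeError as e:
    print(e)  # 'FakeVectorStore' object has no attribute 'service_context'
```

So the error fires before the optional parameter ever matters: the `or` expression always evaluates the attribute lookup when the parameter is left as None.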

To resolve this, you can manually add the service_context attribute to the PineconeVectorStore instance before passing it to the VectorIndexAutoRetriever. Here is an updated version of your code snippet:

import pinecone
from dotenv import load_dotenv
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings
from load_envs import *
from llama_index.vector_stores.pinecone import PineconeVectorStore

load_dotenv()

pc = pinecone.Pinecone(api_key=PINECONE_API_KEY, environment=PINECONE_ZONE)

embed_model = OpenAIEmbedding(embed_batch_size=1, openai_api_key=OPENAI_API_KEY, model=OPENAI_EMBEDDING_MODEL_NAME)

vector_store = PineconeVectorStore(
    pinecone_index=pc,
    embedding=embed_model
)

# Manually add the service_context attribute
vector_store.service_context = None  # or set it to an appropriate value if available

vector_store_info = VectorStoreInfo(
    content_info="Information about blogs",
    metadata_info=[
        MetadataInfo(
            name="source",
            type="str",
            description="URL of the blog from which content is extracted from.",
        ),
        MetadataInfo(
            name="bucket",
            type="str",
            description="To which bucket does the data belong to",
        ),
        MetadataInfo(
            name="text",
            type="str",
            description="Content from the blog",
        ),
    ],
)

retriever = VectorIndexAutoRetriever(
    vector_store,
    vector_store_info=vector_store_info,
    empty_query_top_k=10,
    default_empty_query_vector=[0] * 1536,
    verbose=True,
)

By adding the service_context attribute to the vector_store instance, you should be able to avoid the AttributeError. If you have a specific value for service_context, you can set it accordingly instead of None.

Sources:

- [Pinecone retriever [AttributeError: 'Pinecone' object has no attribute 'query']](https://github.com/run-llama/llama_index/discussions/13362)
- [[Bug]: AttributeError: 'str' object has no attribute 'upsert'](https://github.com/run-llama/llama_index/issues/9506)
- [llama-index-integrations/vector_stores/llama-index-vector-stores-pinecone/llama_index/vector_stores/pinecone/__init__.py](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/vector_stores/llama-index-vector-stores-pinecone/llama_index/vector_stores/pinecone/__init__.py)
- [llama-index-integrations/vector_stores/llama-index-vector-stores-pinecone/llama_index/vector_stores/pinecone/base.py](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/vector_stores/llama-index-vector-stores-pinecone/llama_index/vector_stores/pinecone/base.py)
- [llama-index-legacy/llama_index/legacy/vector_stores/pinecone.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/vector_stores/pinecone.py)
- [docs/docs/examples/data_connectors/PineconeDemo.ipynb](https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/data_connectors/PineconeDemo.ipynb)
- [docs/docs/examples/vector_stores/PineconeIndexDemo.ipynb](https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/vector_stores/PineconeIndexDemo.ipynb)
- [docs/docs/examples/vector_stores/existing_data/pinecone_existing_data.ipynb](https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/vector_stores/existing_data/pinecone_existing_data.ipynb)
- [llama-index-integrations/vector_stores/llama-index-vector-stores-pinecone/pyproject.toml](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/vector_stores/llama-index-vector-stores-pinecone/pyproject.toml)

RussellLuo commented 1 month ago

Hi @maheshchandrra28, the first parameter of VectorIndexAutoRetriever should be of type VectorStoreIndex instead of VectorStore.

vector_store = PineconeVectorStore(
    pinecone_index=pc,
    embedding=embed_model
)

...

+ storage_context = StorageContext.from_defaults(vector_store=vector_store)
+ index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

retriever = VectorIndexAutoRetriever(
-   vector_store,
+   index,
    vector_store_info=vector_store_info,
    empty_query_top_k=10,
    # this is a hack to allow for blank queries in pinecone
    default_empty_query_vector=[0] * 1536,
    verbose=True,
)

For details, see https://docs.llamaindex.ai/en/stable/examples/vector_stores/PineconeIndexDemo/.

maheshchandrra28 commented 1 month ago

Tried assigning service_context as None, still getting the same error.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[28], line 18
     15 from llama_index.core import Settings
     17 embed_model = OpenAIEmbedding(embed_batch_size=1,openai_api_key=OPENAI_API_KEY,model=OPENAI_EMBEDDING_MODEL_NAME )
---> 18 vector_store.service_context = None  # or set it to an appropriate value if available
     21 vector_store = PineconeVectorStore(
     22     pinecone_index=pc,
     23     embedding=embed_model
     24 )
     27 vector_store_info = VectorStoreInfo(
     28     content_info="Information about blogs",
     29     metadata_info=[
   (...)
     45     ],
     46 )

File ~/anaconda3/lib/python3.11/site-packages/pydantic/v1/main.py:357, in BaseModel.__setattr__(self, name, value)
    354     return object_setattr(self, name, value)
    356 if self.__config__.extra is not Extra.allow and name not in self.__fields__:
--> 357     raise ValueError(f'"{self.__class__.__name__}" object has no field "{name}"')
    358 elif not self.__config__.allow_mutation or self.__config__.frozen:
    359     raise TypeError(f'"{self.__class__.__name__}" is immutable and does not support item assignment')

ValueError: "PineconeVectorStore" object has no field "service_context"
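The ValueError comes from the guard in pydantic's `BaseModel.__setattr__` shown in the traceback, which rejects assignment to undeclared fields. A stdlib-only sketch mimicking that guard (ToyStore and its field set are hypothetical, not pydantic itself):

```python
# Mimics the check in pydantic/v1/main.py from the traceback above:
# assigning an attribute that is not a declared field raises ValueError.
class ToyStore:
    __fields__ = {"index_name"}  # declared fields only

    def __init__(self, index_name):
        object.__setattr__(self, "index_name", index_name)

    def __setattr__(self, name, value):
        if name not in self.__fields__:
            raise ValueError(
                f'"{type(self).__name__}" object has no field "{name}"'
            )
        object.__setattr__(self, name, value)

store = ToyStore(index_name="blogs")
try:
    store.service_context = None  # not a declared field
except ValueError as e:
    print(e)  # "ToyStore" object has no field "service_context"
```

Because PineconeVectorStore is a pydantic model, monkey-patching a service_context attribute onto it cannot work; passing a VectorStoreIndex as the first argument, as suggested above, addresses the root cause.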