langchain-pinecone retrieve functions error: 'str' object has no attribute 'query'

AymaneHarkati commented 2 weeks ago

Checked other resources

[X] I added a very descriptive title to this issue.
[X] I searched the LangChain documentation with the integrated search.
[X] I used the GitHub search to find a similar question and didn't find it.
[X] I am sure that this is a bug in LangChain rather than my code.
[X] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

from langchain_pinecone import PineconeVectorStore
from langchain_community.embeddings.huggingface import HuggingFaceEmbeddings

index_name = "langchain-test-index"
embeddings = HuggingFaceEmbeddings()
docsearch = PineconeVectorStore(index_name, embeddings)

query = "What did the president say about Ketanji Brown Jackson"
docs = docsearch.similarity_search(query)
print(docs[0].page_content)

Error Message and Stack Trace (if applicable)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-9-87c0f7711524> in <cell line: 2>()
      1 query = "What did the president say about Ketanji Brown Jackson"
----> 2 docs = docsearch.similarity_search(query)
      3 print(docs[0].page_content)

2 frames
/usr/local/lib/python3.10/dist-packages/langchain_pinecone/vectorstores.py in similarity_search_by_vector_with_score(self, embedding, k, filter, namespace)
    207             namespace = self._namespace
    208         docs = []
--> 209         results = self._index.query(
    210             vector=embedding,
    211             top_k=k,

AttributeError: 'str' object has no attribute 'query'

Description

I was working through the Pinecone section of the documentation. Loading documents works fine, but every retrieval function raises the error above, while the same query through pinecone-client works. To reproduce the error, I used the code from the documentation.

System Info

langchain==0.1.16 langchain-community==0.0.34 langchain-core==0.1.46 langchain-openai==0.1.4 langchain-pinecone==0.1.0 langchain-text-splitters==0.0.1

Michal922 commented 2 weeks ago

Hello, I have reviewed the issue you reported and your code. The error AttributeError: 'str' object has no attribute 'query' indicates a problem related to the incorrect initialization of the _index object in the PineconeVectorStore instance. This problem arises from incorrect argument passing during the creation of the PineconeVectorStore object. I noticed that in your code:

docsearch = PineconeVectorStore(index_name, embeddings)

you are passing the arguments positionally. In the PineconeVectorStore constructor, the first positional parameter is index, and the parameters after the * in the method definition must be passed by keyword. Here is the relevant part of the constructor:

def __init__(
    self,
    index: Optional[Any] = None,
    embedding: Optional[Embeddings] = None,
    text_key: Optional[str] = "text",
    namespace: Optional[str] = None,
    distance_strategy: Optional[DistanceStrategy] = DistanceStrategy.COSINE,
    *,
    pinecone_api_key: Optional[str] = None,
    index_name: Optional[str] = None,
):

The parameters pinecone_api_key and index_name are located after the *, which means they can only be passed as keyword arguments. In your call, the first positional argument binds to index, so the string "langchain-test-index" is stored as _index, and the later _index.query() call fails because a str has no query method.
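To make the failure mode concrete, here is a minimal sketch using a stand-in class. FakeVectorStore and its search method are hypothetical: they only mimic the parameter order shown above and are not the real PineconeVectorStore implementation. The sketch shows that the first positional argument binds to index, so a string ends up where an index object was expected.

```python
from typing import Any, Optional


class FakeVectorStore:
    """Hypothetical stand-in that copies only the parameter order shown above."""

    def __init__(
        self,
        index: Optional[Any] = None,
        embedding: Optional[Any] = None,
        text_key: Optional[str] = "text",
        *,
        index_name: Optional[str] = None,
    ):
        # Whatever arrives in the first positional slot is stored as the index.
        self._index = index
        self._index_name = index_name

    def search(self, vector):
        # Mirrors the failing call in the traceback: self._index.query(...)
        return self._index.query(vector=vector, top_k=4)


# Positional call: the string lands in `index`, not in `index_name`.
store = FakeVectorStore("langchain-test-index", object())
try:
    store.search([0.1, 0.2])
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)

# Keyword call: the string is assigned to `index_name` as intended.
fixed = FakeVectorStore(index_name="langchain-test-index", embedding=object())
print(fixed._index, fixed._index_name)
```

With the keyword call, the stand-in leaves index unset and records the name separately, which is the situation the real constructor needs in order to look the index up itself.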

How to Correct Your Code: To make sure my test environment matched yours, I replicated your setup using Python 3.10.14 and the library versions you specified. I also had to install the sentence-transformers package, which was not in your list, because the HuggingFace embeddings need it to work with PineconeVectorStore.

To resolve this issue, you should initialize PineconeVectorStore with clearly defined parameter names, especially for index_name and pinecone_api_key, as shown below:

index_name="langchain-test-index"
docsearch = PineconeVectorStore(index_name=index_name, embedding=embeddings, pinecone_api_key="your_api_key")

To summarize, here's a script I crafted that does the following:

Checks if a specified index exists within the Pinecone service and creates one if it doesn't.
Initializes the PineconeVectorStore with HuggingFace embeddings.
Adds a few sample documents to our index, each with some content and associated metadata.
Executes a query to find documents similar to the input text: "What did the president say about Ketanji Brown Jackson".

I’ve included a delay after adding the documents to allow the index to update properly, as there's often a slight lag before the new documents are queryable.

Here’s the script in action:

from pinecone import Pinecone, ServerlessSpec
from langchain_pinecone import PineconeVectorStore
from langchain_community.embeddings.huggingface import HuggingFaceEmbeddings
import time

# Setup Pinecone API
PINECONE_API_KEY = "your_pinecone_api_key"
client = Pinecone(api_key=PINECONE_API_KEY)
index_name = "langchain-test-index"

# Create index if it doesn't exist
if index_name not in [idx.name for idx in client.list_indexes()]:
    spec = ServerlessSpec(cloud='aws', region='us-east-1')
    # dimension must match the embedding model's output size
    # (768 for the default HuggingFaceEmbeddings model)
    client.create_index(name=index_name, dimension=768, metric="cosine", spec=spec)

# Initialize PineconeVectorStore
docsearch = PineconeVectorStore(index_name=index_name, embedding=HuggingFaceEmbeddings(), pinecone_api_key=PINECONE_API_KEY)

# Add sample documents
documents = [
    {
        "content": "The president believes Ketanji Brown Jackson will be a good addition to the Supreme Court.",
        "metadata": {"topic": "politics"}
    },
    {
        "content": "The recent summit focused on international trade agreements.",
        "metadata": {"topic": "politics"}
    },
    {
        "content": "Scientists have discovered a new species of butterfly in the Amazon rainforest.",
        "metadata": {"topic": "science"}
    }
]

for doc in documents:
    docsearch.add_texts(texts=[doc['content']], metadatas=[doc['metadata']])

# Wait for documents to be indexed
time.sleep(5)

# Conduct similarity search
query = "What did the president say about Ketanji Brown Jackson"
results = docsearch.similarity_search(query)

# Display the most relevant document
if results:
    print("Most relevant document:", results[0].page_content)
else:
    print("No matching documents found.")
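As a possible alternative to the fixed five-second delay in the script above, the wait could be expressed as a polling loop with a timeout. The sketch below is only an illustration: wait_until and fake_ready are hypothetical names, and the readiness check is a stand-in rather than a real Pinecone call.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Demo with a fake readiness check that succeeds on the third poll.
calls = {"n": 0}


def fake_ready() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


ready = wait_until(fake_ready, timeout=5.0, interval=0.01)
print(ready, calls["n"])
```

In the real script, the predicate could compare the vector count reported by the index statistics against the number of documents added. That is an assumption about how you would check readiness, so verify it against the Pinecone client version you have installed.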

Also, make sure that the API key and the index name are passed correctly and that the API key is valid. Passing these arguments by keyword removes any ambiguity about which value is assigned to which parameter, which matters in constructors with many optional parameters. Following these guidelines should prevent errors caused by incorrect value assignment.

Hope it helps.

Best regards,
Michał

AymaneHarkati commented 2 weeks ago

I will try your solution and give you feedback. Thank you for your time.

AymaneHarkati commented 2 weeks ago

The solution works as stated by @Michal922. Thank you again.

langchain-ai / langchain