Checked other resources
Example Code
```python
%pip install --upgrade --quiet azure-search-documents
%pip install --upgrade --quiet azure-identity

import os

from langchain_community.vectorstores.azuresearch import AzureSearch
from langchain_openai import AzureOpenAIEmbeddings, OpenAIEmbeddings

# Option 2: use an Azure OpenAI account with a deployment of an embedding model
azure_endpoint: str = "PLACEHOLDER FOR YOUR AZURE OPENAI ENDPOINT"
azure_openai_api_key: str = "PLACEHOLDER FOR YOUR AZURE OPENAI KEY"
azure_openai_api_version: str = "2023-05-15"
azure_deployment: str = "text-embedding-ada-002"

vector_store_address: str = "YOUR_AZURE_SEARCH_ENDPOINT"
vector_store_password: str = "YOUR_AZURE_SEARCH_ADMIN_KEY"

# Option 2: Use AzureOpenAIEmbeddings with an Azure account
embeddings: AzureOpenAIEmbeddings = AzureOpenAIEmbeddings(
    azure_deployment=azure_deployment,
    openai_api_version=azure_openai_api_version,
    azure_endpoint=azure_endpoint,
    api_key=azure_openai_api_key,
)

index_name: str = "langchain-vector-demo"
vector_store: AzureSearch = AzureSearch(
    azure_search_endpoint=vector_store_address,
    azure_search_key=vector_store_password,
    index_name=index_name,
    embedding_function=embeddings.embed_query,
)

from langchain.text_splitter import (
    CharacterTextSplitter,
    RecursiveCharacterTextSplitter,
)
from langchain.document_loaders import DirectoryLoader, PyPDFLoader

# Read the PDF file using the LangChain loader
pdf_link = "test.pdf"
loader = PyPDFLoader(pdf_link, extract_images=False)
data = loader.load_and_split()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(data)

vector_store.add_documents(documents=docs)
```
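For reference, a sketch of the batching loop shown in the stack trace, corrected so each call passes a *list* of documents rather than a single `Document`. `StubStore` and `Doc` below are simplified stand-ins (assumptions, not the real `AzureSearch` or `Document` classes) so the snippet runs without LangChain or Azure credentials:

```python
import time


class StubStore:
    """Stand-in for AzureSearch: records the page_content it receives."""

    def __init__(self):
        self.texts = []

    def add_documents(self, documents):
        # Same contract as langchain_core's add_documents: `documents`
        # must be an iterable of objects exposing .page_content
        self.texts.extend(doc.page_content for doc in documents)


class Doc:
    """Stand-in for langchain_core's Document."""

    def __init__(self, page_content):
        self.page_content = page_content


docs = [Doc("chunk one"), Doc("chunk two")]
store = StubStore()

# To add one chunk at a time (as in the failing loop), wrap each
# Document in a single-element list:
for i in range(len(docs)):
    store.add_documents(documents=[docs[i]])
    time.sleep(0.01)  # throttle between requests

print(store.texts)  # ['chunk one', 'chunk two']
```

Passing the whole list in one call (`store.add_documents(documents=docs)`) works as well; the per-item loop only makes sense if you need to throttle between requests.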
Error Message and Stack Trace (if applicable)
```
AttributeError                            Traceback (most recent call last)
Cell In[12], line 2
      1 for i in range(0, len(docs)):
----> 2     vector_store.add_documents(documents=docs[i])
      3     time.sleep(5)

File ~/anaconda3/envs/rag_azure/lib/python3.10/site-packages/langchain_core/vectorstores.py:136, in VectorStore.add_documents(self, documents, **kwargs)
    127 """Run more documents through the embeddings and add to the vectorstore.
    128
    129 Args:
   (...)
    133     List[str]: List of IDs of the added texts.
    134 """
    135 # TODO: Handle the case where the user doesn't provide ids on the Collection
--> 136 texts = [doc.page_content for doc in documents]
    137 metadatas = [doc.metadata for doc in documents]
    138 return self.add_texts(texts, metadatas, **kwargs)

File ~/anaconda3/envs/rag_azure/lib/python3.10/site-packages/langchain_core/vectorstores.py:136, in <listcomp>(.0)
    127 """Run more documents through the embeddings and add to the vectorstore.
    128
    129 Args:
   (...)
    133     List[str]: List of IDs of the added texts.
    134 """
    135 # TODO: Handle the case where the user doesn't provide ids on the Collection
--> 136 texts = [doc.page_content for doc in documents]
    137 metadatas = [doc.metadata for doc in documents]
    138 return self.add_texts(texts, metadatas, **kwargs)

AttributeError: 'tuple' object has no attribute 'page_content'
```
Description
I am using LangChain to connect to Azure AI Search, create a vector store, and add documents to it so I can build a RAG application. I tried to replicate the notebook LangChain provides for Azure AI Search (https://python.langchain.com/docs/integrations/vectorstores/azuresearch/), but it is failing with the above error.

I do see `page_content` in `docs`, so I am not sure where the problem is. `type(docs[0])` returns `langchain_core.documents.base.Document`.

Here is an example of one element of `docs`:

```
print(docs[5])
Document(page_content='Modify the likelihood of specified tokens appearing in the completion. Accepts a json object that maps tokens (specified by their token ID in the GPT tokenizer) to an associated bias value from -100 to 100. You can use this tokenizer tool (which works for both GPT-2 and GPT-3) to convert text to token IDs. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect varies per model, but values between -1 and 1 should decrease or increase likelihood of selection', metadata={'source': 'test.pdf', 'page': 3})
```
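For what it's worth, the tuples in the error come from the comprehension iterating over whatever is passed as `documents`. A minimal, library-free sketch of that failure mode, using simplified stand-ins for `Document` and `add_documents` (assumption: LangChain's `Document` is a pydantic model, and iterating a pydantic model yields `(field_name, value)` tuples):

```python
class Document:
    """Stand-in mimicking a pydantic-style Document."""

    def __init__(self, page_content, metadata):
        self.page_content = page_content
        self.metadata = metadata

    def __iter__(self):
        # pydantic BaseModel.__iter__ yields (field_name, value) pairs
        yield ("page_content", self.page_content)
        yield ("metadata", self.metadata)


def add_documents(documents):
    # langchain_core's add_documents assumes `documents` is an iterable
    # of Document objects and list-comprehends over it:
    return [doc.page_content for doc in documents]


doc = Document("hello", {"source": "test.pdf"})

# Passing a single Document iterates its *fields*, producing tuples:
try:
    add_documents(documents=doc)
except AttributeError as e:
    print(e)  # 'tuple' object has no attribute 'page_content'

# Wrapping it in a list (or passing the whole `docs` list) works:
print(add_documents(documents=[doc]))  # ['hello']
```

Under this assumption, `add_documents(documents=docs)` succeeds while `add_documents(documents=docs[i])` raises exactly the error above.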
System Info
```
platform - mac
python - 3.10
langchain==0.1.15
langchain-community==0.0.32
langchain-core==0.1.41
langchain-openai==0.0.2.post1
langchain-text-splitters==0.0.1
```