Open alishashajahan opened 2 months ago
I found a similar closed issue: AttributeError: 'TextNode' object has no attribute 'get_doc_id'. The suggested solution is to ensure that the `Document` class you are using has the `doc_id` attribute properly set and to replace any calls to `get_doc_id` with the `doc_id` property.

Here is a corrected version of your code:
```python
import json

from llama_index.legacy.indices.vector_store.base import GPTVectorStoreIndex
from llama_index import Document  # Import the correct Document class

# Load the JSON file
with open(r'C:\Users\D E L L\Desktop\merged_finalfile\completd_final_fornatted_jasonfile.json') as file:
    data = json.load(file)

# Convert the JSON data to a list of Document objects
documents = [Document(
    text=doc['post_title'],  # Assuming 'post_title' is the main text content
    doc_id=doc['post_id'],  # Set the doc_id
    metadata={
        'subreddit': doc['subreddit'],
        'created_utc': doc['created_utc'],
        'num_comments': doc['num_comments'],
        'upvote_ratio': doc['upvote_ration'],  # Corrected key
        'created_date': doc['created_date'],
        'created_year': doc['created_year'],
        'comments': doc['comments'],
        'sentiment': doc['sentiment_y'],
        'emotion': doc['emotion']
    }
) for doc in data]

# Create the index from the documents
index = GPTVectorStoreIndex.from_documents(documents)
```
Make sure you are using the `Document` class from the `llama_index` module, which includes the `doc_id` property. This should resolve the `AttributeError` you are encountering [1].

To continue talking to Dosu, mention @dosu.
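To make the failure mode concrete, here is a minimal, llama_index-free sketch. The class names (`CustomDocument`, `LlamaStyleDocument`) are illustrative stand-ins, not real llama_index classes: the indexing code reads `document.doc_id`, so a hand-rolled class that never defines it raises the `AttributeError`, while a class exposing `doc_id` as a property works.

```python
class CustomDocument:
    """Hand-rolled class, as in the original report: defines neither
    a doc_id attribute nor a get_doc_id method."""
    def __init__(self, post_id, post_title):
        self.post_id = post_id
        self.post_title = post_title


class LlamaStyleDocument:
    """Sketch of the pattern the library's Document follows:
    the identifier is stored internally and exposed via a doc_id property."""
    def __init__(self, text, doc_id, metadata=None):
        self.text = text
        self.id_ = doc_id
        self.metadata = metadata or {}

    @property
    def doc_id(self):
        return self.id_


broken = CustomDocument("abc123", "Hello")
working = LlamaStyleDocument(text="Hello", doc_id="abc123")

print(hasattr(broken, "doc_id"))   # False -> AttributeError inside the index
print(working.doc_id)              # abc123
```

Any code path that does `getattr(document, "doc_id")` fails on the first class and succeeds on the second, which is exactly the difference between the custom class in the bug report and the library's own `Document`.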
@dosu I am still facing the same error..
The error persists because the custom `Document` class you defined does not have the `get_doc_id` method or the `doc_id` attribute that the `GPTVectorStoreIndex` expects. You should use the `Document` class provided by the `llama_index` module, which includes the necessary attributes and methods.

Here is the corrected version of your code:
```python
import json

from llama_index.legacy.indices.vector_store.base import GPTVectorStoreIndex
from llama_index import Document  # Import the correct Document class

# Load the JSON file
with open(r'C:\Users\D E L L\Desktop\merged_finalfile\completd_final_fornatted_jasonfile.json') as file:
    data = json.load(file)

# Convert the JSON data to a list of Document objects
documents = [Document(
    text=doc['post_title'],  # Assuming 'post_title' is the main text content
    doc_id=doc['post_id'],  # Set the doc_id
    metadata={
        'subreddit': doc['subreddit'],
        'created_utc': doc['created_utc'],
        'num_comments': doc['num_comments'],
        'upvote_ratio': doc['upvote_ration'],  # Corrected key
        'created_date': doc['created_date'],
        'created_year': doc['created_year'],
        'comments': doc['comments'],
        'sentiment': doc['sentiment_y'],
        'emotion': doc['emotion']
    }
) for doc in data]

# Create the index from the documents
index = GPTVectorStoreIndex.from_documents(documents)
```
Ensure you are using the `Document` class from the `llama_index` module, which includes the `doc_id` property and other necessary methods [1].

To continue talking to Dosu, mention @dosu.
@alishashajahan Yes, `Document` has no attribute `get_doc_id` because you made your own document class, which will not work with anything in llama-index.

Best to use the document class that's already in llama-index, no? You can add all this info as metadata, as dosu was trying to suggest.
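The metadata approach suggested above can be sketched without any llama_index dependency. The helper name `to_document_kwargs` is hypothetical, chosen for illustration: keep only the main text as the document body, use the post ID as the document ID, and fold every remaining field into a plain metadata dict, which is the shape the library's `Document` constructor accepts.

```python
def to_document_kwargs(doc: dict) -> dict:
    """Map one JSON record to keyword arguments for a Document:
    text body, doc_id, and everything else as metadata."""
    return {
        "text": doc["post_title"],
        "doc_id": doc["post_id"],
        "metadata": {
            k: v for k, v in doc.items()
            if k not in ("post_title", "post_id")
        },
    }


# A record with the same keys used in the bug report
record = {
    "post_id": "abc123",
    "post_title": "Hello world",
    "subreddit": "python",
    "upvote_ration": 0.97,  # key is spelled this way in the source JSON
}

kwargs = to_document_kwargs(record)
print(kwargs["doc_id"])            # abc123
print(sorted(kwargs["metadata"]))  # ['subreddit', 'upvote_ration']
```

With this in place, building the documents becomes `[Document(**to_document_kwargs(doc)) for doc in data]`, and no custom class is needed at all.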
Bug Description
The code loads a JSON file, converting each entry into a Document object, and then creates an index from these documents using the GPTVectorStoreIndex. The purpose is to prepare the data for efficient querying and analysis using the GPT model.
```python
import json

from llama_index.legacy.indices.vector_store.base import GPTVectorStoreIndex

# Define the Document class (if not already defined)
class Document:
    def __init__(self, post_id, subreddit, created_utc, post_title, num_comments,
                 upvote_ratio, created_date, created_year, comments, sentiment, emotion):
        self.post_id = post_id
        self.subreddit = subreddit
        self.created_utc = created_utc
        self.post_title = post_title
        self.num_comments = num_comments
        self.upvote_ratio = upvote_ratio
        self.created_date = created_date
        self.created_year = created_year
        self.comments = comments
        self.sentiment = sentiment
        self.emotion = emotion

# Load the JSON file
with open(r'C:\Users\D E L L\Desktop\merged_finalfile\completd_final_fornatted_jasonfile.json') as file:
    data = json.load(file)

# Convert the JSON data to a list of Document objects
documents = [Document(
    post_id=doc['post_id'],
    subreddit=doc['subreddit'],
    created_utc=doc['created_utc'],
    post_title=doc['post_title'],
    num_comments=doc['num_comments'],
    upvote_ratio=doc['upvote_ration'],  # Corrected key
    created_date=doc['created_date'],
    created_year=doc['created_year'],
    comments=doc['comments'],
    sentiment=doc['sentiment_y'],
    emotion=doc['emotion']
) for doc in data]

# Create the index from the documents
index = GPTVectorStoreIndex.from_documents(documents)
```
Version
0.11.1
Steps to Reproduce
Run the code below, which loads a JSON file, converts each entry into a Document object, and then creates an index from these documents using the GPTVectorStoreIndex. The purpose is to prepare the data for efficient querying and analysis using the GPT model.
```python
import json

from llama_index.legacy.indices.vector_store.base import GPTVectorStoreIndex

# Define the Document class (if not already defined)
class Document:
    def __init__(self, post_id, subreddit, created_utc, post_title, num_comments,
                 upvote_ratio, created_date, created_year, comments, sentiment, emotion):
        self.post_id = post_id
        self.subreddit = subreddit
        self.created_utc = created_utc
        self.post_title = post_title
        self.num_comments = num_comments
        self.upvote_ratio = upvote_ratio
        self.created_date = created_date
        self.created_year = created_year
        self.comments = comments
        self.sentiment = sentiment
        self.emotion = emotion

# Load the JSON file
with open(r'C:\Users\D E L L\Desktop\merged_finalfile\completd_final_fornatted_jasonfile.json') as file:
    data = json.load(file)

# Convert the JSON data to a list of Document objects
documents = [Document(
    post_id=doc['post_id'],
    subreddit=doc['subreddit'],
    created_utc=doc['created_utc'],
    post_title=doc['post_title'],
    num_comments=doc['num_comments'],
    upvote_ratio=doc['upvote_ration'],  # Corrected key
    created_date=doc['created_date'],
    created_year=doc['created_year'],
    comments=doc['comments'],
    sentiment=doc['sentiment_y'],
    emotion=doc['emotion']
) for doc in data]

# Create the index from the documents
index = GPTVectorStoreIndex.from_documents(documents)
```
Relevant Logs/Tracebacks