Open jjoaqu7 opened 4 weeks ago
To address your questions about pre-processing and structuring your own data for use with Llama Index and Nebula Graph Store, here are the steps and recommendations:
**Structuring Data for Better Returns:**
For efficient indexing, structure each course as JSON with explicit entities and relationships, so the graph store can ingest clean triplets. For example:
```json
{
  "course": "Engineering Course",
  "lectures": [
    {
      "title": "Lecture 1",
      "content": "Introduction to Thermodynamics",
      "entities": [
        {"name": "Thermodynamics", "type": "Concept"},
        {"name": "First Law", "type": "Law"}
      ],
      "relationships": [
        {"source": "Thermodynamics", "target": "First Law", "type": "Defines"}
      ]
    }
  ]
}
```
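As a standalone pre-processing sketch (standard library only; the `flatten_course` helper is hypothetical, and the field names simply mirror the example structure above), you could flatten such a file into text chunks plus (subject, relation, object) triplets before indexing:

```python
import json

# Example input following the lecture JSON structure shown above
lecture_json = """
{
  "course": "Engineering Course",
  "lectures": [
    {
      "title": "Lecture 1",
      "content": "Introduction to Thermodynamics",
      "entities": [
        {"name": "Thermodynamics", "type": "Concept"},
        {"name": "First Law", "type": "Law"}
      ],
      "relationships": [
        {"source": "Thermodynamics", "target": "First Law", "type": "Defines"}
      ]
    }
  ]
}
"""

def flatten_course(raw: str):
    """Flatten a course JSON string into text chunks and relation triplets."""
    data = json.loads(raw)
    chunks, triplets = [], []
    for lecture in data["lectures"]:
        # One text chunk per lecture, prefixed with its title for context
        chunks.append(f'{lecture["title"]}: {lecture["content"]}')
        for rel in lecture.get("relationships", []):
            triplets.append((rel["source"], rel["type"], rel["target"]))
    return chunks, triplets

chunks, triplets = flatten_course(lecture_json)
print(chunks)    # ['Lecture 1: Introduction to Thermodynamics']
print(triplets)  # [('Thermodynamics', 'Defines', 'First Law')]
```

The chunks can then be wrapped in LlamaIndex `Document` objects, while the triplets give you a sanity check on what the graph store should end up containing.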
Llama Index and Nebula Graph Store Integration:
Here is an example script to replace Wikipedia data with your lecture content/notes and load it into Nebula Graph Store:
```python
from llama_index.core import Settings, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.graph_stores.nebula import NebulaGraphStore
from llama_index.core.vector_stores.simple import SimpleVectorStore
from llama_index.core.indices.property_graph import PropertyGraphIndex

# Configure LLM and embedding models
Settings.llm = OpenAI(model="gpt-4o", temperature=0.3)
Settings.embed_model = HuggingFaceEmbedding(
    model_name="intfloat/multilingual-e5-large"
)

# Load your lecture content
documents = SimpleDirectoryReader("./data/your_lecture_notes/").load_data()

# Initialize Nebula Graph Store (these parameters are lists and
# must match the tags/edge types defined in your Nebula space)
graph_store = NebulaGraphStore(
    space_name="llamaindex_nebula_property_graph",
    edge_types=["your_edge_type"],
    rel_prop_names=["your_rel_prop_name"],
    tags=["your_tag"],
)

# Initialize Vector Store
vec_store = SimpleVectorStore()

# Create Property Graph Index from documents
index = PropertyGraphIndex.from_documents(
    documents,
    property_graph_store=graph_store,
    vector_store=vec_store,
    show_progress=True,
)

# Persist the vector store
index.storage_context.vector_store.persist("./vec_store.json")

# Example query
query = "Explain the first law of thermodynamics."
retrieved = index.as_retriever().retrieve(query)
answer = index.as_query_engine().query(query)
print(retrieved, answer)
```
This script demonstrates how to load your structured lecture content into Nebula Graph Store and query it using Llama Index [1][2][3]. Adjust the paths and parameters as needed for your specific use case.
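Note that `NebulaGraphStore` assumes the target space, tags, and edge types already exist in NebulaGraph. The exact schema is up to you; a minimal setup along the lines of the standard LlamaIndex NebulaGraph examples (the `entity`/`relationship` names are that convention, not a requirement) might look like:

```ngql
# Run in a NebulaGraph console before starting the script
CREATE SPACE llamaindex_nebula_property_graph(
    vid_type=FIXED_STRING(256), partition_num=1, replica_factor=1
);
:sleep 10;
USE llamaindex_nebula_property_graph;
CREATE TAG entity(name string);
CREATE EDGE relationship(relationship string);
CREATE TAG INDEX entity_index ON entity(name(256));
```

Whatever names you pick here are what you pass to `tags`, `edge_types`, and `rel_prop_names` in the script.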
To continue talking to Dosu, mention @dosu.
Can I load JSON files into the Llama Index documents loader? Can you modify the script you provided to load a series of JSON files from a directory called "JSONs"? @dosubot
Certainly! Here is the modified script to load a series of JSON files from a directory called "JSONs" into the Llama Index documents loader:
```python
import os

from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.graph_stores.nebula import NebulaGraphStore
from llama_index.core.vector_stores.simple import SimpleVectorStore
from llama_index.core.indices.property_graph import PropertyGraphIndex
from llama_index.readers.json import JSONReader

# Configure the LLM
Settings.llm = OpenAI(model="gpt-4o", temperature=0.3)

# Directory containing JSON files
json_directory = "JSONs"

# Initialize the JSONReader
reader = JSONReader()

# Load all documents from the JSON files in the directory
documents = []
for filename in os.listdir(json_directory):
    if filename.endswith(".json"):
        file_path = os.path.join(json_directory, filename)
        documents.extend(reader.load_data(file_path))

# Initialize Nebula Graph Store (these parameters are lists and
# must match the tags/edge types defined in your Nebula space)
graph_store = NebulaGraphStore(
    space_name="llamaindex_nebula_property_graph",
    edge_types=["your_edge_type"],
    rel_prop_names=["your_rel_prop_name"],
    tags=["your_tag"],
)

# Initialize Vector Store
vec_store = SimpleVectorStore()

# Create Property Graph Index from documents
index = PropertyGraphIndex.from_documents(
    documents,
    property_graph_store=graph_store,
    vector_store=vec_store,
    show_progress=True,
)

# Persist the vector store
index.storage_context.vector_store.persist("./vec_store.json")

# Example query
query = "Explain the first law of thermodynamics."
retrieved = index.as_retriever().retrieve(query)
answer = index.as_query_engine().query(query)
print(retrieved, answer)
```
This script will load all JSON files from the "JSONs" directory into the Llama Index documents loader and proceed with the rest of the indexing and querying process [1]. Adjust the paths and parameters as needed for your specific use case.
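If your JSON files are organized into subdirectories of "JSONs", a small `pathlib`-based variant of the directory scan (a sketch; `collect_json_paths` is a hypothetical helper, not part of LlamaIndex) gathers every path before handing them to `JSONReader`:

```python
import json
import tempfile
from pathlib import Path

def collect_json_paths(root: str) -> list[Path]:
    # rglob also descends into subdirectories, unlike os.listdir
    return sorted(Path(root).rglob("*.json"))

# Demonstrate with a temporary directory standing in for "JSONs"
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "lecture1.json").write_text(json.dumps({"title": "Lecture 1"}))
    (Path(root) / "notes.txt").write_text("ignored")  # non-JSON files are skipped
    names = [p.name for p in collect_json_paths(root)]

print(names)  # ['lecture1.json']
```

You would then loop over the collected paths and call `reader.load_data(...)` on each, exactly as in the script above.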
Question Validation
Question
I have gotten the following script to work; however, I have a few questions about pre-processing and structuring my own data, which I would like to use in place of the Wikipedia page used in the script.
SCRIPT: