run-llama / llama_index

[Question]: NebulaGraph RAG #14649

Open · Chengyang852 opened 2 months ago

Chengyang852 commented 2 months ago

Question

I want to use NebulaGraph for RAG. All of my data is already stored in NebulaGraph, so what should I do next? Should I query all the data and then run RAG over it? Could you give a demo?

dosubot[bot] commented 2 months ago

To use NebulaGraph for Retrieval-Augmented Generation (RAG) with your existing data, follow these steps:

  1. Install Required Packages:

    %pip install llama-index-llms-openai
    %pip install llama-index-embeddings-openai
    %pip install llama-index-graph-stores-nebula
    %pip install llama-index-llms-azure-openai
    %pip install nebula3-python
    %pip install ipython-ngql networkx pyvis
  2. Set Up OpenAI or Azure OpenAI: For OpenAI:

    import os
    from llama_index.llms.openai import OpenAI
    from llama_index.core import Settings
    
    os.environ["OPENAI_API_KEY"] = "INSERT OPENAI KEY"
    
    llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
    Settings.llm = llm
    Settings.chunk_size = 512

    For Azure OpenAI:

    import os
    import openai
    from llama_index.llms.azure_openai import AzureOpenAI
    from llama_index.embeddings.openai import OpenAIEmbedding
    from llama_index.core import Settings
    
    openai.api_type = "azure"
    openai.api_base = "https://<foo-bar>.openai.azure.com"
    openai.api_version = "2022-12-01"
    os.environ["OPENAI_API_KEY"] = "<your-openai-key>"
    openai.api_key = os.getenv("OPENAI_API_KEY")
    
    llm = AzureOpenAI(
       model="<foo-bar-model>",
       engine="<foo-bar-deployment>",
       temperature=0,
       api_key=openai.api_key,
       api_type=openai.api_type,
       api_base=openai.api_base,
       api_version=openai.api_version,
    )
    
    embedding_model = OpenAIEmbedding(
       model="text-embedding-ada-002",
       deployment_name="<foo-bar-deployment>",
       api_key=openai.api_key,
       api_base=openai.api_base,
       api_type=openai.api_type,
       api_version=openai.api_version,
    )
    
    Settings.llm = llm
    Settings.chunk_size = 512
    Settings.embed_model = embedding_model
  3. Prepare NebulaGraph: Ensure you have a running NebulaGraph with the defined data schema:

    os.environ["NEBULA_USER"] = "root"
    os.environ["NEBULA_PASSWORD"] = "<password>"
    os.environ["NEBULA_ADDRESS"] = "127.0.0.1:9669"
    
    space_name = "paul_graham_essay"
    edge_types, rel_prop_names = ["relationship"], ["relationship"]
    tags = ["entity"]
  4. Instantiate NebulaGraphStore:

    from llama_index.core import StorageContext
    from llama_index.graph_stores.nebula import NebulaGraphStore
    
    graph_store = NebulaGraphStore(
       space_name=space_name,
       edge_types=edge_types,
       rel_prop_names=rel_prop_names,
       tags=tags,
    )
    storage_context = StorageContext.from_defaults(graph_store=graph_store)
  5. Build the Knowledge Graph with LlamaIndex:

    from llama_index.core import KnowledgeGraphIndex, SimpleDirectoryReader
    
    documents = SimpleDirectoryReader(
       "../../../../examples/paul_graham_essay/data"
    ).load_data()
    
    index = KnowledgeGraphIndex.from_documents(
       documents,
       storage_context=storage_context,
       max_triplets_per_chunk=2,
       space_name=space_name,
       edge_types=edge_types,
       rel_prop_names=rel_prop_names,
       tags=tags,
    )
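
    If you also want embedding-based retrieval over the extracted triplets, the same example supports an include_embeddings flag; a sketch of that variant:

    # Variant: also store triplet embeddings so the query engine can use
    # embedding-based or hybrid retrieval instead of keyword matching only.
    index = KnowledgeGraphIndex.from_documents(
       documents,
       storage_context=storage_context,
       max_triplets_per_chunk=2,
       space_name=space_name,
       edge_types=edge_types,
       rel_prop_names=rel_prop_names,
       tags=tags,
       include_embeddings=True,
    )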
  6. Perform Graph RAG Query:

    query_engine = index.as_query_engine()
    
    response = query_engine.query("Tell me more about Interleaf")
    from IPython.display import display, Markdown
    display(Markdown(f"<b>{response}</b>"))
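
    A common variant from the same example answers from the retrieved triplets alone, without pulling the source text into the prompt; include_text and response_mode are standard as_query_engine options:

    # Answer from graph triplets only, summarizing over everything retrieved.
    query_engine = index.as_query_engine(
       include_text=False, response_mode="tree_summarize"
    )
    response = query_engine.query("Tell me more about Interleaf")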
  7. Visualize the Graph RAG:

    %ngql --address 127.0.0.1 --port 9669 --user root --password <password>
    %%ngql
    USE paul_graham_essay;
    MATCH p=(n)-[*1..2]-()
      WHERE id(n) IN ['Interleaf', 'history', 'Software', 'Company']
    RETURN p LIMIT 100;
    %ng_draw

By following these steps, you can leverage NebulaGraph for Retrieval-Augmented Generation (RAG) over your existing data. The process connects to your graph store, builds the knowledge graph index, queries it, and visualizes the results, letting your LLM application ground its responses in your own data [1][2][3][4].

To continue talking to Dosu, mention @dosu.

Chengyang852 commented 2 months ago

My data is already stored in NebulaGraph; it is not local data.

logan-markewich commented 2 months ago

Unless you constructed it with llama-index, the best you can do is text2cypher
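
For reference, a minimal sketch of that text2cypher approach with llama-index's KnowledgeGraphQueryEngine, which has the LLM generate a graph query from the question, execute it against the store, and synthesize an answer. The space name, tags, and edge types below are the placeholders from the earlier comment and must match your existing graph; Settings.llm is assumed to be configured as in step 2:

    import os
    from llama_index.core import Settings, StorageContext
    from llama_index.core.query_engine import KnowledgeGraphQueryEngine
    from llama_index.graph_stores.nebula import NebulaGraphStore
    
    # Connection details for the existing NebulaGraph deployment.
    os.environ["NEBULA_USER"] = "root"
    os.environ["NEBULA_PASSWORD"] = "<password>"
    os.environ["NEBULA_ADDRESS"] = "127.0.0.1:9669"
    
    # Point the graph store at the space that already holds your
    # entities and relationships (no re-ingestion needed).
    graph_store = NebulaGraphStore(
       space_name="paul_graham_essay",
       edge_types=["relationship"],
       rel_prop_names=["relationship"],
       tags=["entity"],
    )
    storage_context = StorageContext.from_defaults(graph_store=graph_store)
    
    # text2cypher: the LLM writes the graph query, NebulaGraph executes it,
    # and the engine synthesizes a natural-language response from the result.
    query_engine = KnowledgeGraphQueryEngine(
       storage_context=storage_context,
       llm=Settings.llm,
       verbose=True,
    )
    response = query_engine.query("Tell me more about Interleaf")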

Chengyang852 commented 2 months ago

> Unless you constructed it with llama-index, the best you can do is text2cypher

I would like to ask a question about llama-index. So far, the only function I have found in llama-index for this is KnowledgeGraphIndex.from_documents, but that function seems to extract entity relationships from local data and store them in NebulaGraph. Since my extracted entities and relationships are already in NebulaGraph, what should I do next? Should I extract the data from NebulaGraph and then feed it back into this function?