run-llama / llama_index

[Question]: NebulaGraph RAG #14649

Open · Chengyang852 opened 2 months ago

Chengyang852 commented 2 months ago

Question

I want to use NebulaGraph for RAG. All of my data is already stored in NebulaGraph, so what should I do next? Should I query all the data and then run RAG over it? Could you give a demo?

dosubot[bot] commented 2 months ago

To use NebulaGraph for Retrieval-Augmented Generation (RAG) with your existing data, follow these steps:

  1. Install Required Packages:

    %pip install llama-index-llms-openai
    %pip install llama-index-embeddings-openai
    %pip install llama-index-graph-stores-nebula
    %pip install llama-index-llms-azure-openai
    %pip install nebula3-python
    %pip install ipython-ngql networkx pyvis
  2. Set Up OpenAI or Azure OpenAI: For OpenAI:

    import os
    from llama_index.llms.openai import OpenAI
    from llama_index.core import Settings
    
    os.environ["OPENAI_API_KEY"] = "INSERT OPENAI KEY"
    
    llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
    Settings.llm = llm
    Settings.chunk_size = 512

    For Azure OpenAI:

    import os
    import openai
    from llama_index.llms.azure_openai import AzureOpenAI
    from llama_index.embeddings.openai import OpenAIEmbedding
    from llama_index.core import Settings
    
    openai.api_type = "azure"
    openai.api_base = "https://<foo-bar>.openai.azure.com"
    openai.api_version = "2022-12-01"
    os.environ["OPENAI_API_KEY"] = "<your-openai-key>"
    openai.api_key = os.getenv("OPENAI_API_KEY")
    
    llm = AzureOpenAI(
       model="<foo-bar-model>",
       engine="<foo-bar-deployment>",
       temperature=0,
       api_key=openai.api_key,
       api_type=openai.api_type,
       api_base=openai.api_base,
       api_version=openai.api_version,
    )
    
    embedding_model = OpenAIEmbedding(
       model="text-embedding-ada-002",
       deployment_name="<foo-bar-deployment>",
       api_key=openai.api_key,
       api_base=openai.api_base,
       api_type=openai.api_type,
       api_version=openai.api_version,
    )
    
    Settings.llm = llm
    Settings.chunk_size = 512
    Settings.embed_model = embedding_model
  3. Prepare NebulaGraph: Ensure you have a running NebulaGraph with the defined data schema:

    os.environ["NEBULA_USER"] = "root"
    os.environ["NEBULA_PASSWORD"] = "<password>"
    os.environ["NEBULA_ADDRESS"] = "127.0.0.1:9669"
    
    space_name = "paul_graham_essay"
    edge_types, rel_prop_names = ["relationship"], ["relationship"]
    tags = ["entity"]
  4. Instantiate NebulaGraphStore:

    from llama_index.core import StorageContext
    from llama_index.graph_stores.nebula import NebulaGraphStore
    
    graph_store = NebulaGraphStore(
       space_name=space_name,
       edge_types=edge_types,
       rel_prop_names=rel_prop_names,
       tags=tags,
    )
    storage_context = StorageContext.from_defaults(graph_store=graph_store)
  5. Build the Knowledge Graph with LlamaIndex:

    from llama_index.core import KnowledgeGraphIndex, SimpleDirectoryReader
    
    documents = SimpleDirectoryReader(
       "../../../../examples/paul_graham_essay/data"
    ).load_data()
    
    index = KnowledgeGraphIndex.from_documents(
       documents,
       storage_context=storage_context,
       max_triplets_per_chunk=2,
       space_name=space_name,
       edge_types=edge_types,
       rel_prop_names=rel_prop_names,
       tags=tags,
    )
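
    If you also want embedding-based retrieval over the extracted triplets, the same example supports an include_embeddings flag; a sketch of that variant:

    # Variant: also store triplet embeddings so the query engine can use
    # embedding-based or hybrid retrieval instead of keyword matching only.
    index = KnowledgeGraphIndex.from_documents(
       documents,
       storage_context=storage_context,
       max_triplets_per_chunk=2,
       space_name=space_name,
       edge_types=edge_types,
       rel_prop_names=rel_prop_names,
       tags=tags,
       include_embeddings=True,
    )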
  6. Perform Graph RAG Query:

    query_engine = index.as_query_engine()
    
    response = query_engine.query("Tell me more about Interleaf")
    from IPython.display import display, Markdown
    display(Markdown(f"<b>{response}</b>"))
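
    A common variant from the same example answers from the retrieved triplets alone, without pulling the source text into the prompt; include_text and response_mode are standard as_query_engine options:

    # Answer from graph triplets only, summarizing over everything retrieved.
    query_engine = index.as_query_engine(
       include_text=False, response_mode="tree_summarize"
    )
    response = query_engine.query("Tell me more about Interleaf")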
  7. Visualize the Graph RAG:

    %ngql --address 127.0.0.1 --port 9669 --user root --password <password>
    %%ngql
    USE paul_graham_essay;
    MATCH p=(n)-[*1..2]-()
      WHERE id(n) IN ['Interleaf', 'history', 'Software', 'Company']
    RETURN p LIMIT 100;
    %ng_draw

By following these steps, you can leverage NebulaGraph for Retrieval-Augmented Generation (RAG) over your existing data. The process connects to your graph store, builds the knowledge graph index, queries it, and visualizes the results, letting your LLM application ground its responses in your own data [1][2][3][4].

To continue talking to Dosu, mention @dosu.

Chengyang852 commented 2 months ago

My data is already stored in NebulaGraph; it is not local data.

logan-markewich commented 2 months ago

Unless you constructed it with llama-index, the best you can do is text2cypher
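
For reference, a minimal sketch of that text2cypher approach with llama-index's KnowledgeGraphQueryEngine, which has the LLM generate a graph query from the question, execute it against the store, and synthesize an answer. The space name, tags, and edge types below are the placeholders from the earlier comment and must match your existing graph; Settings.llm is assumed to be configured as in step 2:

    import os
    from llama_index.core import Settings, StorageContext
    from llama_index.core.query_engine import KnowledgeGraphQueryEngine
    from llama_index.graph_stores.nebula import NebulaGraphStore
    
    # Connection details for the existing NebulaGraph deployment.
    os.environ["NEBULA_USER"] = "root"
    os.environ["NEBULA_PASSWORD"] = "<password>"
    os.environ["NEBULA_ADDRESS"] = "127.0.0.1:9669"
    
    # Point the graph store at the space that already holds your
    # entities and relationships (no re-ingestion needed).
    graph_store = NebulaGraphStore(
       space_name="paul_graham_essay",
       edge_types=["relationship"],
       rel_prop_names=["relationship"],
       tags=["entity"],
    )
    storage_context = StorageContext.from_defaults(graph_store=graph_store)
    
    # text2cypher: the LLM writes the graph query, NebulaGraph executes it,
    # and the engine synthesizes a natural-language response from the result.
    query_engine = KnowledgeGraphQueryEngine(
       storage_context=storage_context,
       llm=Settings.llm,
       verbose=True,
    )
    response = query_engine.query("Tell me more about Interleaf")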

Chengyang852 commented 2 months ago

> Unless you constructed it with llama-index, the best you can do is text2cypher

I would like to ask a question about llama-index. So far, the only function I have found in llama-index for this is KnowledgeGraphIndex.from_documents, but that function seems to extract entity relationships from local data and store them in NebulaGraph. Since my extracted entities and relationships are already in NebulaGraph, what should I do next? Should I extract the data from NebulaGraph and then feed it back into this function?