run-llama / llama_index

LlamaIndex is a data framework for your LLM applications

https://docs.llamaindex.ai

MIT License

35.22k stars 4.94k forks source link

[Question]: The created knowledge graph does not have edge relationships neo4j #14763

Open nuocheng opened 1 month ago

nuocheng commented 1 month ago

Question Validation

[ ] I have searched both the documentation and discord for an answer.

Question

Use the qwen large model deployed locally in VLLM and the API service launched by OpenAI
The data used ishttps://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt

Code


from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("./paul_graham/").load_data()

import nest_asyncio

nest_asyncio.apply()

from typing import Literal from llama_index.core.indices.property_graph import SchemaLLMPathExtractor from llama_index.llms.openai_like import OpenAILike

best practice to use upper-case

entities = Literal["PERSON", "PLACE", "ORGANIZATION"] relations = Literal["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"]

define which entities can have which relations

validation_schema = { "PERSON": ["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"], "PLACE": ["HAS", "PART_OF", "WORKED_AT"], "ORGANIZATION": ["HAS", "PART_OF", "WORKED_WITH"], }

kg_extractor = SchemaLLMPathExtractor( llm = OpenAILike( model="Qwen2-72B-Instruct", api_key="fake", api_base="http://ip:18090/v1",
is_chat_model=True), possible_entities=entities, possible_relations=relations, kg_validation_schema=validation_schema, strict=True, )

from llama_index.graph_stores.neo4j import Neo4jPGStore

from llama_index.legacy.graph_stores.neo4j import Neo4jGraphStore

graph_store = Neo4jPGStore( username="neo4j", password="password", url="neo4j://ip:7687" )

from llama_index.core import PropertyGraphIndex from llama_index.embeddings.huggingface import HuggingFaceEmbedding embedding_path = "~/embedding_model/multilingual-e5-large-instruct" index = PropertyGraphIndex.from_documents( documents, kg_extractors=[kg_extractor], embed_model=HuggingFaceEmbedding(model_name=embedding_path), property_graph_store=graph_store, show_progress=True )


I am using Llamaindex to build a knowledge graph and have defined `entities`, `relations`, and `validation_stemplates`, but the generated knowledge graph does not have edge relationships

dosubot[bot] commented 1 month ago

To resolve the issue of the generated knowledge graph not having edge relationships, ensure that the _prune_invalid_triplets method is correctly validating and adding the triplets. Here is a snippet of the relevant code that handles this:

def _prune_invalid_triplets(self, kg_schema: Any) -> List[Triplet]:
    """Prune invalid triplets."""
    assert isinstance(kg_schema, self.kg_schema_cls)

    valid_triplets = []
    for triplet in kg_schema.triplets:
        subject = triplet.subject.name
        subject_type = triplet.subject.type

        relation = triplet.relation.type
        obj = triplet.object.name
        obj_type = triplet.object.type

        # Check if the triplet is valid based on the schema format
        if (
            isinstance(self.kg_validation_schema, dict)
            and "relationships" in self.kg_validation_schema
        ):
            # Schema is a dictionary with a 'relationships' key and triples as values
            if (subject_type, relation, obj_type) not in self.kg_validation_schema[
                "relationships"
            ]:
                continue
        else:
            # Schema is the backwards-compat format
            if relation not in self.kg_validation_schema.get(
                subject_type, [relation]
            ) and relation not in self.kg_validation_schema.get(
                obj_type, [relation]
            ):
                continue

        # Remove self-references
        if subject.lower() == obj.lower():
            continue

        subj_node = EntityNode(label=subject_type, name=subject)
        obj_node = EntityNode(label=obj_type, name=obj)
        rel_node = Relation(
            label=relation, source_id=subj_node.id, target_id=obj_node.id
        )
        valid_triplets.append((subj_node, rel_node, obj_node))

    return valid_triplets

Additionally, ensure that the SchemaLLMPathExtractor is being used to extract edges from node relationships. Here is the relevant code snippet:

from typing import Literal
from llama_index.llms.ollama import Ollama
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

entities = Literal["PERSON", "PLACE", "ORGANIZATION"]
relations = Literal["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"]

validation_schema = [
    ("ORGANIZATION", "HAS", "PERSON"),
    ("PERSON", "WORKED_AT", "ORGANIZATION"),
    ("PERSON", "WORKED_WITH", "PERSON"),
    ("PERSON", "WORKED_ON", "ORGANIZATION"),
    ("PERSON", "PART_OF", "ORGANIZATION"),
    ("ORGANIZATION", "PART_OF", "ORGANIZATION"),
    ("PERSON", "WORKED_AT", "PLACE"),
]

kg_extractor = SchemaLLMPathExtractor(
    llm=Ollama(model="llama3", json_mode=True, request_timeout=3600),
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=validation_schema,
    strict=True,
)

Ensure that:

The kg_validation_schema is correctly defined and includes the relationships you expect.
The entities and relations extracted from the text match the types defined in your schema.
The _prune_invalid_triplets method is correctly filtering and adding valid triplets to the knowledge graph.
The SchemaLLMPathExtractor is being used to extract and add edges from node relationships.

If the issue persists, you may need to debug the extraction process to ensure that the entities and relations are being correctly identified and validated ^[1]^[2]^[3].

To continue talking to Dosu, mention @dosu.

nuocheng commented 1 month ago

@dosu It's not that problem. I started the opanai interface using Fastchat and didn't build it according to the entity and edge relationships I provided. Can LLM use the OpenAILike method to load the local model

dosubot[bot] commented 1 month ago

Yes, LLM can use the OpenAILike method to load the local model. The LocalAI class is designed to be a self-hosted alternative to OpenAI and can be used with OpenAILike by providing the necessary parameters such as api_key and api_base.

Here is an example of how you might initialize the LocalAI class using OpenAILike:

from llama_index.legacy.llms.openai_like import OpenAILike
from llama_index.legacy.llms.localai import LocalAI

# Initialize LocalAI with default parameters
local_ai = LocalAI()

# Use OpenAILike to load the local model
openai_like_model = OpenAILike(
    api_key=local_ai.api_key,
    api_base=local_ai.api_base,
    # Add other necessary parameters here
)

To ensure that the knowledge graph is built according to the entity and edge relationships you provided, you need to use the appropriate methods and data structures within the LlamaIndex framework to define and construct the knowledge graph. Specifically, you should ensure that the SchemaLLMPathExtractor is correctly configured with your kg_validation_schema and that the relationships are being extracted and validated properly.

For example:

from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

entities = Literal["PERSON", "PLACE", "ORGANIZATION"]
relations = Literal["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"]

validation_schema = {
    "PERSON": ["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"],
    "PLACE": ["HAS", "PART_OF", "WORKED_AT"],
    "ORGANIZATION": ["HAS", "PART_OF", "WORKED_WITH"],
}

kg_extractor = SchemaLLMPathExtractor(
    llm=openai_like_model,
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=validation_schema,
    strict=True,
)

Ensure that the kg_validation_schema is correctly defined and includes the relationships you expect. This will help in building the knowledge graph according to the entity and edge relationships you provided ^[1].

To continue talking to Dosu, mention @dosu.

nuocheng commented 1 month ago

@dosu The result of the constructed knowledge graph

Displaying 22 nodes, 0 relationships

dosubot[bot] commented 1 month ago

To address the issue of the constructed knowledge graph displaying nodes but no relationships, ensure the following:

Validation Schema: Verify that your kg_validation_schema is correctly defined and includes the relationships you expect. The schema should match the entities and relationships extracted from the text.
Triplet Extraction: Ensure that the SchemaLLMPathExtractor is correctly extracting triplets and that the _prune_invalid_triplets method is validating and adding these triplets properly.
Logging and Debugging: Use logging and debugging tools to trace the relationship extraction process. Enable debug logging and use the LlamaDebugHandler to track events and identify where the relationships might be failing to be detected.

Here is an example of how to enable logging and use the LlamaDebugHandler:

import logging
import sys
from llama_index.legacy.callbacks.llama_debug import LlamaDebugHandler
import llama_index.core

# Enable debug logging
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

# Set up the debug handler
debug_handler = LlamaDebugHandler()
llama_index.core.set_global_handler(debug_handler)

Edge Definition: Ensure that edges are being defined and added correctly in your graph structure. Use the fb function to create edge objects with attributes and add them to the knowledge graph.

Example of defining and adding edges:

function fb(t, e, i, n, o) {
  var r = { from: e, to: i, type: n };
  return (
    t.edge && (r.attr = lb({}, t.edge)),
    (r.attr = lb(r.attr || {}, o)),
    null != o &&
      o.hasOwnProperty("arrows") &&
      null != o.arrows &&
      ((r.arrows = { to: { enabled: !0, type: o.arrows.type } }),
      (o.arrows = null)),
    r
  );
}

// Example usage:
var graph = {
  nodes: [],
  edges: []
};

// Define nodes
graph.nodes.push({ id: 1, label: 'Node 1' });
graph.nodes.push({ id: 2, label: 'Node 2' });

// Define edge attributes
var edgeAttributes = {
  arrows: { type: 'arrow' },
  color: 'blue'
};

// Add edge using the fb function
var edge = fb(graph, 1, 2, 'relationshipType', edgeAttributes);
graph.edges.push(edge);

console.log(graph);

Ensure that the SchemaLLMPathExtractor is being used to extract and add edges from node relationships, and that the relationships are being correctly identified and validated ^[1]^[2]^[3].

To continue talking to Dosu, mention @dosu.

nuocheng commented 1 month ago

@dosu

Can the Olama method use local models and how to use them.

kg_extractor = SchemaLLMPathExtractor( llm=Ollama(model="llama3", json_mode=True, request_timeout=3600), possible_entities=entities, possible_relations=relations, kg_validation_schema=validation_schema, strict=True, )

dosubot[bot] commented 1 month ago

Yes, the Ollama method can use local models. To configure it properly in your knowledge graph construction, follow these steps:

Install Ollama: Use the one-click installer for your operating system from the Ollama home page.
Download the Llama3 model: Use the following command:
```
ollama pull llama3
```
Switch to local agent: Install the Ollama integration with:
```
pip install llama-index-llms-ollama
```
Modify your dependencies: Import Ollama instead of OpenAI:
```
from llama_index.llms.ollama import Ollama
```

Initialize the model: Set up Llama3 as your LLM:

llm = Ollama(model="llama3", request_timeout=360.0)

Construct your knowledge graph: Use the SchemaLLMPathExtractor to construct your graph with predefined schemas:

from typing import Literal
from llama_index.llms.ollama import Ollama
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

entities = Literal["PERSON", "PLACE", "ORGANIZATION"]
relations = Literal["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"]

validation_schema = [
    ("ORGANIZATION", "HAS", "PERSON"),
    ("PERSON", "WORKED_AT", "ORGANIZATION"),
    ("PERSON", "WORKED_WITH", "PERSON"),
    ("PERSON", "WORKED_ON", "ORGANIZATION"),
    ("PERSON", "PART_OF", "ORGANIZATION"),
    ("ORGANIZATION", "PART_OF", "ORGANIZATION"),
    ("PERSON", "WORKED_AT", "PLACE"),
]

kg_extractor = SchemaLLMPathExtractor(
    llm=Ollama(model="llama3", json_mode=True, request_timeout=3600),
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=validation_schema,
    strict=True,
)

Load your data: For example, download and load sample data:

from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

Create the property graph: Use the PropertyGraphIndex to create your graph:

from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

index = PropertyGraphIndex.from_documents(
    documents,
    kg_extractors=[kg_extractor],
    embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5"),
    show_progress=True,
)

By following these steps, you can configure and use local models with the Ollama method in your knowledge graph construction ^[1]^[2]^[3]^[4]^[5].

To continue talking to Dosu, mention @dosu.

nuocheng commented 1 month ago

@dosu I have encountered a problem

Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The procedure has a deprecated field. ('config' used by 'apoc.meta.graphSample' is deprecated.)} {position: line: 1, column: 1, offset: 0} for query: "CALL apoc.meta.graphSample() YIELD nodes, relationships RETURN nodes, [rel in relationships | {name:apoc.any.property(rel, 'type'), count: apoc.any.property(rel, 'count')}] AS relationships"

dosubot[bot] commented 1 month ago

To resolve the deprecation warning related to the 'config' field used by apoc.meta.graphSample in your Neo4j and LlamaIndex setup, you should update the query to remove the deprecated 'config' field. Here is the updated code snippet:

Replace:

schema_counts = self.structured_query(
    "CALL apoc.meta.graphSample() YIELD nodes, relationships "
    "RETURN nodes, [rel in relationships | {name:apoc.any.property"
    "(rel, 'type'), count: apoc.any.property(rel, 'count')}]"
    " AS relationships"
)

With:

schema_counts = self.structured_query(
    "CALL apoc.meta.graphSample() YIELD nodes, relationships "
    "RETURN nodes, [rel in relationships | {name: type(rel), count: rel.count}]"
    " AS relationships"
)

This change ensures compatibility with future versions of Neo4j by removing the deprecated 'config' field ^[1].

To continue talking to Dosu, mention @dosu.

KAI10 commented 1 month ago

Hi @nuocheng, were you able to resolve the issue? I am also getting 22 nodes and no edges as you mentioned above. Thanks!

nuocheng commented 1 month ago

@KAI10 Until now, no solution has been found. I found during the testing process that when using Schema LLMPath Extractor, GPU computing resources were not utilized. I also look forward to someone solving this problem.

nuocheng commented 1 month ago

@dosu Does the SchemaLLMPathExtractor method support OpenAI and how should it be used

dosubot[bot] commented 1 month ago

Yes, the SchemaLLMPathExtractor method supports OpenAI and can be used to build a knowledge graph with both nodes and relationships. Here are the steps to ensure it is used correctly:

Initialize the SchemaLLMPathExtractor with the appropriate parameters:
- llm: This should be an instance of the OpenAI language model.
- extract_prompt: Use the default or a custom prompt for extraction.
- kg_schema_cls: The knowledge graph schema class.
- kg_validation_schema: The validation schema for the knowledge graph.
- num_workers, max_triplets_per_chunk, and strict: Other optional parameters.
Call the SchemaLLMPathExtractor with a list of nodes:
- The __call__ method will extract triplets from the nodes and return a list of nodes with the extracted triplets.

Here is an example code snippet demonstrating how to use the SchemaLLMPathExtractor:

from llama_index.core.indices.property_graph.transformations.schema_llm import SchemaLLMPathExtractor
from some_module import OpenAILanguageModel, BaseNode  # Replace with actual imports

# Initialize the OpenAI language model
llm = OpenAILanguageModel(api_key="your_openai_api_key")

# Initialize the SchemaLLMPathExtractor
extractor = SchemaLLMPathExtractor(
    llm=llm,
    extract_prompt="Your custom or default prompt",
    kg_schema_cls=YourKGSchemaClass,  # Replace with your KG schema class
    kg_validation_schema=YourValidationSchema,  # Replace with your validation schema
    num_workers=4,
    max_triplets_per_chunk=10,
    strict=True
)

# List of nodes to extract triplets from
nodes = [BaseNode(text="Your node text here"), ...]

# Extract triplets
extracted_nodes = extractor(nodes, show_progress=True)

# The extracted_nodes now contain the knowledge graph with nodes and relationships

This code ensures that the SchemaLLMPathExtractor method supports OpenAI and is used correctly to build a knowledge graph with both nodes and relationships ^[1].

To continue talking to Dosu, mention @dosu.

nuocheng commented 1 month ago

@KAI10 I use the API of OpenAI to build node and edge relationships normally, but the locally launched model cannot be built.