run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: NebulaGraphStore #15043

Open sammed-kamboj opened 1 month ago

sammed-kamboj commented 1 month ago

Question

I want to create a NebulaGraphStore and insert nodes manually. So far I am doing this on Google Colab:

    !pip install llama-index
    !pip install llama-index-embeddings-huggingface
    !pip install llama-index-graph-stores-nebula
    !pip install nebulagraph-lite
    !pip install jupyter_nebulagraph

    import nest_asyncio
    nest_asyncio.apply()

    from nebulagraph_lite import nebulagraph_let as ng_let
    n = ng_let()
    n.start()

    %load_ext ngql

    # Connect to the NebulaGraph service
    %ngql --address 127.0.0.1 --port 9669 --user root --password nebula
    %ngql CREATE SPACE IF NOT EXISTS llamaindex_nebula_property_graph(vid_type=FIXED_STRING(256));

    # Use the space
    %ngql USE llamaindex_nebula_property_graph;

    # Define the schema for the vertices
    %ngql CREATE TAG IF NOT EXISTS PERSON(name string, age int);
    %ngql CREATE TAG IF NOT EXISTS ORGANIZATION(name string);
    %ngql CREATE EDGE IF NOT EXISTS WORKS_FOR(since int);

    graph_store = NebulaPropertyGraphStore(
       space="llamaindex_nebula_property_graph",
       overwrite=True
    )

    # Create two entity nodes
    entity1 = EntityNode(label="PERSON", name="Logan", properties={"age": 28})
    entity2 = EntityNode(label="ORGANIZATION", name="LlamaIndex")

    # Create a relation
    relation = Relation(
       label="WORKS_FOR",
       source_id=entity1.id,
       target_id=entity2.id,
       properties={"since": 2023},
    )

    graph_store.upsert_nodes([entity1, entity2])
    graph_store.upsert_relations([relation])

But it throws this error:

----> 1 graph_store.upsert_nodes([entity1, entity2])
      2 graph_store.upsert_relations([relation])

1 frames
/usr/local/lib/python3.10/dist-packages/llama_index/graph_stores/nebula/nebula_property_graph.py in upsert_nodes(self, nodes)
    286             )
    287             stmt = f'INSERT VERTEX Props__ ({keys}) VALUES "{entity.id}":({values_k});'
--> 288             self.structured_query(
    289                 stmt,
    290                 param_map=values_params,

/usr/local/lib/python3.10/dist-packages/llama_index/graph_stores/nebula/nebula_property_graph.py in structured_query(self, query, param_map)
    548             result = self._client.execute_parameter(query, build_param_map(param_map))
    549         if not result.is_succeeded():
--> 550             raise Exception(
    551                 "NebulaGraph query failed:",
    552                 result.error_msg(),

Exception: ('NebulaGraph query failed:', "SemanticError: Unknown column `age' in schema", 'Statement:', 'INSERT VERTEX Props__ (`age`) VALUES "Logan":($kv_0);', 'Params:', {'kv_0': 28})

Can someone tell me what I am doing wrong?

dosubot[bot] commented 1 month ago

Hey @sammed-kamboj! 👋 I'm here to help you out with any bugs, questions, or contributions. Let's squash this bug together!

The error you're encountering, SemanticError: Unknown column 'age' in schema, indicates that the age property is not recognized in the schema for the Props__ tag. This is likely because the Props__ tag was not created with the age property.
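To make the mismatch concrete: the traceback shows `upsert_nodes` building an `INSERT VERTEX Props__ (...)` statement from the node's property keys, so each key has to exist as a column of `Props__`. The following is a minimal sketch of that check in plain Python, with no database involved. The helpers `schema_columns` and `unknown_columns` and the shortened schema string are hypothetical and are not part of llama_index.

```python
import re


def schema_columns(props_schema: str) -> set[str]:
    """Return the backtick-quoted column names found in a props schema string."""
    return set(re.findall(r"`([^`]+)`", props_schema))


def unknown_columns(properties: dict, props_schema: str) -> list[str]:
    """List property keys that have no matching column in the schema string."""
    known = schema_columns(props_schema)
    return sorted(key for key in properties if key not in known)


# Shortened, illustrative schema string (an assumption, not the real default).
schema = "`file_path` STRING, `file_name` STRING"

print(unknown_columns({"age": 28}, schema))                 # ['age']
print(unknown_columns({"age": 28}, schema + ", `age` INT"))  # []
```

Any key reported by `unknown_columns` would be expected to trigger the same "Unknown column" error when the node is inserted.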

To resolve this, you need to ensure that the Props__ tag includes the age property in its schema. Here’s how you can modify your code to include the age property in the Props__ tag:

  1. Update the Schema Definition: Ensure that the Props__ tag includes the age property when you create the schema.

    %ngql CREATE TAG IF NOT EXISTS Props__(age int);
  2. Initialize the NebulaPropertyGraphStore: Make sure the props_schema includes the age property.

    from llama_index.graph_stores.nebula.nebula_property_graph import NebulaPropertyGraphStore
    
    graph_store = NebulaPropertyGraphStore(
       space="llamaindex_nebula_property_graph",
       overwrite=True,
       props_schema="`age` int"
    )
  3. Insert Nodes and Relations: Your existing code for creating and inserting nodes and relations should work correctly once the schema is properly defined.

    from llama_index.core.graph_stores.types import EntityNode, Relation
    
    # Create entity nodes
    entity1 = EntityNode(label="PERSON", name="Logan", properties={"age": 28})
    entity2 = EntityNode(label="ORGANIZATION", name="LlamaIndex")
    
    # Create a relation
    relation = Relation(
       label="WORKS_FOR",
       source_id=entity1.id,
       target_id=entity2.id,
       properties={"since": 2023},
    )
    
    # Insert nodes and relations
    graph_store.upsert_nodes([entity1, entity2])
    graph_store.upsert_relations([relation])

By ensuring that the Props__ tag includes the age property in its schema, the upsert_nodes method should be able to insert the nodes without encountering the SemanticError [1][2].
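If a node carries several custom properties, writing the column definitions by hand gets tedious. As a rough sketch, the definitions could be derived from the `properties` dict itself. The helpers `ngql_type` and `columns_for` are hypothetical names introduced here, and the Python-to-nGQL type mapping (`BOOL`, `INT`, `DOUBLE`, `STRING`) is an assumption about what fits typical data, not something llama_index provides.

```python
def ngql_type(value: object) -> str:
    """Map a Python value to an nGQL column type (assumed mapping)."""
    # bool is tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "BOOL"
    if isinstance(value, int):
        return "INT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "STRING"
    raise TypeError(f"No nGQL type mapping for {type(value).__name__}")


def columns_for(properties: dict) -> str:
    """Build a comma-separated list of column definitions for the given properties."""
    return ", ".join(
        f"`{key}` {ngql_type(value)}" for key, value in properties.items()
    )


print(columns_for({"age": 28}))  # `age` INT
```

The resulting string has the same shape as the `props_schema` value shown in step 2.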

To continue talking to Dosu, mention @dosu.

sammed-kamboj commented 1 month ago

@dosu the Props__ tag is defined in the source code. I don't think I should edit it or define it in the schema.

dosubot[bot] commented 1 month ago

To resolve the issue without modifying the Props__ tag directly, you should include the age property in the tag_prop_names parameter when initializing the NebulaGraphStore. This will ensure that the age column is recognized in the schema.

Here's how you can modify your code:

  1. Update the tag_prop_names Parameter: Include the age property in the tag_prop_names parameter.

    tag_prop_names = ["name,age"]
    
    graph_store = NebulaPropertyGraphStore(
       space="llamaindex_nebula_property_graph",
       tag_prop_names=tag_prop_names,
       include_vid=True,
       overwrite=True
    )
  2. Insert Nodes and Relations: Your existing code for creating and inserting nodes and relations should work correctly once the schema is properly defined.

    from llama_index.core.graph_stores.types import EntityNode, Relation
    
    # Create entity nodes
    entity1 = EntityNode(label="PERSON", name="Logan", properties={"age": 28})
    entity2 = EntityNode(label="ORGANIZATION", name="LlamaIndex")
    
    # Create a relation
    relation = Relation(
       label="WORKS_FOR",
       source_id=entity1.id,
       target_id=entity2.id,
       properties={"since": 2023},
    )
    
    # Insert nodes and relations
    graph_store.upsert_nodes([entity1, entity2])
    graph_store.upsert_relations([relation])

By including the age property in the tag_prop_names parameter, the upsert_nodes method should be able to insert the nodes without encountering the "Unknown column age in schema" error [1][2].

logan-markewich commented 1 month ago

@sammed-kamboj dosu is almost right, but I think it got confused by another graph store.

You'll want to modify the props schema. Here's the default:

DEFAULT_PROPS_SCHEMA = "`file_path` STRING, `file_name` STRING, `file_type` STRING, `file_size` INT, `creation_date` STRING, `last_modified_date` STRING, `_node_content` STRING, `_node_type` STRING, `document_id` STRING, `doc_id` STRING, `ref_doc_id` STRING, `triplet_source_id` STRING"

graph_store = NebulaPropertyGraphStore(..., props_schema=DEFAULT_PROPS_SCHEMA)

sammed-kamboj commented 1 month ago

@logan-markewich Thank you for your reply. I am a bit new to graphs. As I understand it, I have to change DEFAULT_PROPS_SCHEMA by adding age to it?

Something like:

DEFAULT_PROPS_SCHEMA = ... whatever there is right now... , `age` INT, `name` STRING"

And then pass it to the graph store as:

graph_store = NebulaPropertyGraphStore(..., props_schema=DEFAULT_PROPS_SCHEMA) ?
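A minimal sketch of that idea, assuming the default string quoted above still matches the installed version: rather than editing the library constant, build a new string that appends the extra column and pass that string as `props_schema`. The name `EXTRA_COLUMNS` is illustrative. Whether `name` also needs a column is not confirmed anywhere in this thread, so only `age` is added here.

```python
# Default schema copied from the maintainer's comment in this thread.
DEFAULT_PROPS_SCHEMA = (
    "`file_path` STRING, `file_name` STRING, `file_type` STRING, "
    "`file_size` INT, `creation_date` STRING, `last_modified_date` STRING, "
    "`_node_content` STRING, `_node_type` STRING, `document_id` STRING, "
    "`doc_id` STRING, `ref_doc_id` STRING, `triplet_source_id` STRING"
)

# Extra column for the custom node property (illustrative name).
EXTRA_COLUMNS = "`age` INT"

# Append the new column instead of modifying the default in place.
props_schema = f"{DEFAULT_PROPS_SCHEMA}, {EXTRA_COLUMNS}"
print(props_schema)

# With a running NebulaGraph instance, the string would then be passed along
# the lines of the call shown earlier in the thread:
# graph_store = NebulaPropertyGraphStore(
#     space="llamaindex_nebula_property_graph",
#     overwrite=True,
#     props_schema=props_schema,
# )
```

Keeping the default columns in place means documents ingested through the usual LlamaIndex pipeline should still find the metadata columns they expect.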