run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: Nebulagraph - upsert_triplet() function has empty string obj #14092

Open hungbui-io opened 3 weeks ago

hungbui-io commented 3 weeks ago

Bug Description

I am trying to generate a NebulaGraph knowledge graph from code data in an xlsx file that contains function_name, file_name and function_definition columns. An error occurs in the upsert_triplet() function because the object vertex it receives is an empty string (please check the log below).

Version

0.10.44

Steps to Reproduce

Relevant Logs/Tracebacks

DEBUG:llama_index.graph_stores.nebula.nebula_graph_store:upsert_triplet()
DML query: INSERT VERTEX `entity`(name)   VALUES "Generate_telemetry_data":("Generate_telemetry_data");INSERT VERTEX `entity`(name)   VALUES "Model_name":("Model_name");INSERT EDGE `relationship`(`relationship`)   VALUES "Generate_telemetry_data"->"Model_name"@1155094724010351908:("Has parameter");
upsert_triplet()
DML query: INSERT VERTEX `entity`(name)   VALUES "Generate_telemetry_data":("Generate_telemetry_data");INSERT VERTEX `entity`(name)   VALUES "Model_name":("Model_name");INSERT EDGE `relationship`(`relationship`)   VALUES "Generate_telemetry_data"->"Model_name"@1155094724010351908:("Has parameter");
test 111...
obj:  
subj:  Generate_telemetry_data
rel:  Has default value
************
Traceback (most recent call last):
  File "/home/aion/Hung/change_impact_analysis/CIA_nebulagraph_demo.py", line 447, in <module>
    _load_doc_from_excel()
  File "/home/aion/Hung/change_impact_analysis/CIA_nebulagraph_demo.py", line 367, in _load_doc_from_excel
    genGraph_from_doc(code_method_docs, 
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aion/Hung/change_impact_analysis/CIA_nebulagraph_demo.py", line 131, in genGraph_from_doc
    kg_index = KnowledgeGraphIndex.from_documents(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aion/anaconda3/envs/cia_/lib/python3.11/site-packages/llama_index/core/indices/base.py", line 145, in from_documents
    return cls(
           ^^^^
  File "/home/aion/anaconda3/envs/cia_/lib/python3.11/site-packages/llama_index/core/indices/knowledge_graph/base.py", line 99, in __init__
    super().__init__(
  File "/home/aion/anaconda3/envs/cia_/lib/python3.11/site-packages/llama_index/core/indices/base.py", line 94, in __init__
    index_struct = self.build_index_from_nodes(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aion/anaconda3/envs/cia_/lib/python3.11/site-packages/llama_index/core/indices/base.py", line 212, in build_index_from_nodes
    return self._build_index_from_nodes(nodes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aion/anaconda3/envs/cia_/lib/python3.11/site-packages/llama_index/core/indices/knowledge_graph/base.py", line 211, in _build_index_from_nodes
    self.upsert_triplet(triplet)
  File "/home/aion/anaconda3/envs/cia_/lib/python3.11/site-packages/llama_index/core/indices/knowledge_graph/base.py", line 259, in upsert_triplet
    self._graph_store.upsert_triplet(*triplet)
  File "/home/aion/anaconda3/envs/cia_/lib/python3.11/site-packages/llama_index/graph_stores/nebula/nebula_graph_store.py", line 499, in upsert_triplet
    obj = escape_str(obj)
          ^^^^^^^^^^^^^^^
  File "/home/aion/anaconda3/envs/cia_/lib/python3.11/site-packages/llama_index/graph_stores/nebula/nebula_graph_store.py", line 103, in escape_str
    if value[0] == " " or value[-1] == " ":
       ~~~~~^^^
IndexError: string index out of range

dosubot[bot] commented 3 weeks ago

Hey @hungbui-io! I'm here to help you with any bugs, questions, or contributions. Let's tackle this issue together!

To resolve the error in the upsert_triplet() function, ensure that the obj argument is never an empty string. In the log above, obj is empty for the triplet with subj "Generate_telemetry_data" and rel "Has default value", so escape_str() raises an IndexError when it reads value[0].
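The failure can be reproduced without a NebulaGraph instance. The sketch below uses a hypothetical stand-in, escape_str_sketch, that copies only the indexing check visible in the traceback; the rest of its body is an assumption for illustration, not the library's code. A second hypothetical helper shows the kind of guard that avoids the error.

```python
def escape_str_sketch(value: str) -> str:
    """Hypothetical stand-in for the check shown in the traceback."""
    # Same condition as the failing line in the traceback: indexing an
    # empty string raises IndexError before any comparison happens.
    if value[0] == " " or value[-1] == " ":
        value = value.strip()  # assumed behaviour, for illustration only
    return value


def safe_escape_str_sketch(value: str) -> str:
    """Same check, but an empty string is returned unchanged."""
    if not value:
        return value
    return escape_str_sketch(value)


try:
    escape_str_sketch("")
except IndexError as exc:
    print(f"IndexError: {exc}")

print(repr(safe_escape_str_sketch("")))
print(repr(safe_escape_str_sketch(" Model_name ")))
```

Guarding inside the escaping step only prevents the crash. Whether an empty object should be written to the graph at all is a separate decision for the caller.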

Here's how you can modify your code to check for empty strings before calling upsert_triplet:

import pandas as pd
from llama_index.graph_stores.nebula import NebulaGraphStore

# Load the xlsx file
df = pd.read_excel("path_to_your_file.xlsx")

# Initialize the NebulaGraphStore (schema names follow the DML query in the log)
nebula_store = NebulaGraphStore(
    space_name="your_space_name",
    edge_types=["relationship"],
    rel_prop_names=["relationship"],
    tags=["entity"],
    tag_prop_names=["name"],
)


def clean(value):
    """Return a stripped string, or "" for empty (NaN/None) cells."""
    return "" if pd.isna(value) else str(value).strip()


# Iterate through the rows of the dataframe and upsert triplets
for _, row in df.iterrows():
    function_name = clean(row["function_name"])
    file_name = clean(row["file_name"])

    # Skip rows where the subject or object would be an empty string
    if function_name and file_name:
        nebula_store.upsert_triplet(function_name, "defined_in", file_name)

This code checks if function_name and file_name are not empty before calling upsert_triplet. Adjust the condition based on your specific requirements to ensure that no empty strings are passed to the upsert_triplet method [1][2][3].


logan-markewich commented 3 weeks ago

Hmm, guess it should filter empty entities/relations then
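
A minimal sketch of that filtering, assuming triplets arrive as (subj, rel, obj) tuples of strings as in the debug output above. drop_empty_triplets is a hypothetical helper, not part of llama_index:

```python
from typing import Iterable, List, Tuple

Triplet = Tuple[str, str, str]


def drop_empty_triplets(triplets: Iterable[Triplet]) -> List[Triplet]:
    """Keep only triplets whose subject, relation and object are all non-blank."""
    kept = []
    for subj, rel, obj in triplets:
        parts = (subj, rel, obj)
        # Treat None and whitespace-only strings the same as "".
        if all(isinstance(p, str) and p.strip() for p in parts):
            kept.append((subj.strip(), rel.strip(), obj.strip()))
    return kept


triplets = [
    ("Generate_telemetry_data", "Has parameter", "Model_name"),
    # The failing case from the log: obj is an empty string.
    ("Generate_telemetry_data", "Has default value", ""),
]
print(drop_empty_triplets(triplets))
```

Applied to the extracted triplets before upsert_triplet() is called, a filter like this would skip the "Has default value" triplet instead of passing its empty obj on to escape_str().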