kuzudb / kuzu

Embeddable property graph database management system built for query speed and scalability. Implements Cypher.
https://kuzudb.com/
MIT License

Bug: LlamaIndex integration not working as expected (AttributeError: 'KuzuPropertyGraphStore' object has no attribute 'upsert_triplet') #4440

Open jjccooooll12 opened 1 day ago

jjccooooll12 commented 1 day ago

Kùzu version

v0.6.1

What operating system are you using?

Windows 10

What happened?

The following code, taken from the tutorial at https://docs.llamaindex.ai/en/stable/examples/index_structs/knowledge_graph/KuzuGraphDemo/:

index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=30,
    storage_context=storage_context,
)

produces error: AttributeError: 'KuzuPropertyGraphStore' object has no attribute 'upsert_triplet'

Full error track:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[14], line 2
      1 # NOTE: can take a while!
----> 2 index = KnowledgeGraphIndex.from_documents(
      3     documents,
      4     max_triplets_per_chunk=30,
      5     storage_context=storage_context,
      6 )
      7 # # To reload from an existing graph store without recomputing each time, use:
      8 # index = KnowledgeGraphIndex(nodes=[], storage_context=storage_context)

File ~\.conda\envs\graphrag\lib\site-packages\llama_index\core\indices\base.py:119, in BaseIndex.from_documents(cls, documents, storage_context, show_progress, callback_manager, transformations, **kwargs)
    110     docstore.set_document_hash(doc.get_doc_id(), doc.hash)
    112 nodes = run_transformations(
    113     documents,  # type: ignore
    114     transformations,
    115     show_progress=show_progress,
    116     **kwargs,
    117 )
--> 119 return cls(
    120     nodes=nodes,
    121     storage_context=storage_context,
    122     callback_manager=callback_manager,
    123     show_progress=show_progress,
    124     transformations=transformations,
    125     **kwargs,
    126 )

File ~\.conda\envs\graphrag\lib\site-packages\llama_index\core\indices\knowledge_graph\base.py:99, in KnowledgeGraphIndex.__init__(self, nodes, objects, index_struct, llm, embed_model, storage_context, kg_triplet_extract_template, max_triplets_per_chunk, include_embeddings, show_progress, max_object_length, kg_triplet_extract_fn, **kwargs)
     96 self._llm = llm or Settings.llm
     97 self._embed_model = embed_model or Settings.embed_model
---> 99 super().__init__(
    100     nodes=nodes,
    101     index_struct=index_struct,
    102     storage_context=storage_context,
    103     show_progress=show_progress,
    104     objects=objects,
    105     **kwargs,
    106 )
    108 # TODO: legacy conversion - remove in next release
    109 if (
    110     len(self.index_struct.table) > 0
    111     and isinstance(self.graph_store, SimpleGraphStore)
    112     and len(self.graph_store._data.graph_dict) == 0
    113 ):

File ~\.conda\envs\graphrag\lib\site-packages\llama_index\core\indices\base.py:77, in BaseIndex.__init__(self, nodes, objects, index_struct, storage_context, callback_manager, transformations, show_progress, **kwargs)
     75 if index_struct is None:
     76     nodes = nodes or []
---> 77     index_struct = self.build_index_from_nodes(
     78         nodes + objects,  # type: ignore
     79         **kwargs,  # type: ignore
     80     )
     81 self._index_struct = index_struct
     82 self._storage_context.index_store.add_index_struct(self._index_struct)

File ~\.conda\envs\graphrag\lib\site-packages\llama_index\core\indices\base.py:185, in BaseIndex.build_index_from_nodes(self, nodes, **build_kwargs)
    183 """Build the index from nodes."""
    184 self._docstore.add_documents(nodes, allow_update=True)
--> 185 return self._build_index_from_nodes(nodes, **build_kwargs)

File ~\.conda\envs\graphrag\lib\site-packages\llama_index\core\indices\knowledge_graph\base.py:218, in KnowledgeGraphIndex._build_index_from_nodes(self, nodes, **build_kwargs)
    216 for triplet in triplets:
    217     subj, _, obj = triplet
--> 218     self.upsert_triplet(triplet)
    219     index_struct.add_node([subj, obj], n)
    221 if self.include_embeddings:

File ~\.conda\envs\graphrag\lib\site-packages\llama_index\core\indices\knowledge_graph\base.py:266, in KnowledgeGraphIndex.upsert_triplet(self, triplet, include_embeddings)
    254 def upsert_triplet(
    255     self, triplet: Tuple[str, str, str], include_embeddings: bool = False
    256 ) -> None:
    257     """Insert triplets and optionally embeddings.
    258 
    259     Used for manual insertion of KG triplets (in the form
   (...)
    264         embedding (Any, optional): Embedding option for the triplet. Defaults to None.
    265     """
--> 266     self._graph_store.upsert_triplet(*triplet)
    267     triplet_str = str(triplet)
    268     if include_embeddings:

AttributeError: 'KuzuPropertyGraphStore' object has no attribute 'upsert_triplet'

Instead, it should automatically populate the Kùzu graph database.
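For context, the failure mode can be illustrated with two minimal stub classes (hypothetical names, standing in for LlamaIndex's legacy GraphStore interface and the newer PropertyGraphStore interface): KnowledgeGraphIndex calls upsert_triplet on its store, a method only the legacy interface defines, while property-graph stores such as KuzuPropertyGraphStore expose methods like upsert_nodes/upsert_relations instead. A sketch of the mismatch:

```python
# Minimal stubs, for illustration only -- these are NOT the real LlamaIndex
# classes, just stand-ins mirroring the two store interfaces involved.

class LegacyGraphStore:
    """Stands in for the legacy GraphStore that KnowledgeGraphIndex expects."""
    def __init__(self):
        self.triplets = []

    def upsert_triplet(self, subj, rel, obj):
        self.triplets.append((subj, rel, obj))


class NewPropertyGraphStore:
    """Stands in for PropertyGraphStore implementations such as
    KuzuPropertyGraphStore. It has no upsert_triplet, which is exactly
    what the traceback reports."""
    def upsert_nodes(self, nodes):
        pass

    def upsert_relations(self, relations):
        pass


def build_index(store):
    # KnowledgeGraphIndex._build_index_from_nodes effectively does this
    # for each extracted triplet (see frame at base.py:266 above):
    store.upsert_triplet("subj", "rel", "obj")


build_index(LegacyGraphStore())        # works: the legacy interface matches

try:
    build_index(NewPropertyGraphStore())
except AttributeError as err:
    print(err)                         # same class of error as in the report
```

So the error is an interface mismatch, not a Kùzu bug: the deprecated index is being wired to a new-style store.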

Are there known steps to reproduce?

No response

prrao87 commented 1 day ago

Hi @jjccooooll12, thanks for trying out Kùzu! We are not planning to maintain the KnowledgeGraphIndex integration going forward, as it has largely been deprecated and replaced by the PropertyGraphIndex API in LlamaIndex. Even the core LlamaIndex maintainers are no longer maintaining KnowledgeGraphIndex, since its functionality is superseded by the much more capable PropertyGraphIndex. They only keep its documentation on their website for legacy reasons, and they encourage folks not to use it any more.

I'd recommend taking a look at this example notebook in our docs: https://colab.research.google.com/drive/1brAdNRNLG2XHD7Jv3ZwSQCOCJ_wmfd1B

You would then import PropertyGraphIndex from llama_index.core and upsert triplets via your desired schema using the .from_documents(...) method of the PropertyGraphIndex class instead.

from llama_index.core import PropertyGraphIndex
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=embed_model,                         # embedding model for node/triplet embeddings
    kg_extractors=[SchemaLLMPathExtractor(extract_llm)],  # LLM that extracts triplets per your schema
    property_graph_store=graph_store,                # a KuzuPropertyGraphStore instance
    show_progress=True,
)

Hope this works!