run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: KnowledgeGraphIndex._build_index_from_nodes() got an unexpected keyword argument 'space_name' #14748

Open NasonZ opened 2 months ago

NasonZ commented 2 months ago

Bug Description

I'm trying to run this NebulaGraph example.

Running this cell:

from llama_index.core import KnowledgeGraphIndex

kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=10,
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
    include_embeddings=True,
)

I get this error:

File c:\Users... \lib\site-packages\llama_index\core\indices\base.py:216, in BaseIndex.build_index_from_nodes(self, nodes, **build_kwargs)
214 """Build the index from nodes."""
215 self._docstore.add_documents(nodes, allow_update=True)
--> 216 return self._build_index_from_nodes(nodes, **build_kwargs)

TypeError: KnowledgeGraphIndex._build_index_from_nodes() got an unexpected keyword argument 'space_name'

Please advise on how I can resolve this.

Version

0.10.55

Steps to Reproduce

Run the notebook

Relevant Logs/Tracebacks

TypeError Traceback (most recent call last)
Cell In[7], line 3
1 from llama_index.core import KnowledgeGraphIndex
----> 3 kg_index = KnowledgeGraphIndex.from_documents(
4 documents,
5 storage_context=storage_context,
6 max_triplets_per_chunk=10,
7 space_name=space_name,
8 edge_types=edge_types,
9 rel_prop_names=rel_prop_names,
10 tags=tags,
11 include_embeddings=True,
12 )

File c:\Users... \lib\site-packages\llama_index\core\indices\base.py:145, in BaseIndex.from_documents(cls, documents, storage_context, show_progress, callback_manager, transformations, service_context, **kwargs)
136 docstore.set_document_hash(doc.get_doc_id(), doc.hash)
138 nodes = run_transformations(
139 documents, # type: ignore
140 transformations,
141 show_progress=show_progress,
142 **kwargs,
143 )
--> 145 return cls(
146 nodes=nodes,
147 storage_context=storage_context,
148 callback_manager=callback_manager,
149 show_progress=show_progress,
150 transformations=transformations,
151 service_context=service_context,
152 **kwargs,
153 )

File c:\Users... \lib\site-packages\llama_index\core\indices\knowledge_graph\base.py:109, in KnowledgeGraphIndex.__init__(self, nodes, objects, index_struct, llm, embed_model, storage_context, kg_triplet_extract_template, max_triplets_per_chunk, include_embeddings, show_progress, max_object_length, kg_triplet_extract_fn, service_context, **kwargs)
104 self._llm = llm or llm_from_settings_or_context(Settings, service_context)
105 self._embed_model = embed_model or embed_model_from_settings_or_context(
106 Settings, service_context
107 )
--> 109 super().__init__(
110 nodes=nodes,
111 index_struct=index_struct,
112 service_context=service_context,
113 storage_context=storage_context,
114 show_progress=show_progress,
115 objects=objects,
116 **kwargs,
117 )
119 # TODO: legacy conversion - remove in next release
120 if (
121 len(self.index_struct.table) > 0
122 and isinstance(self.graph_store, SimpleGraphStore)
123 and len(self.graph_store._data.graph_dict) == 0
124 ):

File c:\Users... \lib\site-packages\llama_index\core\indices\base.py:94, in BaseIndex.__init__(self, nodes, objects, index_struct, storage_context, callback_manager, transformations, show_progress, service_context, **kwargs)
92 if index_struct is None:
93 nodes = nodes or []
---> 94 index_struct = self.build_index_from_nodes(
95 nodes + objects, **kwargs # type: ignore
96 )
97 self._index_struct = index_struct
98 self._storage_context.index_store.add_index_struct(self._index_struct)

File c:\Users... \lib\site-packages\llama_index\core\indices\base.py:216, in BaseIndex.build_index_from_nodes(self, nodes, **build_kwargs)
214 """Build the index from nodes."""
215 self._docstore.add_documents(nodes, allow_update=True)
--> 216 return self._build_index_from_nodes(nodes, **build_kwargs)

TypeError: KnowledgeGraphIndex._build_index_from_nodes() got an unexpected keyword argument 'space_name'
dosubot[bot] commented 2 months ago

Hi @NasonZ! I'm here to help you with any bugs, questions, or contributions you have. Let's work together to resolve any issues you're facing.

To resolve the TypeError: KnowledgeGraphIndex._build_index_from_nodes() got an unexpected keyword argument 'space_name' error, ensure that the space_name argument is not passed to the KnowledgeGraphIndex.from_documents method. Instead, it should be used when instantiating the NebulaGraphStore.

Here is the correct way to instantiate the NebulaGraphStore and create the StorageContext:

from llama_index.legacy.storage.storage_context import StorageContext
from llama_index.legacy.graph_stores.nebulagraph import NebulaGraphStore

space_name = "llamaindex"
edge_types, rel_prop_names = ["relationship"], ["relationship"]
tags = ["entity"]

graph_store = NebulaGraphStore(
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)

Then, use the storage_context when creating the KnowledgeGraphIndex:

from llama_index.core import KnowledgeGraphIndex

kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=10,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
    include_embeddings=True,
)

This adjustment ensures that the space_name is correctly configured within the NebulaGraphStore and StorageContext, and not passed directly to the KnowledgeGraphIndex.from_documents method [1].
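
For intuition, here is a toy sketch (not the library's actual code, just an illustration of the pattern) of why the stray keyword surfaces so deep in the stack: from_documents forwards any keywords it does not consume through __init__ down to _build_index_from_nodes, which only accepts the names it explicitly defines.

class ToyIndex:
    """Toy stand-in for BaseIndex/KnowledgeGraphIndex (illustration only)."""

    @classmethod
    def from_documents(cls, documents, storage_context=None, **kwargs):
        # Keywords this method does not recognize (e.g. space_name)
        # are not dropped; they ride along in **kwargs.
        return cls(nodes=documents, **kwargs)

    def __init__(self, nodes, **kwargs):
        # ...and get forwarded again during index construction.
        self._build_index_from_nodes(nodes, **kwargs)

    def _build_index_from_nodes(self, nodes, max_triplets_per_chunk=10):
        # Only explicitly named keywords are accepted here.
        return list(nodes)

ToyIndex.from_documents(["doc"], space_name="llamaindex")
# TypeError: _build_index_from_nodes() got an unexpected keyword argument 'space_name'

Moving space_name onto the NebulaGraphStore keeps it out of that forwarding chain entirely.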

To continue talking to Dosu, mention @dosu.

NasonZ commented 2 months ago

@dosu

I tried:

from llama_index.legacy.storage.storage_context import StorageContext
from llama_index.legacy.graph_stores.nebulagraph import NebulaGraphStore

space_name = "guardians"
edge_types, rel_prop_names = ["relationship"], ["relationship"]
tags = ["entity"]

graph_store = NebulaGraphStore(
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)

from llama_index.core import download_loader
from llama_index.readers.wikipedia import WikipediaReader

loader = WikipediaReader()

documents = loader.load_data(
    pages=["Guardians of the Galaxy Vol. 3"], auto_suggest=False
)

from llama_index.core import KnowledgeGraphIndex

kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=10,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
    include_embeddings=True,
)

which produced this error:

TypeError Traceback (most recent call last)
Cell In[18], line 28
21 documents = loader.load_data(
22 pages=["Guardians of the Galaxy Vol. 3"], auto_suggest=False
23 )
26 from llama_index.core import KnowledgeGraphIndex
---> 28 kg_index = KnowledgeGraphIndex.from_documents(
29 documents,
30 storage_context=storage_context,
31 max_triplets_per_chunk=10,
32 # edge_types=edge_types,
33 rel_prop_names=rel_prop_names,
34 tags=tags,
35 include_embeddings=True,
36 )

File c:\Users... \lib\site-packages\llama_index\core\indices\base.py:145, in BaseIndex.from_documents(cls, documents, storage_context, show_progress, callback_manager, transformations, service_context, **kwargs)
136 docstore.set_document_hash(doc.get_doc_id(), doc.hash)
138 nodes = run_transformations(
139 documents, # type: ignore
140 transformations,
141 show_progress=show_progress,
142 **kwargs,
143 )
--> 145 return cls(
146 nodes=nodes,
147 storage_context=storage_context,
148 callback_manager=callback_manager,
149 show_progress=show_progress,
150 transformations=transformations,
151 service_context=service_context,
152 **kwargs,
153 )

File c:\Users... \lib\site-packages\llama_index\core\indices\knowledge_graph\base.py:109, in KnowledgeGraphIndex.__init__(self, nodes, objects, index_struct, llm, embed_model, storage_context, kg_triplet_extract_template, max_triplets_per_chunk, include_embeddings, show_progress, max_object_length, kg_triplet_extract_fn, service_context, **kwargs)
104 self._llm = llm or llm_from_settings_or_context(Settings, service_context)
105 self._embed_model = embed_model or embed_model_from_settings_or_context(
106 Settings, service_context
107 )
--> 109 super().__init__(
110 nodes=nodes,
111 index_struct=index_struct,
112 service_context=service_context,
113 storage_context=storage_context,
114 show_progress=show_progress,
115 objects=objects,
116 **kwargs,
117 )
119 # TODO: legacy conversion - remove in next release
120 if (
121 len(self.index_struct.table) > 0
122 and isinstance(self.graph_store, SimpleGraphStore)
123 and len(self.graph_store._data.graph_dict) == 0
124 ):

File c:\Users... \lib\site-packages\llama_index\core\indices\base.py:94, in BaseIndex.__init__(self, nodes, objects, index_struct, storage_context, callback_manager, transformations, show_progress, service_context, **kwargs)
92 if index_struct is None:
93 nodes = nodes or []
---> 94 index_struct = self.build_index_from_nodes(
95 nodes + objects, **kwargs # type: ignore
96 )
97 self._index_struct = index_struct
98 self._storage_context.index_store.add_index_struct(self._index_struct)

File c:\Users... \lib\site-packages\llama_index\core\indices\base.py:216, in BaseIndex.build_index_from_nodes(self, nodes, **build_kwargs)
214 """Build the index from nodes."""
215 self._docstore.add_documents(nodes, allow_update=True)
--> 216 return self._build_index_from_nodes(nodes, **build_kwargs)

TypeError: KnowledgeGraphIndex._build_index_from_nodes() got an unexpected keyword argument 'edge_types'
dosubot[bot] commented 2 months ago

To resolve the error, ensure that the edge_types argument is not passed to the KnowledgeGraphIndex.from_documents method. The edge_types parameter is used in the NebulaGraphStore class, but it is not applicable in the context of KnowledgeGraphIndex.from_documents.

Here is the corrected code:

from llama_index.legacy.storage.storage_context import StorageContext
from llama_index.legacy.graph_stores.nebulagraph import NebulaGraphStore

space_name = "guardians"
edge_types, rel_prop_names = ["relationship"], ["relationship"]
tags = ["entity"]

graph_store = NebulaGraphStore(
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)

from llama_index.core import download_loader
from llama_index.readers.wikipedia import WikipediaReader

loader = WikipediaReader()

documents = loader.load_data(
    pages=["Guardians of the Galaxy Vol. 3"], auto_suggest=False
)

from llama_index.core import KnowledgeGraphIndex

kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=10,
    rel_prop_names=rel_prop_names,
    tags=tags,
    include_embeddings=True,
)

By removing the edge_types argument from the KnowledgeGraphIndex.from_documents method call, the method should work without raising the TypeError [1][2][3].

To continue talking to Dosu, mention @dosu.

logan-markewich commented 1 month ago

This is a bug in the library that probably needs to be fixed.

But also, I don't think all those kwargs are needed; the graph store already carries them:

kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=10,
    include_embeddings=True,
)
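
For reference, a minimal end-to-end sketch of the fix under the 0.10.x split packages. The module paths, environment variables, and connection values below follow the NebulaGraph integration example and are assumptions, not a verified drop-in; it also assumes pip install llama-index-graph-stores-nebula and a running NebulaGraph instance.

import os

from llama_index.core import KnowledgeGraphIndex, StorageContext
from llama_index.graph_stores.nebula import NebulaGraphStore

# Connection details read by NebulaGraphStore (placeholder values).
os.environ["NEBULA_USER"] = "root"
os.environ["NEBULA_PASSWORD"] = "nebula"
os.environ["NEBULA_ADDRESS"] = "127.0.0.1:9669"

# Graph-store-specific settings belong on the store, not on the index call.
graph_store = NebulaGraphStore(
    space_name="llamaindex",
    edge_types=["relationship"],
    rel_prop_names=["relationship"],
    tags=["entity"],
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)

# The index call itself only needs the storage context plus its own kwargs.
kg_index = KnowledgeGraphIndex.from_documents(
    documents,  # assumed loaded earlier, e.g. via WikipediaReader
    storage_context=storage_context,
    max_triplets_per_chunk=10,
    include_embeddings=True,
)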