run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: Why is PropertyGraphIndex with use_async=True and OllamaEmbedding erroring out? #15292

Open · theconnectionist opened 1 month ago

theconnectionist commented 1 month ago

Question

I'm running GraphRAG_v1.ipynb. I ran it successfully with the OpenAI LLM and embedding models (i.e., all I did was provide my OpenAI key). Next I tried the same notebook with Ollama-based LLM and embedding models against a locally running server.

The Ollama LLM ran as part of the notebook without issue. Next I tried OllamaEmbedding, initialized as follows:

from llama_index.embeddings.ollama import OllamaEmbedding
ollama_embedding = OllamaEmbedding(
    model_name="gemma2:2b-text-q2_K",
    base_url="http://localhost:11434",
    ollama_additional_kwargs={"mirostat": 0}
)

and tested/verified as follows:

# Test local embedding
pass_embedding = ollama_embedding.get_text_embedding_batch(["This is a passage!", "This is another passage"], show_progress=True)
print(pass_embedding)

query_embedding = ollama_embedding.get_query_embedding("Where is blue?")
print(query_embedding)
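
To isolate the async path as well, the async counterpart can be exercised directly; a minimal sketch using aget_text_embedding_batch (the coroutine method PropertyGraphIndex awaits internally when use_async=True, per the traceback below):

import asyncio

# Sketch: call the async batch-embedding path in isolation
pass_embedding_async = asyncio.run(
    ollama_embedding.aget_text_embedding_batch(
        ["This is a passage!", "This is another passage"], show_progress=True
    )
)
print(pass_embedding_async)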

Next I used ollama_embedding in PropertyGraphIndex:

from llama_index.core import PropertyGraphIndex

index = PropertyGraphIndex(
    nodes=nodes,
    property_graph_store=GraphRAGStore(),
    kg_extractors=[kg_extractor],
    show_progress=True,
    embed_model=ollama_embedding,
    embed_kg_nodes=True
)

This consistently errors out, as shown below. The only workaround I've found is to set use_async=False, but I imagine that will slow down my indexing operations. I couldn't find anything relevant in previous issues or on Discord.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
[f:\graphrag\llamaindex\GraphRAG_v1_ollama.ipynb](file:///F:/graphrag/llamaindex/GraphRAG_v1_ollama.ipynb) Cell 27 line 3
      [1](vscode-notebook-cell:/f%3A/graphrag/llamaindex/GraphRAG_v1_ollama.ipynb#X34sZmlsZQ%3D%3D?line=0) from llama_index.core import PropertyGraphIndex
----> [3](vscode-notebook-cell:/f%3A/graphrag/llamaindex/GraphRAG_v1_ollama.ipynb#X34sZmlsZQ%3D%3D?line=2) index = PropertyGraphIndex(
      [4](vscode-notebook-cell:/f%3A/graphrag/llamaindex/GraphRAG_v1_ollama.ipynb#X34sZmlsZQ%3D%3D?line=3)     nodes=nodes,
      [5](vscode-notebook-cell:/f%3A/graphrag/llamaindex/GraphRAG_v1_ollama.ipynb#X34sZmlsZQ%3D%3D?line=4)     property_graph_store=GraphRAGStore(),
      [6](vscode-notebook-cell:/f%3A/graphrag/llamaindex/GraphRAG_v1_ollama.ipynb#X34sZmlsZQ%3D%3D?line=5)     kg_extractors=[kg_extractor],
      [7](vscode-notebook-cell:/f%3A/graphrag/llamaindex/GraphRAG_v1_ollama.ipynb#X34sZmlsZQ%3D%3D?line=6)     show_progress=True,
      [8](vscode-notebook-cell:/f%3A/graphrag/llamaindex/GraphRAG_v1_ollama.ipynb#X34sZmlsZQ%3D%3D?line=7)     embed_model=ollama_embedding,
      [9](vscode-notebook-cell:/f%3A/graphrag/llamaindex/GraphRAG_v1_ollama.ipynb#X34sZmlsZQ%3D%3D?line=8)  embed_kg_nodes=True
     [10](vscode-notebook-cell:/f%3A/graphrag/llamaindex/GraphRAG_v1_ollama.ipynb#X34sZmlsZQ%3D%3D?line=9) )

File [e:\tools\miniconda3\envs\llamaindex-graphrag\Lib\site-packages\llama_index\core\indices\property_graph\base.py:134](file:///E:/tools/miniconda3/envs/llamaindex-graphrag/Lib/site-packages/llama_index/core/indices/property_graph/base.py:134), in PropertyGraphIndex.__init__(self, nodes, llm, kg_extractors, property_graph_store, vector_store, use_async, embed_model, embed_kg_nodes, callback_manager, transformations, storage_context, show_progress, **kwargs)
    128 self._embed_kg_nodes = embed_kg_nodes
    129 self._override_vector_store = (
    130     vector_store is not None
    131     or not storage_context.property_graph_store.supports_vector_queries
    132 )
--> 134 super().__init__(
    135     nodes=nodes,
    136     callback_manager=callback_manager,
    137     storage_context=storage_context,
    138     transformations=transformations,
    139     show_progress=show_progress,
...
    416             embeddings=embeddings,
    417         )
    418     )

TypeError: 'coroutine' object is not iterable

My question: why is this error occurring when use_async is True? Did I do something wrong, or is it a bug in the underlying implementation?

dosubot[bot] commented 1 month ago

Hey @theconnectionist, I'm here to help you with any bugs, questions, or contributions you have. Let's tackle this together!

The error "TypeError: 'coroutine' object is not iterable" occurs because the code is trying to iterate over a coroutine object, which is not directly iterable. This typically happens when an asynchronous function is called without awaiting its result.

In your case, the issue likely arises because the OllamaEmbedding class's asynchronous methods are not being awaited properly when use_async is set to True.
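
A minimal standalone illustration of this failure mode (independent of llama_index; hypothetical names):

import asyncio

async def get_batch():
    return [[0.1], [0.2]]

coro = get_batch()  # calling without `await` only creates a coroutine object
try:
    list(coro)  # iterating the coroutine itself fails
except TypeError as e:
    print(e)  # 'coroutine' object is not iterable
coro.close()  # avoid the "coroutine was never awaited" warning

print(asyncio.run(get_batch()))  # awaited: a real, iterable list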

To resolve this issue, ensure that the asynchronous methods are awaited correctly. Here is a general approach to fix this:

  1. Ensure that the embedding model's asynchronous methods are awaited:
     • If OllamaEmbedding has asynchronous methods like aget_text_embedding_batch, make sure they are awaited properly.
  2. Modify the PropertyGraphIndex initialization to handle asynchronous embedding:
     • Ensure that the embedding process within PropertyGraphIndex correctly handles asynchronous calls.

Here is an example of how you might modify the embedding process to handle asynchronous calls:

from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.core import PropertyGraphIndex
import asyncio

# Initialize OllamaEmbedding
ollama_embedding = OllamaEmbedding(
    model_name="gemma2:2b-text-q2_K",
    base_url="http://localhost:11434",
    ollama_additional_kwargs={"mirostat": 0}
)

# Ensure async embedding is handled properly
async def create_index():
    index = PropertyGraphIndex(
        nodes=nodes,
        property_graph_store=GraphRAGStore(),
        kg_extractors=[kg_extractor],
        show_progress=True,
        embed_model=ollama_embedding,
        embed_kg_nodes=True,
        use_async=True
    )
    return index

# Run the async function
index = asyncio.run(create_index())

By properly awaiting asynchronous methods, you can avoid the "TypeError: 'coroutine' object is not iterable" error. This approach ensures that the embedding process is handled asynchronously and correctly integrated into the PropertyGraphIndex initialization.

To continue talking to Dosu, mention @dosu.

logan-markewich commented 1 month ago

The traceback appears to be cut off; it's a little difficult to tell what/where the error is

theconnectionist commented 1 month ago

Thanks logan-markewich. I tried with Dosu's suggestion, but that didn't work either. Here is the full output:

{
    "name": "TypeError",
    "message": "'coroutine' object is not iterable",
    "stack": "---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
f:\\graphrag\\llamaindex\\GraphRAG_v1_ollama.ipynb Cell 27 line 2
     <a href='vscode-notebook-cell:/f%3A/graphrag/llamaindex/GraphRAG_v1_ollama.ipynb#X34sZmlsZQ%3D%3D?line=25'>26</a>     return index
     <a href='vscode-notebook-cell:/f%3A/graphrag/llamaindex/GraphRAG_v1_ollama.ipynb#X34sZmlsZQ%3D%3D?line=27'>28</a> # Run the async function
---> <a href='vscode-notebook-cell:/f%3A/graphrag/llamaindex/GraphRAG_v1_ollama.ipynb#X34sZmlsZQ%3D%3D?line=28'>29</a> index = asyncio.run(create_index())

File e:\\tools\\miniconda3\\envs\\llamaindex-graphrag\\Lib\\site-packages\\nest_asyncio.py:30, in _patch_asyncio.<locals>.run(main, debug)
     28 task = asyncio.ensure_future(main)
     29 try:
---> 30     return loop.run_until_complete(task)
     31 finally:
     32     if not task.done():

File e:\\tools\\miniconda3\\envs\\llamaindex-graphrag\\Lib\\site-packages\\nest_asyncio.py:98, in _patch_loop.<locals>.run_until_complete(self, future)
     95 if not f.done():
     96     raise RuntimeError(
     97         'Event loop stopped before Future completed.')
---> 98 return f.result()

File e:\\tools\\miniconda3\\envs\\llamaindex-graphrag\\Lib\\asyncio\\futures.py:203, in Future.result(self)
    201 self.__log_traceback = False
    202 if self._exception is not None:
--> 203     raise self._exception.with_traceback(self._exception_tb)
    204 return self._result

File e:\\tools\\miniconda3\\envs\\llamaindex-graphrag\\Lib\\asyncio\\tasks.py:314, in Task.__step_run_and_handle_result(***failed resolving arguments***)
    310 try:
    311     if exc is None:
    312         # We use the `send` method directly, because coroutines
    313         # don't have `__iter__` and `__next__` methods.
--> 314         result = coro.send(None)
    315     else:
    316         result = coro.throw(exc)

f:\\graphrag\\llamaindex\\GraphRAG_v1_ollama.ipynb Cell 27 line 1
     <a href='vscode-notebook-cell:/f%3A/graphrag/llamaindex/GraphRAG_v1_ollama.ipynb#X34sZmlsZQ%3D%3D?line=15'>16</a> async def create_index():
---> <a href='vscode-notebook-cell:/f%3A/graphrag/llamaindex/GraphRAG_v1_ollama.ipynb#X34sZmlsZQ%3D%3D?line=16'>17</a>     index = PropertyGraphIndex(
     <a href='vscode-notebook-cell:/f%3A/graphrag/llamaindex/GraphRAG_v1_ollama.ipynb#X34sZmlsZQ%3D%3D?line=17'>18</a>         nodes=nodes,
     <a href='vscode-notebook-cell:/f%3A/graphrag/llamaindex/GraphRAG_v1_ollama.ipynb#X34sZmlsZQ%3D%3D?line=18'>19</a>         property_graph_store=GraphRAGStore(),
     <a href='vscode-notebook-cell:/f%3A/graphrag/llamaindex/GraphRAG_v1_ollama.ipynb#X34sZmlsZQ%3D%3D?line=19'>20</a>         kg_extractors=[kg_extractor],
     <a href='vscode-notebook-cell:/f%3A/graphrag/llamaindex/GraphRAG_v1_ollama.ipynb#X34sZmlsZQ%3D%3D?line=20'>21</a>         show_progress=True,
     <a href='vscode-notebook-cell:/f%3A/graphrag/llamaindex/GraphRAG_v1_ollama.ipynb#X34sZmlsZQ%3D%3D?line=21'>22</a>         embed_model=ollama_embedding,
     <a href='vscode-notebook-cell:/f%3A/graphrag/llamaindex/GraphRAG_v1_ollama.ipynb#X34sZmlsZQ%3D%3D?line=22'>23</a>         embed_kg_nodes=True,
     <a href='vscode-notebook-cell:/f%3A/graphrag/llamaindex/GraphRAG_v1_ollama.ipynb#X34sZmlsZQ%3D%3D?line=23'>24</a>         use_async=True
     <a href='vscode-notebook-cell:/f%3A/graphrag/llamaindex/GraphRAG_v1_ollama.ipynb#X34sZmlsZQ%3D%3D?line=24'>25</a>     )
     <a href='vscode-notebook-cell:/f%3A/graphrag/llamaindex/GraphRAG_v1_ollama.ipynb#X34sZmlsZQ%3D%3D?line=25'>26</a>     return index

File e:\\tools\\miniconda3\\envs\\llamaindex-graphrag\\Lib\\site-packages\\llama_index\\core\\indices\\property_graph\\base.py:134, in PropertyGraphIndex.__init__(self, nodes, llm, kg_extractors, property_graph_store, vector_store, use_async, embed_model, embed_kg_nodes, callback_manager, transformations, storage_context, show_progress, **kwargs)
    128 self._embed_kg_nodes = embed_kg_nodes
    129 self._override_vector_store = (
    130     vector_store is not None
    131     or not storage_context.property_graph_store.supports_vector_queries
    132 )
--> 134 super().__init__(
    135     nodes=nodes,
    136     callback_manager=callback_manager,
    137     storage_context=storage_context,
    138     transformations=transformations,
    139     show_progress=show_progress,
    140     **kwargs,
    141 )

File e:\\tools\\miniconda3\\envs\\llamaindex-graphrag\\Lib\\site-packages\\llama_index\\core\\indices\\base.py:94, in BaseIndex.__init__(self, nodes, objects, index_struct, storage_context, callback_manager, transformations, show_progress, service_context, **kwargs)
     92 if index_struct is None:
     93     nodes = nodes or []
---> 94     index_struct = self.build_index_from_nodes(
     95         nodes + objects, **kwargs  # type: ignore
     96     )
     97 self._index_struct = index_struct
     98 self._storage_context.index_store.add_index_struct(self._index_struct)

File e:\\tools\\miniconda3\\envs\\llamaindex-graphrag\\Lib\\site-packages\\llama_index\\core\\indices\\base.py:216, in BaseIndex.build_index_from_nodes(self, nodes, **build_kwargs)
    214 \"\"\"Build the index from nodes.\"\"\"
    215 self._docstore.add_documents(nodes, allow_update=True)
--> 216 return self._build_index_from_nodes(nodes, **build_kwargs)

File e:\\tools\\miniconda3\\envs\\llamaindex-graphrag\\Lib\\site-packages\\llama_index\\core\\indices\\property_graph\\base.py:328, in PropertyGraphIndex._build_index_from_nodes(self, nodes)
    326 def _build_index_from_nodes(self, nodes: Optional[Sequence[BaseNode]]) -> IndexLPG:
    327     \"\"\"Build index from nodes.\"\"\"
--> 328     nodes = self._insert_nodes(nodes or [])
    330     # this isn't really used or needed
    331     return IndexLPG()

File e:\\tools\\miniconda3\\envs\\llamaindex-graphrag\\Lib\\site-packages\\llama_index\\core\\indices\\property_graph\\base.py:255, in PropertyGraphIndex._insert_nodes(self, nodes)
    250 node_texts = [
    251     node.get_content(metadata_mode=MetadataMode.EMBED) for node in nodes
    252 ]
    254 if self._use_async:
--> 255     embeddings = asyncio.run(
    256         self._embed_model.aget_text_embedding_batch(
    257             node_texts, show_progress=self._show_progress
    258         )
    259     )
    260 else:
    261     embeddings = self._embed_model.get_text_embedding_batch(
    262         node_texts, show_progress=self._show_progress
    263     )

File e:\\tools\\miniconda3\\envs\\llamaindex-graphrag\\Lib\\site-packages\\nest_asyncio.py:30, in _patch_asyncio.<locals>.run(main, debug)
     28 task = asyncio.ensure_future(main)
     29 try:
---> 30     return loop.run_until_complete(task)
     31 finally:
     32     if not task.done():

File e:\\tools\\miniconda3\\envs\\llamaindex-graphrag\\Lib\\site-packages\\nest_asyncio.py:98, in _patch_loop.<locals>.run_until_complete(self, future)
     95 if not f.done():
     96     raise RuntimeError(
     97         'Event loop stopped before Future completed.')
---> 98 return f.result()

File e:\\tools\\miniconda3\\envs\\llamaindex-graphrag\\Lib\\asyncio\\futures.py:203, in Future.result(self)
    201 self.__log_traceback = False
    202 if self._exception is not None:
--> 203     raise self._exception.with_traceback(self._exception_tb)
    204 return self._result

File e:\\tools\\miniconda3\\envs\\llamaindex-graphrag\\Lib\\asyncio\\tasks.py:314, in Task.__step_run_and_handle_result(***failed resolving arguments***)
    310 try:
    311     if exc is None:
    312         # We use the `send` method directly, because coroutines
    313         # don't have `__iter__` and `__next__` methods.
--> 314         result = coro.send(None)
    315     else:
    316         result = coro.throw(exc)

File e:\\tools\\miniconda3\\envs\\llamaindex-graphrag\\Lib\\site-packages\\llama_index\\core\\instrumentation\\dispatcher.py:290, in Dispatcher.span.<locals>.async_wrapper(func, instance, args, kwargs)
    282 self.span_enter(
    283     id_=id_,
    284     bound_args=bound_args,
   (...)
    287     tags=tags,
    288 )
    289 try:
--> 290     result = await func(*args, **kwargs)
    291 except BaseException as e:
    292     self.event(SpanDropEvent(span_id=id_, err_str=str(e)))

File e:\\tools\\miniconda3\\envs\\llamaindex-graphrag\\Lib\\site-packages\\llama_index\\core\\base\\embeddings\\base.py:406, in BaseEmbedding.aget_text_embedding_batch(self, texts, show_progress)
    403     else:
    404         nested_embeddings = await asyncio.gather(*embeddings_coroutines)
--> 406 result_embeddings = [
    407     embedding for embeddings in nested_embeddings for embedding in embeddings
    408 ]
    410 for (event_id, text_batch), embeddings in zip(
    411     callback_payloads, nested_embeddings
    412 ):
    413     dispatcher.event(
    414         EmbeddingEndEvent(
    415             chunks=text_batch,
    416             embeddings=embeddings,
    417         )
    418     )

TypeError: 'coroutine' object is not iterable"
}

logan-markewich commented 1 month ago

Hmm, seems like something in that list comprehension wasn't awaited properly in the source code
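
For context, this failure mode usually comes down to a coroutine being handed back where a value was expected; a hypothetical sketch (illustrative names only, not the actual llama_index code):

import asyncio

class BrokenEmbedding:
    """Hypothetical class illustrating the failure mode; not llama_index code."""

    async def _embed_batch(self, texts: list[str]) -> list[list[float]]:
        return [[0.0, 1.0] for _ in texts]  # stand-in for real embedding calls

    async def _aget_text_embeddings(self, texts: list[str]):
        # BUG: missing `await` -- awaiting this method hands the caller
        # a bare coroutine instead of the list of embeddings
        return self._embed_batch(texts)

async def main() -> None:
    nested = [await BrokenEmbedding()._aget_text_embeddings(["a", "b"])]
    try:
        # BaseEmbedding.aget_text_embedding_batch does the equivalent of this
        # flattening comprehension, which iterates the coroutine and fails:
        [e for batch in nested for e in batch]
    except TypeError as e:
        print(e)  # 'coroutine' object is not iterable
    nested[0].close()  # silence the "coroutine was never awaited" warning

asyncio.run(main())

If the Ollama embedding class returns an un-awaited coroutine like this from one of its batch methods, use_async=False would sidestep it by taking the synchronous code path, which matches the workaround above.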

theconnectionist commented 1 month ago

@logan-markewich did you mean my code (which is practically the same as the GraphRAG_v1.ipynb sample cookbook, except for the use of Ollama for the LLM and embed model) or the llama_index source code?

logan-markewich commented 1 month ago

The llama_index source code, probably in the Ollama embedding class

mribbons commented 4 weeks ago

This worked for me:

from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.core import Settings

ollama_embedding = OllamaEmbedding(
    model_name="llama3:latest",
)

Settings.embed_model = ollama_embedding

index = PropertyGraphIndex(
    nodes=nodes,
    property_graph_store=GraphRAGStore(),
    kg_extractors=[kg_extractor],
    show_progress=True,
    use_async=False # ollama doesn't like async?
)

I was following this: https://docs.llamaindex.ai/en/latest/examples/cookbooks/GraphRAG_v1/