Benny1991-lucky opened 3 months ago
Hey @Benny1991-lucky! I'm here to help you with any bugs, questions, or contributions you have regarding the repository. Let's tackle this issue together!
The issue where the embedding process stops at "Generating embeddings: 0it [00:00, ?it/s]" when using `SchemaLLMPathExtractor` is likely due to the embedding model not being correctly set or used during index construction. This can happen if the embedding model is not properly configured or passed to the `PropertyGraphIndex`.
Here are a few steps to ensure the embedding model is correctly set:
Ensure the Embedding Model is Correctly Configured: Make sure you have correctly initialized and passed the embedding model to the `PropertyGraphIndex`. Here is an example using a Hugging Face embedding model:
```python
# Requires the llama-index-embeddings-langchain and langchain-community packages
from langchain_community.embeddings import HuggingFaceEmbeddings

from llama_index.core import PropertyGraphIndex
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor
from llama_index.embeddings.langchain import LangchainEmbedding
from llama_index.llms.ollama import Ollama


def get_embed_model():
    # Wrap the LangChain embedding so LlamaIndex can use it
    hf = HuggingFaceEmbeddings(model_name="all-mpnet-base-v2")
    return LangchainEmbedding(hf)


embed_model = get_embed_model()

# entities, relations, validation_schema, and docs come from your own setup
kg_extractor = SchemaLLMPathExtractor(
    llm=Ollama(model="llama3", json_mode=True, request_timeout=3600),
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=validation_schema,
    strict=False,
    max_triplets_per_chunk=10,
    num_workers=4,
)

index = PropertyGraphIndex.from_documents(
    docs,
    kg_extractors=[kg_extractor],
    embed_model=embed_model,
    show_progress=True,
)
```
Enable Debug Logging: Turning on DEBUG-level logging gives more insight into what might be going wrong and can help you diagnose issues with the embedding process:
```python
import logging
import sys

# Route all log output to stdout at DEBUG level
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG, force=True)
```
Then build the index with the same code as in the previous step; the DEBUG output will show what happens during the embedding stage.
Check for Errors or Exceptions: Ensure that no errors or exceptions are being raised during the embedding process. The `SchemaLLMPathExtractor` handles errors using try-except blocks, so enabling logging will help you see whether any exceptions are being caught and handled silently.
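One way to surface such events is LlamaIndex's `LlamaDebugHandler`; a minimal sketch, assuming the global `Settings` object from `llama_index.core`:

```python
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler

# Prints a trace of recorded events when each top-level operation finishes,
# which helps spot steps that fail or finish suspiciously fast
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
Settings.callback_manager = CallbackManager([llama_debug])

# ...then build the index as above; event traces are printed to stdout
```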
By following these steps, you should be able to diagnose and resolve the issue with the embedding process stopping at "Generating embeddings: 0it [00:00, ?it/s]" [1][2].
To continue talking to Dosu, mention @dosu.
Hi,

After I use the `LangchainEmbedding` to return the embedding model, I got this error:

`AttributeError: 'HuggingFaceEmbedding' object has no attribute 'aembed_documents'`

And my `HuggingFaceEmbedding` class is `from llama_index.embeddings.huggingface import HuggingFaceEmbedding`, not `HuggingFaceEmbeddings`.
```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.embeddings.langchain import LangchainEmbedding


def get_embed_model():
    hf = HuggingFaceEmbedding(model_name="all-mpnet-base-v2")
    return LangchainEmbedding(hf)
```
To resolve the `AttributeError: 'HuggingFaceEmbedding' object has no attribute 'aembed_documents'` error, you need to ensure that the `HuggingFaceEmbedding` class you are using has the `aembed_documents` method implemented. Here are the steps to address this:

Check if `aembed_documents` is implemented: Ensure that the `HuggingFaceEmbedding` class has the `aembed_documents` method. If it does not, you need to implement it.

Implement `aembed_documents`: If the method is not implemented, you can add it to the `HuggingFaceEmbedding` class. Here is an example implementation:
```python
from typing import List


class HuggingFaceEmbedding:
    # Existing methods...

    async def aembed_documents(self, texts: List[str]) -> List[List[float]]:
        # Implement the asynchronous embedding logic here;
        # for example, use an async version of your embedding model
        return [await self.embed_document_async(text) for text in texts]

    async def embed_document_async(self, text: str) -> List[float]:
        # Example async embedding logic: replace this with the actual
        # async call to your embedding model
        return self.embed_document(text)
```
Fall back to the synchronous method: Alternatively, modify the `LangchainEmbedding` class to fall back to the synchronous method without raising an error. This is already partially handled in the provided code with the `_async_not_implemented_warn_once` method.

By ensuring that the `aembed_documents` method is implemented or properly handled, you can resolve the `AttributeError` [1][2].
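For reference, a minimal sketch of that fallback pattern; the `SyncFallbackEmbedding` wrapper is a hypothetical name, not part of either library:

```python
import asyncio
from typing import List


class SyncFallbackEmbedding:
    """Hypothetical wrapper: serves async callers even when the wrapped
    model only implements a synchronous embed_documents method."""

    def __init__(self, sync_model):
        self._sync_model = sync_model

    async def aembed_documents(self, texts: List[str]) -> List[List[float]]:
        # Run the blocking sync call in a worker thread instead of
        # raising AttributeError when no native async path exists
        return await asyncio.to_thread(self._sync_model.embed_documents, texts)
```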
To continue talking to Dosu, mention @dosu.
Don't use langchain embeddings; you can use our own huggingface class if you need to.
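For example, a minimal sketch with LlamaIndex's native class (the model name here is just an illustration):

```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# No LangChain wrapper needed; pass this straight to PropertyGraphIndex
embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-mpnet-base-v2")
```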
But also, it's not stopping; it's finishing fine. But it's indicating that zero kg nodes were extracted. If you are using an open-source LLM, this is usually expected, as they kind of stink at outputting reliable JSON.
My advice is to try the DynamicLLMPathExtractor instead if you are using open source: https://docs.llamaindex.ai/en/stable/module_guides/indexing/lpg_index_guide/?h=dynamic#dynamicllmpathextractor
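Something along these lines, sketched from the linked guide (the allowed-type lists are illustrative, and `docs` is your own document list):

```python
from llama_index.core import PropertyGraphIndex
from llama_index.core.indices.property_graph import DynamicLLMPathExtractor
from llama_index.llms.ollama import Ollama

# Lets the LLM infer entity/relation types on the fly instead of
# enforcing a strict schema -- more forgiving for open-source models
kg_extractor = DynamicLLMPathExtractor(
    llm=Ollama(model="llama3", request_timeout=3600),
    max_triplets_per_chunk=10,
    num_workers=4,
    allowed_entity_types=["PERSON", "ORGANIZATION", "PLACE"],
    allowed_relation_types=["WORKS_AT", "LOCATED_IN", "FOUNDED"],
)

index = PropertyGraphIndex.from_documents(
    docs,
    kg_extractors=[kg_extractor],
    show_progress=True,
)
```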
@logan-markewich, thanks for the support.

Using `DynamicLLMPathExtractor` works for me.

BTW, what's the difference between `DynamicLLMPathExtractor` and `SchemaLLMPathExtractor` when using an open-source LLM? So I cannot use `SchemaLLMPathExtractor` with an open-source LLM like llama3:8B, right? Because I already tried llama3, gemma2, and llama3.1 loaded via Ollama, and the output was always `Generating embeddings: 0it [00:00, ?it/s]`.
Don't know if you ever managed to resolve this issue, but I ran into the same! It appears that setting `json_mode=True` in the Ollama LLM results in JSON-formatted triplets being produced by almost every recent Ollama model I tested (example: `{"subject":"Sam Altman","predicate":"decided not to","object":"be president"}`), but the `parse_fn` in the `SchemaLLMPathExtractor` object expects them in raw format (example: `("Sam Altman","decided not to","be president")`). When I set `json_mode=False`, everything started to work as expected!
```python
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor
from llama_index.llms.ollama import Ollama

kg_extractor = SchemaLLMPathExtractor(
    # json_mode=False so the model emits raw triplets that parse_fn understands
    llm=Ollama(model="llama3", json_mode=False, request_timeout=3600),
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=validation_schema,
    strict=False,
    max_triplets_per_chunk=10,
    num_workers=4,
)
```
Bug Description
When I run a LlamaIndex property graph, embedding always stops at "Generating embeddings: 0it [00:00, ?it/s]" when using SchemaLLMPathExtractor. There is no problem with ImplicitPathExtractor or SimpleLLMPathExtractor.
Some code:

```python
kg_extractor = SchemaLLMPathExtractor(
    llm=Ollama(model="llama3", json_mode=True, request_timeout=3600),
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=validation_schema,
    strict=False,  # if false, allows for values outside of the schema
)

index = PropertyGraphIndex.from_documents(
    docs,
    kg_extractors=[kg_extractor],
    show_progress=True,
)
```

Output:

```
Extracting paths from text with schema: 100%|██████████| 22/22 [00:56<00:00,  2.55s/it]
Generating embeddings: 100%|██████████| 3/3 [00:00<00:00,  3.04it/s]
Generating embeddings: 0it [00:00, ?it/s]
```
Version
llama-index 0.10.58
Steps to Reproduce
Follow this guide: https://docs.llamaindex.ai/en/stable/examples/property_graph/property_graph_advanced/
Relevant Logs/Tracebacks
No response