Closed prasantpoudel closed 3 months ago
Something probably went wrong in your index phase. Check the logs from the index phase.
Yes, this is due to your locally run embedding model not returning the embeddings in the correct format. OpenAI internally uses base64-encoded floats, while most other models return floats as plain numbers.
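For illustration, here is a minimal sketch (plain numpy, not graphrag code) of the difference between the two response formats: a plain-float response is just a list of numbers, while a base64 response packs the same vector as little-endian float32 bytes.

```python
import base64

import numpy as np

# The same embedding vector in both formats:
vec = [0.1, -0.2, 0.3]
b64_payload = base64.b64encode(
    np.asarray(vec, dtype=np.float32).tobytes()
).decode("ascii")

# Decoding the base64 payload recovers the original floats:
decoded = np.frombuffer(base64.b64decode(b64_payload), dtype=np.float32)
print(decoded)  # close to [0.1, -0.2, 0.3]
```

A client expecting one format and receiving the other ends up with garbage or empty vectors, which is what the workaround below avoids by forcing `encoding_format="float"`.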
I've hacked the encoding_format into this piece of code to make local search work:
```python
def map_query_to_entities(
    query: str,
    text_embedding_vectorstore: BaseVectorStore,
    text_embedder: BaseTextEmbedding,
    all_entities: list[Entity],
    embedding_vectorstore_key: str = EntityVectorStoreKey.ID,
    include_entity_names: list[str] | None = None,
    exclude_entity_names: list[str] | None = None,
    k: int = 10,
    oversample_scaler: int = 2,
) -> list[Entity]:
    """Extract entities that match a given query using semantic similarity of text embeddings of query and entity descriptions."""
    if include_entity_names is None:
        include_entity_names = []
    if exclude_entity_names is None:
        exclude_entity_names = []
    matched_entities = []
    if query != "":
        # get entities with highest semantic similarity to query
        # oversample to account for excluded entities
        search_results = text_embedding_vectorstore.similarity_search_by_text(
            text=query,
            # added to make the embedding API work; OpenAI uses base64 by default
            text_embedder=lambda t: text_embedder.embed(t, encoding_format="float"),
            k=k * oversample_scaler,
        )
        for result in search_results:
            matched = get_entity_by_key(
                entities=all_entities,
                key=embedding_vectorstore_key,
                value=result.document.id,
            )
            if matched:
                matched_entities.append(matched)
    else:
        all_entities.sort(key=lambda x: x.rank if x.rank else 0, reverse=True)
        matched_entities = all_entities[:k]
    # filter out excluded entities
    if exclude_entity_names:
        matched_entities = [
            entity
            for entity in matched_entities
            if entity.title not in exclude_entity_names
        ]
    # add entities in the include_entity list
    included_entities = []
    for entity_name in include_entity_names:
        included_entities.extend(get_entity_by_name(all_entities, entity_name))
    return included_entities + matched_entities
```
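The functional change is just the lambda that forces `encoding_format="float"`. If you would rather not patch graphrag's source, the same idea can be sketched as a small wrapper around any embedder object (`FloatEncodingEmbedder` and `EchoEmbedder` below are illustrative names, not graphrag API):

```python
# Illustrative wrapper: force every embed() call to request plain-float
# output instead of OpenAI's default base64 encoding.
class FloatEncodingEmbedder:
    def __init__(self, inner):
        self.inner = inner

    def embed(self, text: str, **kwargs):
        # Only set the default; an explicit encoding_format still wins.
        kwargs.setdefault("encoding_format", "float")
        return self.inner.embed(text, **kwargs)


# Stub embedder that just echoes the requested encoding format:
class EchoEmbedder:
    def embed(self, text: str, **kwargs):
        return kwargs.get("encoding_format")


wrapped = FloatEncodingEmbedder(EchoEmbedder())
print(wrapped.embed("hello world"))  # float
```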
It still doesn't seem to work even after making this change.
It's because you're using a local model. If you are using Ollama (just like me), you might find this answer helpful: https://github.com/microsoft/graphrag/issues/345#issuecomment-2212471697 It works perfectly for me.
Consolidating alternate model issues here: https://github.com/microsoft/graphrag/issues/657
Hello, I'm currently running into the same problem. I am using an Azure OpenAI instance:
```python
text_embedder = OpenAIEmbedding(
    api_key=api_key,
    deployment_name="ada-small-emb-graphrag",
    model="text-embedding-ada-002",
    api_base="https://xxx-oai.openai.azure.com/",
)
text_embedder.embed("hello world")
```
This returns the error
ZeroDivisionError: Weights sum to zero, can't be normalized
I have added the float encoding in the source code, but it still does not work:
```python
text_embedder=lambda t: text_embedder.embed(t, encoding_format="float")
```
Any ideas why it is still not working? Thanks.
I also encountered this situation. Since I am not connecting to OpenAI, I checked the api_base and api_key, and found no problem with them.
Where should I place this code?
Describe the issue
When I run a query using the local search method, I get the error ZeroDivisionError: Weights sum to zero, can't be normalized. Global search works correctly. If anyone has an idea, please share the solution.
```
python3 -m graphrag.query \
    --root ./ragtest \
    --method local \
    "Who is Scrooge, and what are his main relationships?"

INFO: Reading settings from ragtest/settings.yaml
creating llm client with {'api_key': 'REDACTED,len=19', 'type': "openai_chat", 'model': 'mistral:7b', 'max_tokens': 4000, 'request_timeout': 180.0, 'api_base': 'http://localhost:11434/v1', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 10}
creating embedding llm client with {'api_key': 'REDACTED,len=19', 'type': "openai_embedding", 'model': 'nomic-embed-text', 'max_tokens': 4000, 'request_timeout': 180.0, 'api_base': 'http://localhost:11434/v1', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': None, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 10}
Error embedding chunk {'OpenAIEmbedding': "Error code: 400 - {'error': {'message': 'invalid input type', 'type': 'api_error', 'param': None, 'code': None}}"}
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/graphrag/query/__main__.py", line 75, in <module>
    run_local_search(
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/graphrag/query/cli.py", line 154, in run_local_search
    result = search_engine.search(query=query)
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/graphrag/query/structured_search/local_search/search.py", line 118, in search
    context_text, context_records = self.context_builder.build_context(
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/graphrag/query/structured_search/local_search/mixed_context.py", line 139, in build_context
    selected_entities = map_query_to_entities(
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/graphrag/query/context_builder/entity_extraction.py", line 55, in map_query_to_entities
    search_results = text_embedding_vectorstore.similarity_search_by_text(
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/graphrag/vector_stores/lancedb.py", line 118, in similarity_search_by_text
    query_embedding = text_embedder(text)
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/graphrag/query/context_builder/entity_extraction.py", line 57, in <lambda>
    text_embedder=lambda t: text_embedder.embed(t),
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/graphrag/query/llm/oai/embedding.py", line 96, in embed
    chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/numpy/lib/function_base.py", line 550, in average
    raise ZeroDivisionError(
ZeroDivisionError: Weights sum to zero, can't be normalized
```
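The last traceback frame explains the symptom: graphrag averages per-chunk embeddings weighted by chunk token counts, and when every chunk fails to embed (as the 400 "invalid input type" error earlier in the log suggests), the weight list sums to zero. A minimal numpy sketch of that failure mode, assuming all-zero chunk weights:

```python
import numpy as np

# Sketch: every chunk failed to embed, so its weight (token count) is zero.
chunk_embeddings = [np.zeros(4)]
chunk_lens = [0]

try:
    np.average(chunk_embeddings, axis=0, weights=chunk_lens)
    failed = False
except ZeroDivisionError as exc:
    failed = True
    print(exc)  # Weights sum to zero, can't be normalized
```

So the ZeroDivisionError is a downstream effect; the real problem is the embedding request failing for every chunk.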
Steps to reproduce
No response
GraphRAG Config Used
No response
Logs and screenshots
No response
Additional Information
- GraphRAG Version:
- Operating System:
- Python Version:
- Related Issues:
This may be caused by an invalid api_key.