Closed zw-change closed 3 months ago
Facing the same error with locally deployed gte-small embeddings.
Consolidating alternate model issues here: https://github.com/microsoft/graphrag/issues/657
Same problem here, and I did not find a solution in the other issues.
`--method local` does not work.
The embedding model running locally in Ollama returns its embeddings in a format that GraphRAG does not expect. OpenAI internally uses base64-encoded floats, whereas most other models return the floats as plain numbers. If you want to use open-source models, I've put together a repository for deploying models from HuggingFace to local endpoints whose responses are format-compatible with the OpenAI API. Here's the link to the repo: https://github.com/rushizirpe/open-llm-server
Also, I have created a Colab notebook (working for global as well as local search) for Graphrag: https://colab.research.google.com/drive/1uhFDnih1WKrSRQHisU-L6xw6coapgR51?usp=sharing
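To make the format difference concrete, here is a minimal sketch. It assumes the base64 payload is a packed array of little-endian float32 values; that layout and both helper names are assumptions for illustration, not something confirmed in this thread.

```python
from __future__ import annotations

import base64
import struct


def encode_embedding_base64(values: list[float]) -> str:
    """Pack floats as little-endian float32 and base64-encode them (assumed wire format)."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return base64.b64encode(raw).decode("ascii")


def normalize_embedding(payload: str | list[float]) -> list[float]:
    """Return a plain list of floats whether the server sent base64 or numbers."""
    if isinstance(payload, str):
        raw = base64.b64decode(payload)
        count = len(raw) // 4  # 4 bytes per float32
        return list(struct.unpack(f"<{count}f", raw))
    return [float(v) for v in payload]


as_base64 = encode_embedding_base64([0.5, -1.25, 2.0])
print(normalize_embedding(as_base64))          # decoded from the base64 form
print(normalize_embedding([0.5, -1.25, 2.0]))  # already plain numbers
```

A client that only understands one of the two shapes will fail on the other, which is why passing `encoding_format="float"` (as seen in the traceback in the issue body) can matter for non-OpenAI servers.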
same + 1
I modified this file, which may help you: site-packages/graphrag/query/llm/oai/embedding.py
Modify the definition of the chunk_text() function in site-packages\graphrag\query\llm\text_utils.py:

```python
def chunk_text(
    text: str, max_tokens: int, token_encoder: tiktoken.Encoding | None = None
):
    """Chunk text by token length."""
    if token_encoder is None:
        token_encoder = tiktoken.get_encoding("cl100k_base")
    tokens = token_encoder.encode(text)  # type: ignore
    tokens = token_encoder.decode(tokens)  # decode the tokens back into a string
    chunk_iterator = batched(iter(tokens), max_tokens)
    yield from chunk_iterator
```
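A small sketch of what the patched function yields. It uses a hypothetical stand-in encoder instead of tiktoken so that it runs without extra dependencies, and `batched` here is my own helper, assumed to behave like the one in text_utils.py. Because the tokens are decoded back into a string before batching, each chunk is a tuple of characters and `max_tokens` effectively acts as a character limit, so the caller has to join each chunk before sending it to the embedding endpoint.

```python
from itertools import islice


def batched(iterable, n):
    """Yield tuples of at most n items (same idea as itertools.batched in Python 3.12)."""
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        yield batch


class FakeEncoder:
    """Hypothetical stand-in for tiktoken.Encoding: one token per character."""

    def encode(self, text):
        return [ord(c) for c in text]

    def decode(self, tokens):
        return "".join(chr(t) for t in tokens)


def chunk_text(text, max_tokens, token_encoder):
    tokens = token_encoder.encode(text)
    decoded = token_encoder.decode(tokens)  # back to a plain string
    yield from batched(iter(decoded), max_tokens)


# Join each tuple of characters into a string chunk.
chunks = ["".join(chunk) for chunk in chunk_text("local search", 5, FakeEncoder())]
print(chunks)  # ['local', ' sear', 'ch']
```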
Describe the issue
When I use a local Ollama model, a local query fails with an error, but a global query does not have this problem.
Steps to reproduce
https://github.com/TheAiSingularity/graphrag-local-ollama
GraphRAG Config Used
    encoding_model: cl100k_base
    skip_workflows: []
    llm:
      api_key: ${GRAPHRAG_API_KEY}
      type: openai_chat # or azure_openai_chat
      model: mistral
      model_supports_json: true # recommended if this is available for your model.
      # max_tokens: 4000
      # request_timeout: 180.0
      api_base: http://192.168.0.17:11434/v1
      # api_version: 2024-02-15-preview
      # organization:
      # deployment_name:
      # tokens_per_minute: 150_000 # set a leaky bucket throttle
      # requests_per_minute: 10_000 # set a leaky bucket throttle
      # max_retries: 10
      # max_retry_wait: 10.0
      # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
      # concurrent_requests: 25 # the number of parallel inflight requests that may be made

    parallelization:
      stagger: 0.3
      # num_threads: 50 # the number of threads to use for parallel processing

    async_mode: threaded # or asyncio

    embeddings:
      ## parallelization: override the global parallelization settings for embeddings
      async_mode: threaded # or asyncio
      llm:
        api_key: ${GRAPHRAG_API_KEY}
        type: openai_embedding # or azure_openai_embedding
        model: nomic_embed_text
        api_base: http://192.168.0.17:11434/api
        # api_version: 2024-02-15-preview

    chunks:
      size: 300
      overlap: 100
      group_by_columns: [id] # by default, we don't allow chunks to cross documents

    input:
      type: file # or blob
      file_type: text # or csv
      base_dir: "input"
      file_encoding: utf-8
      file_pattern: ".*\\.txt$"

    cache:
      type: file # or blob
      base_dir: "cache"
      # connection_string:
      # container_name:

    storage:
      type: file # or blob
      base_dir: "output/${timestamp}/artifacts"
      # connection_string:
      # container_name:

    reporting:
      type: file # or console, blob
      base_dir: "output/${timestamp}/reports"
      # connection_string:
      # container_name:

    entity_extraction:
      ## llm: override the global llm settings for this task
      ## parallelization: override the global parallelization settings for this task
      ## async_mode: override the global async_mode settings for this task
      prompt: "prompts/entity_extraction.txt"
      entity_types: [organization,person,geo,event]
      max_gleanings: 0

    summarize_descriptions:
      ## llm: override the global llm settings for this task
      ## parallelization: override the global parallelization settings for this task
      ## async_mode: override the global async_mode settings for this task
      prompt: "prompts/summarize_descriptions.txt"
      max_length: 500

    claim_extraction:
      ## llm: override the global llm settings for this task
      ## parallelization: override the global parallelization settings for this task
      ## async_mode: override the global async_mode settings for this task
      # enabled: true
      prompt: "prompts/claim_extraction.txt"
      description: "Any claims or facts that could be relevant to information discovery."
      max_gleanings: 0

    community_report:
      ## llm: override the global llm settings for this task
      ## parallelization: override the global parallelization settings for this task
      ## async_mode: override the global async_mode settings for this task
      prompt: "prompts/community_report.txt"
      max_length: 2000
      max_input_length: 8000

    cluster_graph:
      max_cluster_size: 10

    embed_graph:
      enabled: false # if true, will generate node2vec embeddings for nodes
      # num_walks: 10
      # walk_length: 40
      # window_size: 2
      # iterations: 3
      # random_seed: 597832

    umap:
      enabled: false # if true, will generate UMAP embeddings for nodes

    snapshots:
      graphml: true
      raw_entities: yes
      top_level_nodes: yes

    local_search:
      # text_unit_prop: 0.5
      # community_prop: 0.1
      # conversation_history_max_turns: 5
      # top_k_mapped_entities: 10
      # top_k_relationships: 10
      # max_tokens: 12000

    global_search:
      # max_tokens: 12000
      # data_max_tokens: 12000
      # map_max_tokens: 1000
      # reduce_max_tokens: 2000
      # concurrency: 32
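The 404 in the terminal output further down says that Ollama has no model called `nomic_embed_text`, which is the name set under `embeddings.llm.model` in this config. Here is a minimal sketch for checking a configured name against what the server reports. It assumes Ollama's `/api/tags` endpoint returns a JSON object with a `models` list whose entries carry a `name` field; the function names and the sample payload are illustrative only.

```python
import json
import urllib.request


def installed_models(tags_payload: dict) -> set:
    """Collect model names, both with and without the ':tag' suffix."""
    names = set()
    for entry in tags_payload.get("models", []):
        name = entry.get("name", "")
        names.add(name)
        names.add(name.split(":", 1)[0])
    names.discard("")
    return names


def fetch_tags(server_root: str) -> dict:
    """Query a running Ollama server; not called here because it needs the network."""
    with urllib.request.urlopen(f"{server_root}/api/tags", timeout=10) as resp:
        return json.load(resp)


# Illustrative payload, not real output from the reporter's machine.
sample = {"models": [{"name": "mistral:latest"}, {"name": "nomic-embed-text:latest"}]}
available = installed_models(sample)
print("nomic_embed_text" in available)  # False
print("nomic-embed-text" in available)  # True
```

If the model was pulled under the hyphenated name used in the Ollama library (`nomic-embed-text`), the underscore spelling in the config would not match, which would be consistent with the 404. Against a live server, `installed_models(fetch_tags("http://192.168.0.17:11434"))` would give the real list.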
Logs and screenshots
    {"type": "error", "data": "Community Report Extraction Error", "stack": "Traceback (most recent call last):\n File \"E:\Anaconda3\envs\graphrag-ollama-local\graphrag-local-ollama\graphrag\index\graph\extractors\community_reports\community_reports_extractor.py\", line 58, in __call__\n await self._llm(\n File \"E:\Anaconda3\envs\graphrag-ollama-local\graphrag-local-ollama\graphrag\llm\openai\json_parsing_llm.py\", line 34, in __call__\n result = await self._delegate(input, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"E:\Anaconda3\envs\graphrag-ollama-local\graphrag-local-ollama\graphrag\llm\openai\openai_token_replacing_llm.py\", line 37, in __call__\n return await self._delegate(input, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"E:\Anaconda3\envs\graphrag-ollama-local\graphrag-local-ollama\graphrag\llm\openai\openai_history_tracking_llm.py\", line 33, in __call__\n output = await self._delegate(input, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"E:\Anaconda3\envs\graphrag-ollama-local\graphrag-local-ollama\graphrag\llm\base\caching_llm.py\", line 104, in __call__\n result = await self._delegate(input, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"E:\Anaconda3\envs\graphrag-ollama-local\graphrag-local-ollama\graphrag\llm\base\rate_limiting_llm.py\", line 177, in __call__\n result, start = await execute_with_retry()\n ^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"E:\Anaconda3\envs\graphrag-ollama-local\graphrag-local-ollama\graphrag\llm\base\rate_limiting_llm.py\", line 159, in execute_with_retry\n async for attempt in retryer:\n File \"C:\Python311\Lib\site-packages\tenacity\asyncio\__init__.py\", line 166, in __anext__\n do = await self.iter(retry_state=self._retry_state)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"C:\Python311\Lib\site-packages\tenacity\asyncio\__init__.py\", line 153, in iter\n result = await action(retry_state)\n ^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"C:\Python311\Lib\site-packages\tenacity\_utils.py\", line 99, in inner\n return call(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^\n File \"C:\Python311\Lib\site-packages\tenacity\__init__.py\", line 398, in <lambda>\n self._add_action_func(lambda rs: rs.outcome.result())\n ^^^^^^^^^^^^^^^^^^^\n File \"C:\Python311\Lib\concurrent\futures\_base.py\", line 449, in result\n return self.__get_result()\n ^^^^^^^^^^^^^^^^^^^\n File \"C:\Python311\Lib\concurrent\futures\_base.py\", line 401, in __get_result\n raise self._exception\n File \"E:\Anaconda3\envs\graphrag-ollama-local\graphrag-local-ollama\graphrag\llm\base\rate_limiting_llm.py\", line 165, in execute_with_retry\n return await do_attempt(), start\n ^^^^^^^^^^^^^^^^^^\n File \"E:\Anaconda3\envs\graphrag-ollama-local\graphrag-local-ollama\graphrag\llm\base\rate_limiting_llm.py\", line 147, in do_attempt\n return await self._delegate(input, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"E:\Anaconda3\envs\graphrag-ollama-local\graphrag-local-ollama\graphrag\llm\base\base_llm.py\", line 48, in __call__\n return await self._invoke_json(input, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"E:\Anaconda3\envs\graphrag-ollama-local\graphrag-local-ollama\graphrag\llm\openai\openai_chat_llm.py\", line 90, in _invoke_json\n raise RuntimeError(FAILED_TO_CREATE_JSON_ERROR)\nRuntimeError: Failed to generate valid JSON output\n", "source": "Failed to generate valid JSON output", "details": null}
Terminal output:
    INFO: Reading settings from settings.yaml
    creating llm client with {'api_key': 'REDACTED,len=9', 'type': "openai_chat", 'model': 'mistral', 'max_tokens': 4000, 'temperature': 0.0, 'top_p': 1.0, 'request_timeout': 180.0, 'api_base': 'http://192.168.0.17:11434/v1', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
    creating embedding llm client with {'api_key': 'REDACTED,len=9', 'type': "openai_embedding", 'model': 'nomic_embed_text', 'max_tokens': 4000, 'temperature': 0, 'top_p': 1, 'request_timeout': 180.0, 'api_base': 'http://192.168.0.17:11434/api', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': None, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
    Error embedding chunk {'OpenAIEmbedding': 'Error raised by inference API HTTP code: 404, {"error":"model \"nomic_embed_text\" not found, try pulling it first"}'}
    Traceback (most recent call last):
      File "<frozen runpy>", line 198, in _run_module_as_main
      File "<frozen runpy>", line 88, in _run_code
      File "E:\Anaconda3\envs\graphrag-ollama-local\graphrag-local-ollama\graphrag\query\__main__.py", line 76, in <module>
        run_local_search(
      File "E:\Anaconda3\envs\graphrag-ollama-local\graphrag-local-ollama\graphrag\query\cli.py", line 154, in run_local_search
        result = search_engine.search(query=query)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "E:\Anaconda3\envs\graphrag-ollama-local\graphrag-local-ollama\graphrag\query\structured_search\local_search\search.py", line 118, in search
        context_text, context_records = self.context_builder.build_context(
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        selected_entities = map_query_to_entities(
                            ^^^^^^^^^^^^^^^^^^^^^^
      File "E:\Anaconda3\envs\graphrag-ollama-local\graphrag-local-ollama\graphrag\query\context_builder\entity_extraction.py", line 54, in map_query_to_entities
        search_results = text_embedding_vectorstore.similarity_search_by_text(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "E:\Anaconda3\envs\graphrag-ollama-local\graphrag-local-ollama\graphrag\vector_stores\lancedb.py", line 118, in similarity_search_by_text
        query_embedding = text_embedder(text)
                          ^^^^^^^^^^^^^^^^^^^
      File "E:\Anaconda3\envs\graphrag-ollama-local\graphrag-local-ollama\graphrag\query\context_builder\entity_extraction.py", line 56, in <lambda>
        text_embedder=lambda t: text_embedder.embed(t, encoding_format="float"), # added to make embedding api work, openai uses base64 by default
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "E:\Anaconda3\envs\graphrag-ollama-local\graphrag-local-ollama\graphrag\query\llm\oai\embedding.py", line 99, in embed
        chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\Python311\Lib\site-packages\numpy\lib\function_base.py", line 550, in average
        raise ZeroDivisionError(
    ZeroDivisionError: Weights sum to zero, can't be normalized
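The ZeroDivisionError most likely follows from the 404 logged just before the traceback: if every chunk fails to embed, the lists passed to `np.average` are empty and the weights sum to zero. Below is a pure-Python sketch of the same length-weighted average with an explicit guard. It assumes chunk embeddings and chunk lengths are collected in parallel lists, as the traceback suggests; `weighted_average` is a hypothetical helper, not GraphRAG's code.

```python
def weighted_average(embeddings, weights):
    """Length-weighted mean of chunk embeddings, failing early with a clear message."""
    if not embeddings or sum(weights) == 0:
        raise ValueError(
            "no chunk was embedded; check the embedding model name and api_base"
        )
    total = sum(weights)
    dim = len(embeddings[0])
    return [
        sum(vec[i] * w for vec, w in zip(embeddings, weights)) / total
        for i in range(dim)
    ]


print(weighted_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]

try:
    weighted_average([], [])  # what happens when every embedding request fails
except ValueError as err:
    print(f"guarded: {err}")
```

A guard like this does not fix the underlying request failure; it only replaces the opaque numpy error with one that points at the embedding configuration.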
Additional Information