microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/

[Ollama][Other] GraphRAG OSS LLM community support #339

Closed samalanubhab closed 1 month ago

samalanubhab commented 2 months ago

What I tried: I ran this on my local GPU and tried replacing the api_base with a model served on ollama in the settings.yaml file:

model: llama3:latest
api_base: http://localhost:11434/v1 # https://.openai.azure.com

Error: graphrag.index.reporting.file_workflow_callbacks INFO Error Invoking LLM details={'input': '\n-Goal-\nGiven a text document that is pot....}

Commands:

initialize

python -m graphrag.index --init --root .

index

python -m graphrag.index --root .

query (global)

python -m graphrag.query --root . --method global "query"

query (local)

python -m graphrag.query --root . --method local "query"

Does graphrag support other LLM hosting server frameworks?

natoverse commented 1 month ago

Consolidating Ollama-related issues: https://github.com/microsoft/graphrag/issues/657

yunchonggeng commented 1 month ago

Here is my final config. Somehow after VSCode crashed, the summary reports started working when I started it again. Here is my final full config that works so far:


encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: llama3
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 2000
  # request_timeout: 180.0
  api_base: http://localhost:11434/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  max_retries: 1
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  concurrent_requests: 1 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: text-embedding-3-small
    api_base: https://api.openai.com/v1
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    max_retries: 1
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    concurrent_requests: 1 # the number of parallel inflight requests that may be made
    batch_size: 1 # the number of documents to send in a single request
    batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional

chunks:
  size: 300
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 0

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0

community_report:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 7000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # max_tokens: 12000

global_search:
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32

Essentially I use llama3 locally via ollama for the entities and use openai embeddings (much cheaper) until we have a solution to use ollama.

I used your settings and the default text, without changing anything else, but still:

❌ create_final_entities
None
⠹ GraphRAG Indexer 
├── Loading Input (InputFileType.text) - 1 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
├── create_base_entity_graph
└── create_final_entities
❌ Errors occurred during the pipeline run, see logs for more details.

You need to start ollama first:

curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3
ollama serve

I followed your steps and I still get the same error showing an error invoking the LLM.

Remove the cache.

minxiansheng commented 1 month ago

I configured it as:

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: nomic-embed-text
    api_base: http://localhost:11434/v1
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    concurrent_requests: 1 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional

Switched to the v1 API 👍

14:55:29,949 graphrag.index.verbs.text.embed.strategies.openai INFO embedding 9 inputs via 9 snippets using 1 batches. max_batch_size=16, max_tokens=8191
14:55:31,373 httpx INFO HTTP Request: POST http://127.0.0.1:11434/api/embeddings "HTTP/1.1 200 OK"
14:55:31,375 graphrag.index.reporting.file_workflow_callbacks INFO Error Invoking LLM details={'input': ['"TEAM": "The team is portrayed as a group of individuals who shift from passive observers to active participants in the mission, demonstrating a dynamic change in their roles."', '"WASHINGTON":', '"OPERATION: DULCE":', '"ALEX": "Alex is the leader of the team attempting to make first contact with the unknown intelligence, acknowledging the significance of their mission."', '"CONTROL": "Control refers to the ability to manage or govern, which is challenged by an intelligence that writes its own rules."', '"INTELLIGENCE": "Intelligence here refers to an unknown entity capable of writing its own rules and learning to communicate."', '"FIRST CONTACT": "First Contact is the potential initial communication between humanity and the unknown intelligence."', '"SAM RIVERA":', '"HUMANITY\'S RESPONSE":']}
14:55:31,375 datashaper.workflow.workflow ERROR Error executing verb "text_embed" in create_final_entities: 'NoneType' object is not iterable

At least the embedding succeeded, but the format seems wrong.

How do I solve this text_embed problem? I have the same problem. The whole error log is as follows:

datashaper.workflow.workflow ERROR Error executing verb "text_embed" in create_final_entities: Error code: 400 - {'object': 'error', 'message': 'NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.\n\n(The expanded size of the tensor (513) must match the existing size (512) at non-singleton dimension 1. Target sizes: [4, 513]. Tensor sizes: [1, 512])', 'code': 50001}

yurochang commented 1 month ago

Has anyone encountered the case where the global query works but the local query does not?

did you solve it?

zhangyanli-pro commented 1 month ago

Here is my final config. Somehow after VSCode crashed the summary reports started working when I started it again. [...] Essentially I use llama3 locally via ollama for the entities and use openai embeddings (much cheaper) until we have a solution to use ollama.

@bmaltais hi, I don't understand what value should be set for the api_key in your example. Can you tell me more about it? Thanks.

rushizirpe commented 1 month ago

If you want to use open-source models, I've put together a repository for deploying models from HuggingFace to local endpoints that expose an OpenAI-API-compatible format. Here's the link to the repo: https://github.com/rushizirpe/open-llm-server

Also, I have created a Colab notebook (working for global as well as local search) for Graphrag: https://colab.research.google.com/drive/1uhFDnih1WKrSRQHisU-L6xw6coapgR51?usp=sharing

vivisol commented 1 month ago

I use ollama as the local LLM API provider, and set the chat model and embedding model api_base both to http://localhost:11434/v1. Global search works OK, but local search fails with this message:

 (graphRAG) D:\Learn\GraphRAG>python -m graphrag.query --root ./newTest09 --method local "谁是叶文洁"

INFO: Reading settings from newTest09\settings.yaml
creating llm client with {'api_key': 'REDACTED,len=6', 'type': "openai_chat", 'model': 'qwen2', 'max_tokens': 4000, 'request_timeout': 180.0, 'api_base': 'http://localhost:11434/v1/', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
creating embedding llm client with {'api_key': 'REDACTED,len=6', 'type': "openai_embedding", 'model': 'nomic-embed-text', 'max_tokens': 4000, 'request_timeout': 180.0, 'api_base': 'http://localhost:11434/v1/', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': None, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
Error embedding chunk {'OpenAIEmbedding': "Error code: 400 - {'error': {'message': 'invalid input type', 'type': 'api_error', 'param': None, 'code': None}}"}
Traceback (most recent call last):
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\graphrag\query\__main__.py", line 75, in <module>
    run_local_search(
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\graphrag\query\cli.py", line 154, in run_local_search
    result = search_engine.search(query=query)
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\graphrag\query\structured_search\local_search\search.py", line 118, in search
    context_text, context_records = self.context_builder.build_context(
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\graphrag\query\structured_search\local_search\mixed_context.py", line 139, in build_context
    selected_entities = map_query_to_entities(
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\graphrag\query\context_builder\entity_extraction.py", line 55, in map_query_to_entities
    search_results = text_embedding_vectorstore.similarity_search_by_text(
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\graphrag\vector_stores\lancedb.py", line 118, in similarity_search_by_text
    query_embedding = text_embedder(text)
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\graphrag\query\context_builder\entity_extraction.py", line 57, in <lambda>
    text_embedder=lambda t: text_embedder.embed(t),
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\graphrag\query\llm\oai\embedding.py", line 96, in embed
    chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\numpy\lib\function_base.py", line 550, in average
    raise ZeroDivisionError(
ZeroDivisionError: Weights sum to zero, can't be normalized

I suspect the embedding process isn't working correctly, because it reports a code 400 error from the OpenAIEmbedding API. But the indexing process seems to work fine:

⠋ GraphRAG Indexer
├── Loading Input (text) - 1 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
├── create_base_entity_graph
├── create_final_entities
├── create_final_nodes
├── create_final_communities
├── join_text_units_to_entity_ids
├── create_final_relationships
├── join_text_units_to_relationship_ids
├── create_final_community_reports
├── create_final_text_units
├── create_base_documents
└── create_final_documents
🚀 All workflows completed successfully.

As I understand it, the indexing process also needs to do embedding, so why doesn't it work for local search? Does anyone have the same issue with GraphRAG?

galen1980guo commented 1 month ago

I had the same problem and I was wondering if anyone had completely solved it?

$ ollama -v
ollama version is 0.1.34

and my settings.yaml:

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ollama # ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: llama3:70b
  model_supports_json: false # recommended if this is available for your model.
  api_base: http://10.110.0.25:11434/v1
  ...

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ollama
    type: openai_embedding # or azure_openai_embedding
    model: nomic-embed-text:latest
    api_base: http://10.110.0.25:11434/api
  ...

then run the command: poetry run poe index --root .

...

❌ create_final_entities
None
⠦ GraphRAG Indexer
├── Loading Input (InputFileType.text) - 1 files loaded (0 filtered) ━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
├── create_base_entity_graph
└── create_final_entities
❌ Errors occurred during the pipeline run, see logs for more details.

checked the error log:

httpx INFO HTTP Request: POST http://10.110.0.25:11434/api/embeddings "HTTP/1.1 200 OK"
graphrag.index.reporting.file_workflow_callbacks INFO Error Invoking LLM details={....}
17:19:55,244 datashaper.workflow.workflow ERROR Error executing verb "text_embed" in create_final_entities: 'NoneType' object is not iterable

rushizirpe commented 1 month ago

I had the same problem and I was wondering if anyone had completely solved it? [...]

https://github.com/microsoft/graphrag/issues/339#issuecomment-2250094743

rushizirpe commented 1 month ago
ZeroDivisionError: Weights sum to zero, can't be normalized

The embedding model running locally in Ollama returns the embedding vectors in a format the client doesn't expect: OpenAI internally uses base64-encoded floats, whereas most other models return floats as plain numbers.

This is working: https://github.com/rushizirpe/open-llm-server
I have created a Colab notebook (working for global as well as local search) for Graphrag: https://colab.research.google.com/drive/1uhFDnih1WKrSRQHisU-L6xw6coapgR51?usp=sharing
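For illustration, a minimal sketch of handling both response styles when reading an embedding vector (the helper is illustrative, not graphrag's actual code):

import base64
import struct

def decode_embedding(value):
    # OpenAI-style responses can carry the vector as a base64-encoded
    # little-endian float32 buffer; Ollama and most OSS servers return a
    # plain JSON array of numbers. Normalize both to a list of floats.
    if isinstance(value, str):
        raw = base64.b64decode(value)
        return list(struct.unpack(f"<{len(raw) // 4}f", raw))
    return [float(x) for x in value]

# usage: decode_embedding(response["data"][0]["embedding"]) works for either format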

galen1980guo commented 1 month ago

I had the same problem and I was wondering if anyone had completely solved it? [...]

#339 (comment)

I have read the comment above, where the embedding uses OpenAI, but I hope it will be local. :-)

rushizirpe commented 1 month ago

I have read the comment above, where the embedding uses OpenAI, but I hope it will be local. :-)

If you take a look at the notebook you'll find it uses nomic-ai/nomic-embed-text-v1.5 as mentioned in the .yaml config (you can use any valid model that can be loaded from HuggingFace). You only need a GROQ API key for chat completion if you don't have a higher-end GPU; if you do have one, you just need to replace the API endpoint with http://localhost:1234/v1 and the model name (from HF) you want to use.

galen1980guo commented 1 month ago
ZeroDivisionError: Weights sum to zero, can't be normalized

The locally running embedding model in OLLAMA returns the weights in an incorrect format. OpenAI internally uses base64 encoded floats, whereas most other models return floats as numbers.

This is working: https://github.com/rushizirpe/open-llm-server I have created a Colab notebook (working for global as well as local search) for Graphrag: https://colab.research.google.com/drive/1uhFDnih1WKrSRQHisU-L6xw6coapgR51?usp=sharing

oh...are you actually replacing ollama with open llm server in this notebook?

rushizirpe commented 1 month ago
ZeroDivisionError: Weights sum to zero, can't be normalized

The locally running embedding model in OLLAMA returns the weights in an incorrect format. OpenAI internally uses base64 encoded floats, whereas most other models return floats as numbers. This is working: https://github.com/rushizirpe/open-llm-server I have created a Colab notebook (working for global as well as local search) for Graphrag: https://colab.research.google.com/drive/1uhFDnih1WKrSRQHisU-L6xw6coapgR51?usp=sharing

oh...are you actually replacing ollama with open llm server in this notebook?

Yes, hope it helps!

ZhengRui commented 2 weeks ago

I use ollama as the local LLM API provider, and set the chat model and embedding model api_base both to http://localhost:11434/v1. Global search works OK, but local search fails with: [...] ZeroDivisionError: Weights sum to zero, can't be normalized [...] As I understand it, the indexing process also needs to do embedding, so why doesn't it work for local search? Does anyone have the same issue with GraphRAG?

I am experiencing the same issue here. From my understanding, Ollama now has an OpenAI-compatible v1/embeddings endpoint, so the indexing step works fine when using http://localhost:11434/v1 as api_base for both llm and embeddings. Global search also works. Local query may have separate embedding logic that causes the issue; it generates lancedb files, and I'm digging into it now.
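As a quick check of that endpoint, a minimal sketch using the openai Python client (the base URL, placeholder api_key, and model name are assumptions for a default local Ollama install):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
resp = client.embeddings.create(model="nomic-embed-text", input=["who is Ye Wenjie?"])
print(len(resp.data[0].embedding))  # prints the vector length if the endpoint works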

Note:

concurrent_requests, chunks.size and the num_ctx of the specific model served by Ollama have to be set properly to pass the indexing step. In my case:

In the beginning I used the llama3.1 model. Its modelfile (you can check it using ollama show --modelfile llama3.1 or ollama show --parameters llama3.1) does not specify num_ctx, so by default Ollama serves llama3.1 with num_ctx = 2048; in Ollama's console log where it loads the model you will see the resulting n_ctx, which is 4 x num_ctx. GraphRAG's default chunk size is 1200, and the default prompts (inside the prompts folder after initializing the project) are lengthy, so the final prompt sent to Ollama can easily surpass num_ctx (you will find "input truncated" in Ollama's console logs for /v1/chat/completions API calls). The result is that the LLM will not follow the output format specified in the prompt; you can check the generated files inside the cache folder to see whether they follow the output format specified in the prompts folder. In my case it did not output any entities, which caused this issue: https://github.com/microsoft/graphrag/issues/443#issuecomment-2248000519

The way to solve this is:

I specified 4096 as num_ctx in the new modelfile (instead of 8192; now I can see n_ctx is 16384 in the ollama console log when loading the model) and kept chunks.size at 1200.
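For reference, a minimal sketch of such a modelfile, assuming a stock llama3.1 base (the derived model name is illustrative):

# Modelfile
FROM llama3.1
PARAMETER num_ctx 4096

ollama create llama3.1-4k -f Modelfile
# then set model: llama3.1-4k in settings.yaml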

As for the concurrent_requests setting, the default value of 25 will cause request-timed-out errors during indexing; the Ollama console log looks fine, but you can find the request timed out errors in indexing-engine.log and the logs.json file (see how Ollama handles concurrent requests). I set llm.concurrent_requests: 1 and embeddings.llm.concurrent_requests: 4, as sketched below.
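A minimal sketch of where those overrides live in settings.yaml, following the layout of the configs earlier in this thread:

llm:
  concurrent_requests: 1 # default is 25

embeddings:
  llm:
    concurrent_requests: 4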

With these settings, indexing and global query work smoothly. Local query still has issues.

ZhengRui commented 2 weeks ago

The local query issue is solved by https://github.com/microsoft/graphrag/issues/451#issuecomment-2220861232 and https://github.com/microsoft/graphrag/pull/568/files, but the query results don't seem very good.

Update:

  1. It turns out the poor local search results were because a 4k context window is too small for the local search prompt (it easily reaches 40k characters); after I changed num_ctx from 4096 to 10240, local search results started looking relevant and giving references.
  2. Using a smaller chunks.size will make prompts longer; in my case the global search prompt surpassed num_ctx and the LLM response did not return the JSON-structured response required by the prompt. In this case, increase chunks.size or num_ctx.
st-rope commented 2 weeks ago

Regarding generating embeddings with ollama: I had this issue already in another project. Calling the embeddings() function in ollama -> _client.py -> Client returns:

    # the enclosing call in ollama's _client.py looks roughly like this:
    return self._request(
      'POST',
      '/api/embeddings',
      json={
        'model': model,
        'prompt': prompt,
        'options': options or {},
        'keep_alive': keep_alive,
      },
    ).json()

At this point '/api/embeddings' is not replaced with the specified api_base. Replacing it manually with ollama's 'http://localhost:11434/api/embeddings' at least makes it possible to generate embeddings with ollama.
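For illustration, a minimal sketch of that manual call against a default local Ollama install (the URL and model name are assumptions):

import requests

# Ollama's native embeddings endpoint returns a plain list of floats under "embedding"
resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": "who is Ye Wenjie?"},
    timeout=60,
)
resp.raise_for_status()
print(len(resp.json()["embedding"]))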