karthik-codex / Autogen_GraphRAG_Ollama

Microsoft's GraphRAG + AutoGen + Ollama + Chainlit = Fully Local & Free Multi-Agent RAG Superbot

It took a long time to create the knowledge graph #14

Closed · Minxiangliu closed this issue 3 months ago

Minxiangliu commented 3 months ago

Thank you for your tutorial. I understand that building a knowledge graph can take a lot of time, but there seems to be a bottleneck in the process: the CPU and GPU are almost idle. Is this normal? Is there any way to improve efficiency?

Thanks in advance!

The execution time shown in the screenshot is from after I interrupted and re-ran the indexing.

(screenshots attached)

settings.yaml


encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: qwen2-7b-instruct-q5_k_m
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 4000
  request_timeout: 210.0
  # api_base: https://<instance>.openai.azure.com
  api_base: http://localhost:11434/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    # model: text-embedding-3-small
    model: qwen2-7b-instruct-q5_k_m
    api_base: http://localhost:11434/api
    # api_base: https://<instance>.openai.azure.com
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional

chunks:
  size: 300
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.md$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 0

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0

community_report:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # max_tokens: 12000

global_search:
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32
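
For reference, the knobs that control how many indexing requests run in parallel are the commented-out lines in this file (concurrent_requests under llm, and num_threads under parallelization). A minimal sketch of enabling them, with illustrative values rather than recommendations, and not verified to remove the idle-CPU/GPU bottleneck:

llm:
  # ...other llm settings as above...
  concurrent_requests: 4 # the commented default is 25; a single local GPU usually cannot serve that many at once

parallelization:
  stagger: 0.3
  num_threads: 4 # keep roughly in line with concurrent_requests so work is not queued behind one model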

fishfree commented 3 months ago

@Minxiangliu Does it relate to the tokenization mechanism?

Minxiangliu commented 3 months ago

Hi @fishfree, thank you for your comment, but I'm not sure that's the issue. I noticed an HTTP/1.1 503 Service Unavailable error in the ollama serve log; perhaps the server is being overwhelmed by the number of requests? However, I'm not sure how to configure this.

Is this the correct approach?

export OLLAMA_MAX_QUEUE=1024
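
For context, OLLAMA_MAX_QUEUE is one of a few server-side environment variables Ollama reads at startup. A minimal sketch of setting the related ones before starting the server; the variable names are documented by Ollama, but the values are illustrative and not confirmed to resolve the 503s:

# set before starting the server so the new limits take effect
export OLLAMA_MAX_QUEUE=1024       # how many requests may wait in the queue before Ollama answers 503
export OLLAMA_NUM_PARALLEL=4       # how many requests a loaded model serves concurrently
export OLLAMA_MAX_LOADED_MODELS=1  # keep a single model resident so requests are not stalled by model reloads
ollama serve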

Interestingly, after restarting the computer and the container, I was able to build the knowledge graph (KG) again. Upon closer inspection, I also noticed some 500 errors in the ollama serve log. Do you have any insights on this?

[GIN] 2024/08/12 - 02:07:06 | 200 |         2m30s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2024/08/12 - 02:07:08 | 200 |         2m32s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2024/08/12 - 02:07:32 | 200 |         2m56s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2024/08/12 - 02:07:37 | 200 |          3m2s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2024/08/12 - 02:07:40 | 200 |          3m4s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2024/08/12 - 02:07:50 | 200 |         3m14s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2024/08/12 - 02:08:05 | 500 |         3m30s |       127.0.0.1 | POST     "/v1/chat/completions"
time=2024-08-12T02:08:05.722Z level=ERROR source=server.go:719 msg="Failed to acquire semaphore" error="context canceled"
[GIN] 2024/08/12 - 02:08:05 | 500 |         3m30s |       127.0.0.1 | POST     "/v1/chat/completions"
time=2024-08-12T02:08:05.723Z level=ERROR source=server.go:719 msg="Failed to acquire semaphore" error="context canceled"
[GIN] 2024/08/12 - 02:08:05 | 500 |         3m30s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2024/08/12 - 02:08:05 | 500 |         3m30s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2024/08/12 - 02:08:05 | 500 |         3m30s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2024/08/12 - 02:08:05 | 500 |         3m30s |       127.0.0.1 | POST     "/v1/chat/completions"
time=2024-08-12T02:08:05.722Z level=ERROR source=server.go:719 msg="Failed to acquire semaphore" error="context canceled"
[GIN] 2024/08/12 - 02:08:05 | 500 |         3m30s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2024/08/12 - 02:08:05 | 500 |         3m30s |       127.0.0.1 | POST     "/v1/chat/completions"
time=2024-08-12T02:08:05.723Z level=ERROR source=server.go:719 msg="Failed to acquire semaphore" error="context canceled"
[GIN] 2024/08/12 - 02:08:05 | 500 |         3m30s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2024/08/12 - 02:08:29 | 500 |         3m30s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2024/08/12 - 02:08:35 | 200 |          3m9s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2024/08/12 - 02:08:43 | 500 |         3m30s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2024/08/12 - 02:08:48 | 500 |         3m30s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2024/08/12 - 02:09:09 | 200 |         3m21s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2024/08/12 - 02:09:14 | 200 |         3m21s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2024/08/12 - 02:09:23 | 200 |         3m23s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2024/08/12 - 02:09:32 | 200 |          3m2s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2024/08/12 - 02:09:53 | 200 |         1m46s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2024/08/12 - 02:09:54 | 200 |         1m47s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2024/08/12 - 02:10:05 | 200 |         1m58s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2024/08/12 - 02:10:16 | 200 |          2m9s |       127.0.0.1 | POST     "/v1/chat/completions"
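
One observation on this log: every 500 completes at exactly 3m30s, i.e. 210 seconds, which matches request_timeout: 210.0 in the settings.yaml above, and the "Failed to acquire semaphore: context canceled" messages are consistent with the client abandoning those requests once the timeout elapsed. A sketch of relaxing that limit in settings.yaml; the value is illustrative and this is not a verified fix:

llm:
  request_timeout: 600.0 # allow slow local generations to finish instead of being cancelled at 210 seconds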

fishfree commented 3 months ago

@Minxiangliu Sorry for the late reply. Did you solve your problem? If so, could you please share the solution?

Minxiangliu commented 3 months ago

@fishfree Unfortunately, no. I can successfully build the knowledge graph from a small amount of document content. I can also build it successfully with another model (Breeze-7B-Instruct-v1_0), whose training dataset includes many documents in the same language as mine.