microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License

[Bug]: Local Search returns nothing and does not report an error #491

Closed goodmaney closed 1 week ago

goodmaney commented 2 weeks ago

Describe the bug

Global search works well. Local search does not report an error, but the response is empty (null). I use xinference to serve the LLM and the embedding model, and the embedding endpoint is being called when I run local search.

Steps to reproduce

My test file content (screenshots: Snipaste_2024-07-10_23-55-46, Snipaste_2024-07-10_23-56-01)

My prompt (screenshot: Snipaste_2024-07-10_23-56-13)

The embedding running status (screenshot: Snipaste_2024-07-10_23-56-30)

The final response (screenshot: Snipaste_2024-07-10_23-57-12)

Expected Behavior

No response

GraphRAG Config Used

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: glm4-chat-test
  model_supports_json: true # also tried false
  api_base: http://127.0.0.1:9997/v1

parallelization:
  stagger: 0.3

async_mode: threaded # or asyncio

embeddings:
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: bce-embedding-basev1
    api_base: http://127.0.0.1:9998/v1

chunks:
  size: 300
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"

entity_extraction:
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization, person, geo, event]
  max_gleanings: 0

summarize_descriptions:
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0

community_reports:
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:

global_search:
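Since the logs below show community report extraction failing to parse the model's reply as JSON, it may be worth checking whether the xinference-served model actually honours OpenAI-style JSON mode, which is roughly what `model_supports_json: true` assumes. A minimal smoke-test sketch against the same endpoint (model name and `api_base` taken from the config above; the API key and prompt are placeholders):

```python
# Quick check, independent of GraphRAG: does the locally served chat model return valid JSON
# when asked via OpenAI-style JSON mode? Assumes the `openai` Python client >= 1.0.
import json
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:9997/v1", api_key="sk-placeholder")  # xinference usually ignores the key

resp = client.chat.completions.create(
    model="glm4-chat-test",
    messages=[{"role": "user", "content": 'Return a JSON object with one key "ok" set to true.'}],
    response_format={"type": "json_object"},  # the server may reject this if JSON mode is unsupported
)

raw = resp.choices[0].message.content
try:
    print(json.loads(raw))               # parses -> native JSON mode looks usable
except json.JSONDecodeError:
    print("Not valid JSON:", raw[:200])  # same failure mode GraphRAG hits during indexing
```

If the server rejects `response_format` or the reply does not parse, flipping `model_supports_json` to false (so prompt-based JSON extraction is used instead of native JSON mode) is the first thing I would try.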

Logs and screenshots

Excerpt from indexing-engine.log:

File "/home/xx/anaconda3/envs/graphrag/lib/python3.11/json/decoder.py", line 355, in raw_decode raise JSONDecodeError("Expecting value", s, err.value) from None json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1) 00:12:44,69 graphrag.index.reporting.file_workflow_callbacks INFO Community Report Extraction Error details=None

logs.json

{"type": "error", "data": "Community Report Extraction Error", "stack": "Traceback (most recent call last):\n File \"/home/xx/graphrag/graphrag/index/graph/extractors/community_reports/community_reports_extractor.py\", line 58, in call\n await self._llm(\n File \"/home/xx/graphrag/graphrag/llm/openai/json_parsing_llm.py\", line 34, in call\n result = await self._delegate(input, kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/home/xx/graphrag/graphrag/llm/openai/openai_token_replacing_llm.py\", line 37, in call\n return await self._delegate(input, kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/home/xx/graphrag/graphrag/llm/openai/openai_history_tracking_llm.py\", line 33, in call\n output = await self._delegate(input, kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/home/xx/graphrag/graphrag/llm/base/caching_llm.py\", line 104, in call\n result = await self._delegate(input, kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/home/xx/graphrag/graphrag/llm/base/rate_limiting_llm.py\", line 177, in call\n result, start = await execute_with_retry()\n ^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/home/xx/graphrag/graphrag/llm/base/rate_limiting_llm.py\", line 159, in execute_with_retry\n async for attempt in retryer:\n File \"/home/xx/anaconda3/envs/graphrag/lib/python3.11/site-packages/tenacity/asyncio/init.py\", line 166, in anext\n do = await self.iter(retry_state=self._retry_state)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/home/xx/anaconda3/envs/graphrag/lib/python3.11/site-packages/tenacity/asyncio/init.py\", line 153, in iter\n result = await action(retry_state)\n ^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/home/xx/anaconda3/envs/graphrag/lib/python3.11/site-packages/tenacity/_utils.py\", line 99, in inner\n return call(*args, kwargs)\n ^^^^^^^^^^^^^^^^^^^^^\n File \"/home/xx/anaconda3/envs/graphrag/lib/python3.11/site-packages/tenacity/init.py\", line 398, in \n self._add_action_func(lambda rs: rs.outcome.result())\n ^^^^^^^^^^^^^^^^^^^\n File \"/home/xx/anaconda3/envs/graphrag/lib/python3.11/concurrent/futures/_base.py\", line 449, in result\n return self.get_result()\n ^^^^^^^^^^^^^^^^^^^\n File \"/home/xx/anaconda3/envs/graphrag/lib/python3.11/concurrent/futures/_base.py\", line 401, in get_result\n raise self._exception\n File \"/home/xx/graphrag/graphrag/llm/base/rate_limiting_llm.py\", line 165, in execute_with_retry\n return await do_attempt(), start\n ^^^^^^^^^^^^^^^^^^\n File \"/home/xx/graphrag/graphrag/llm/base/rate_limiting_llm.py\", line 147, in do_attempt\n return await self._delegate(input, kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/home/xx/graphrag/graphrag/llm/base/base_llm.py\", line 48, in call\n return await self._invoke_json(input, kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/home/xx/graphrag/graphrag/llm/openai/openai_chat_llm.py\", line 82, in _invoke_json\n result = await generate()\n ^^^^^^^^^^^^^^^^\n File \"/home/xx/graphrag/graphrag/llm/openai/openai_chat_llm.py\", line 74, in generate\n await self._native_json(input, {**kwargs, \"name\": call_name})\n File \"/home/xx/graphrag/graphrag/llm/openai/openai_chat_llm.py\", line 108, in _native_json\n json_output = try_parse_json_object(raw_output)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/home/xx/graphrag/graphrag/llm/openai/utils.py\", line 93, in try_parse_json_object\n result = json.loads(input)\n ^^^^^^^^^^^^^^^^^\n File \"/home/xx/anaconda3/envs/graphrag/lib/python3.11/json/init.py\", line 346, in loads\n return _default_decoder.decode(s)\n 
^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/home/xx/anaconda3/envs/graphrag/lib/python3.11/json/decoder.py\", line 337, in decode\n obj, end = self.raw_decode(s, idx=_w(s, 0).end())\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/home/xx/anaconda3/envs/graphrag/lib/python3.11/json/decoder.py\", line 355, in raw_decode\n raise JSONDecodeError(\"Expecting value\", s, err.value) from None\njson.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1)\n", "source": "Expecting value: line 2 column 1 (char 1)", "details": null}
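For what it's worth, the traceback boils down to `json.loads` being handed a reply that does not start with JSON, so it fails at the first expected value. A stripped-down illustration (not GraphRAG's actual parsing code) that reproduces the exact error message:

```python
import json

# A reply that begins with a newline and then prose instead of a JSON object --
# a common shape for local chat models that ignore JSON-mode instructions.
raw_output = "\nHere is the community report you asked for."

try:
    json.loads(raw_output)
except json.JSONDecodeError as err:
    print(err)  # Expecting value: line 2 column 1 (char 1) -- same message as in logs.json
```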

Additional Information

KylinMountain commented 2 weeks ago

Same here. I am using gemma2 9b, which only has an 8k context window. I set local_search max_tokens to 5000 and it went back to normal. Otherwise it silently fails with over_capacity and you see nothing.

local_search:
   max_tokens: 5000
ChatCompletionChunk(id='chatcmpl-82228b8b-8279-44a5-bb8f-0f14c57ab4dd', choices=[Choice(delta=ChoiceDelta(content='', function_call=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1720694560, model='gemma2-9b-it', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None, x_groq={'id': 'req_01j2gp6nqvf5zsbbszhywpceqv'})
ChatCompletionChunk(id='chatcmpl-82228b8b-8279-44a5-bb8f-0f14c57ab4dd', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1720694560, model='gemma2-9b-it', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None, x_groq={'id': 'req_01j2gp6nqvf5zsbbszhywpceqv', 'error': 'over_capacity'})
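The chunk dumps above also show why the failure is invisible: the stream still ends cleanly with finish_reason='stop', and the over_capacity error only appears in a vendor-specific field (x_groq here), so a consumer that just concatenates deltas sees an empty answer. A small self-contained sketch of that shape, with the chunks mocked rather than fetched from a real API:

```python
# Mocked stream chunks shaped like the ChatCompletionChunk output above:
# an empty delta, then a clean stop whose error is tucked into a vendor field.
chunks = [
    {"delta": "", "finish_reason": None, "vendor": {}},
    {"delta": None, "finish_reason": "stop", "vendor": {"error": "over_capacity"}},
]

answer = "".join(c["delta"] or "" for c in chunks)
errors = [c["vendor"]["error"] for c in chunks if "error" in c["vendor"]]

print(repr(answer))  # '' -> looks like "the model said nothing"
print(errors)        # ['over_capacity'] -> only visible if you explicitly check for it
```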
goodmaney commented 2 weeks ago

> (quoting KylinMountain's comment above)

Setting it to 5000 did not work for me, but 4200 does, and that seems to be the maximum. GLM-4 has a 128k context window, so I am not sure whether local_search max_tokens is tied to the LLM's context at all. What embedding model are you using?
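One way to read the 4200-vs-5000 difference: local_search max_tokens caps the context GraphRAG packs into the prompt, and the system prompt plus the generated answer still have to fit inside whatever window the serving backend actually enforces, regardless of what the model nominally supports. A back-of-the-envelope check, where the overhead numbers are illustrative guesses rather than measured values:

```python
# Rough budget check, assuming the backend effectively enforces an 8k window
# (e.g. truncation on the serving side) even if the model nominally supports more.
context_budget  = 4200   # local_search.max_tokens that worked
prompt_overhead = 1500   # system prompt + query (illustrative guess)
response_budget = 2000   # tokens reserved for the answer (illustrative guess)

total = context_budget + prompt_overhead + response_budget
print(total, "fits in 8192:", total <= 8192)  # 7700 -> True; with 5000 it would be 8500 -> False
```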

natoverse commented 1 week ago

Consolidating alternate model issues here: #657

kakalong136 commented 14 hours ago

> (quoting goodmaney's reply above)

Thanks!!! You're a legend!!! Wishing you a lifetime of good fortune!!!!