night666e closed this issue 3 months ago
Help needed, please.
This GraphRAG just isn't usable as-is. I've also tried modifying the entity extraction and so on, and everything has problems. In practice, model output won't follow GraphRAG's standard prompt format exactly. So I think the idea is good, but it needs serious iteration before it can be used; right now it can't be used in production.
With some appropriate constraints, it should work, shouldn't it?
Is there a solution?
We commonly see this failure due to incorrectly configured models, especially if running a non-OpenAI model that doesn't map correctly. Routing to #657 for community support.
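A quick way to rule the model mapping in or out is to hit the endpoint directly with the OpenAI client before running the pipeline. A minimal sketch, assuming the `openai` v1 Python client and the `api_base`/`model` values from the config posted below:

```python
# Sanity-check that the Xinference endpoint speaks the OpenAI chat API.
# Endpoint and model name are taken from the reporter's settings.yaml.
from openai import OpenAI

client = OpenAI(api_key="Xinference", base_url="http://127.0.0.1:9997/v1")
resp = client.chat.completions.create(
    model="glm4-chat-HhEBQi0N",
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=8,
)
print(resp.choices[0].message.content)
```

If this call fails or returns an unexpectedly shaped response, the problem is likely the model configuration rather than the prompts.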
Does this GraphRAG update allow modifying the prompts?
Is there an existing issue for this?
Describe the issue
I initially assumed it was a Chinese-vs-English issue, but after trying both it still errors out. I'm using models served by xinference: a glm chat model and a bge embedding model. Right now the pipeline only runs with the original prompts; as soon as I swap in a modified prompt it breaks.
Steps to reproduce
I first modified the entity_extraction.txt file, then ran indexing with graphrag.index.
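A common way a modified entity_extraction.txt breaks indexing is by dropping the template placeholders GraphRAG substitutes at runtime; without them the extractor's output no longer parses, the entity graph comes back empty, and clustering fails downstream with exactly the error shown in the logs. A minimal sketch of a placeholder check; the placeholder names are assumed from the default prompt and may differ across GraphRAG versions:

```python
# Check that a custom entity-extraction prompt keeps the placeholders the
# default prompts/entity_extraction.txt uses (assumed names; verify against
# your GraphRAG version).
from pathlib import Path

REQUIRED = [
    "{entity_types}",
    "{input_text}",
    "{tuple_delimiter}",
    "{record_delimiter}",
    "{completion_delimiter}",
]

prompt = Path("prompts/entity_en.txt").read_text(encoding="utf-8")
missing = [p for p in REQUIRED if p not in prompt]
print("missing placeholders:", missing or "none")
```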
GraphRAG Config Used
```yaml
encoding_model: cl100k_base
skip_workflows: []

llm:
  api_key: Xinference
  type: openai_chat # or azure_openai_chat
  model: glm4-chat-HhEBQi0N
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 4000
  request_timeout: 180.0
  api_base: http://127.0.0.1:9997/v1
  api_version: 2024-02-15-preview
  # organization:
  # deployment_name:
  tokens_per_minute: 150_000 # set a leaky bucket throttle
  requests_per_minute: 10_000 # set a leaky bucket throttle
  max_retries: 10
  max_retry_wait: 10.0
  sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  concurrent_requests: 25 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  # parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: Xinference
    type: openai_embedding # or azure_openai_embedding
    model: bge-base-zh
    api_base: http://127.0.0.1:9997/v1
    api_version: 2024-02-15-preview

chunks:
  size: 1000
  overlap: 300
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache_et"
  # connection_string:
  # container_name:

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string:
  # container_name:

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string:
  # container_name:

entity_extraction:
  # llm: override the global llm settings for this task
  # parallelization: override the global parallelization settings for this task
  # async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_en.txt"
  entity_types: [organization, person, geo, event]
  max_gleanings: 0

summarize_descriptions:
  # llm: override the global llm settings for this task
  # parallelization: override the global parallelization settings for this task
  # async_mode: override the global async_mode settings for this task
  prompt: "prompt/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  # llm: override the global llm settings for this task
  # parallelization: override the global parallelization settings for this task
  # async_mode: override the global async_mode settings for this task
  enabled: true
  prompt: "prompt/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0

community_report:
  # llm: override the global llm settings for this task
  # parallelization: override the global parallelization settings for this task
  # async_mode: override the global async_mode settings for this task
  prompt: "prompt/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: true # if true, will generate node2vec embeddings for nodes
  num_walks: 10
  walk_length: 40
  window_size: 2
  iterations: 3
  random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: true
  raw_entities: false
  top_level_nodes: false

local_search:
  text_unit_prop: 0.5
  community_prop: 0.1
  conversation_history_max_turns: 5
  top_k_mapped_entities: 10
  top_k_relationships: 10
  max_tokens: 12000

global_search:
  max_tokens: 12000
  data_max_tokens: 12000
  map_max_tokens: 1000
  reduce_max_tokens: 2000
  concurrency: 32
```
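For reference, the reproduction step above can be run against this config from the project root; a sketch, assuming `settings.yaml` and the `input` directory live under `PROJECT_ROOT`:

```python
# Launch the GraphRAG indexer with the settings.yaml above.
import subprocess
import sys

PROJECT_ROOT = "."  # directory containing settings.yaml and input/
subprocess.run(
    [sys.executable, "-m", "graphrag.index", "--root", PROJECT_ROOT],
    check=True,
)
```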
Logs and screenshots
日志:{"type": "error", "data": "Error executing verb \"cluster_graph\" in create_base_entity_graph: Columns must be same length as key", "stack": "Traceback (most recent call last):\n File \"/home/dell/anaconda3/envs/graphrag/lib/python3.11/site-packages/datashaper/workflow/workflow.py\", line 410, in _execute_verb\n result = node.verb.func(verb_args)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/home/dell/anaconda3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 102, in cluster_graph\n output_df[[level_to, to]] = pd.DataFrame(\n
~~~~~^^^^^^^^^^^^^^^^\n File \"/home/dell/anaconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/frame.py\", line 4299, in setitem\n self._setitem_array(key, value)\n File \"/home/dell/anaconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/frame.py\", line 4341, in _setitem_array\n check_key_length(self.columns, key, value)\n File \"/home/dell/anaconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/indexers/utils.py\", line 390, in check_key_length\n raise ValueError(\"Columns must be same length as key\")\nValueError: Columns must be same length as key\n", "source": "Columns must be same length as key", "details": null} {"type": "error", "data": "Error running pipeline!", "stack": "Traceback (most recent call last):\n File \"/home/dell/anaconda3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/run.py\", line 323, in run_pipeline\n result = await workflow.run(context, callbacks)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/home/dell/anaconda3/envs/graphrag/lib/python3.11/site-packages/datashaper/workflow/workflow.py\", line 369, in run\n timing = await self._execute_verb(node, context, callbacks)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/home/dell/anaconda3/envs/graphrag/lib/python3.11/site-packages/datashaper/workflow/workflow.py\", line 410, in _execute_verb\n result = node.verb.func(verb_args)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/home/dell/anaconda3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 102, in cluster_graph\n output_df[[level_to, to]] = pd.DataFrame(\n~~~~~^^^^^^^^^^^^^^^^\n File \"/home/dell/anaconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/frame.py\", line 4299, in setitem\n self._setitem_array(key, value)\n File \"/home/dell/anaconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/frame.py\", line 4341, in _setitem_array\n check_key_length(self.columns, key, value)\n File \"/home/dell/anaconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/indexers/utils.py\", line 390, in check_key_length\n raise ValueError(\"Columns must be same length as key\")\nValueError: Columns must be same length as key\n", "source": "Columns must be same length as key", "details": null}Additional Information
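For what it's worth, the ValueError reproduces directly in pandas when `cluster_graph` gets an empty clustering result, which is consistent with entity extraction having produced no parseable entities. A minimal sketch (column names are illustrative, not the library's exact ones):

```python
# Reproduce "Columns must be same length as key": assigning two target
# columns from a DataFrame that collapsed to zero columns, as happens when
# every row's community list is empty.
import pandas as pd

output_df = pd.DataFrame({"communities": [[]]})  # empty clustering per row
output_df[["level", "clustered_graph"]] = pd.DataFrame(
    output_df["communities"].tolist(), index=output_df.index
)  # raises ValueError: Columns must be same length as key
```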