infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0

[Bug]: `bad escape \P at position 374 (line 18, column 23)` when using the graph feature #1727

Open Randname666 opened 1 month ago

Randname666 commented 1 month ago

Is there an existing issue for the same bug?

Branch name

main

Commit ID

c943517

Other environment information

Hardware: VMware Workstation 16.2.3, Intel Xeon X5690 (6 threads assigned), 24GB RAM assigned, no GPU available
OS Type: Ubuntu Server 24.04 LTS

Actual behavior

The graph flow crashed at node retrieval-windows with the error `bad escape \P at position 374 (line 18, column 23)`, raised after a Retrieval node. The graph is shown in the picture: [image] Node retrieval-windows has the following settings:

(Other categories are omitted because they and their related nodes run fine.)

Name: Windows
Description: The question contains the word "Windows" or a Windows version. (Original: 问题中包含Windows字样或一个Windows版本。)
Examples:
Windows
Windows Server
Windows 11
Windows 10
Windows 7
Windows Vista
Windows XP
XP system (XP系统)
To: retrieval-windows

The Generate node LovelyTipsCall uses the qwen-plus model.

There is no meaningful information given in the log.

I'm sorry, but the document used to build the knowledge base for this graph (and probably its sliced content as well) cannot be shared under its copyright terms.

Expected behavior

The graph flow continues to run and returns the final result in the chat window.

Steps to reproduce

Build a similar graph and run it.

Additional information

Could this be caused by unusual characters in either the source document or the sliced results?
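One plausible, unconfirmed explanation: `bad escape \P at position ...` is the wording of Python's `re.error`, and since Python 3.7 `re.sub` rejects an unknown escape made of a backslash plus an ASCII letter (such as `\P`) inside a replacement *string*. If a component fills a prompt placeholder by passing retrieved text to `re.sub` as that replacement string, a chunk containing a Windows path like `C:\Program Files` would fail exactly this way, which would fit a knowledge base about Windows. The minimal sketch below reproduces that pattern; `TEMPLATE`, `CHUNK`, `fill_naive` and `fill_safe` are hypothetical names for illustration and are not RAGFlow's code.

```python
import re

# Hypothetical prompt template with an {input} placeholder, modeled on the
# word-order prompt quoted in this issue (English translation).
TEMPLATE = (
    "Please correct the word order of the following sentences, "
    "and do not add anything that is not in them.\n{input}"
)

# Hypothetical retrieved chunk: a Windows path contains a backslash followed by "P".
CHUNK = r"Install MySQL to C:\Program Files\MySQL"


def fill_naive(template: str, value: str) -> str:
    # value is passed as a replacement *string*, so re parses its backslash
    # escapes; since Python 3.7 an unknown escape such as "\P" raises re.error.
    return re.sub(r"\{input\}", value, template)


def fill_safe(template: str, value: str) -> str:
    # A callable replacement is not parsed for escapes; whatever it returns
    # is inserted verbatim.
    return re.sub(r"\{input\}", lambda _match: value, template)


if __name__ == "__main__":
    try:
        fill_naive(TEMPLATE, CHUNK)
    except re.error as exc:
        print("naive fill failed:", exc)
    print(fill_safe(TEMPLATE, CHUNK))
```

Passing a function (or doubling every backslash in the value before substituting) keeps retrieved text literal; plain `str.replace` would also work when the placeholder is a fixed string.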

KevinHuSh commented 1 month ago

Why are there 2 'generate', if you don't mind?

guoyuhao2330 commented 1 month ago

[image] Sorry, I can't reproduce this bug. Maybe your question contains special characters that trigger it; please provide more details. Also, you need to use a single 'Answer' node to implement the loop.
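Because the source document cannot be shared, the special characters could still be located locally by testing each retrieved chunk (or the question itself). This sketch assumes, without confirmation from the thread, that the failure comes from Python's `re.sub` treating the text as a replacement string; `find_unsafe_chunks` is a hypothetical helper, not part of RAGFlow.

```python
import re


def find_unsafe_chunks(chunks: list[str]) -> list[tuple[int, str]]:
    """Return (index, error message) for each chunk that Python's re module
    would reject if the chunk were passed to re.sub as a replacement string."""
    unsafe = []
    for index, chunk in enumerate(chunks):
        try:
            # Using the chunk as the replacement for a one-character match
            # forces re to parse it as a replacement template.
            re.sub("x", chunk, "x")
        except re.error as exc:
            unsafe.append((index, str(exc)))
    return unsafe


if __name__ == "__main__":
    sample = [
        "Windows 11 installation notes",  # plain text: safe
        r"C:\Program Files\MySQL",        # "\P" is an unknown escape
        r"see step \1 above",             # "\1" is a group reference with no group
    ]
    for index, message in find_unsafe_chunks(sample):
        print(index, message)
```

Besides `\P`, other backslash sequences (for example `\1` with no matching group) also raise `re.error`, just with a different message, so the helper reports every chunk that would break a naive substitution.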

Randname666 commented 1 month ago

> Why are there 2 'generate', if you don't mind?

The layout recognition feature doesn't work well for the document used here: it scrambles the text all over the place, especially when a paragraph mixes Chinese and English content.

E.g. 数据库、用户移植对象之前,一般SQ应L 先迁P移L数/S据QL 库、用户。 MySQL 数据迁移支使持用在K线DT迁S 移将和M离y线S迁QL 移离

So the first Generate node was meant to act as a "spell checker" that reorders the content fetched from the Retrieval nodes. It turned out not to work that way, so it should be removed.

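Prompt used (in English: "Please correct the word order of the following sentences, and do not add anything that is not in them."):

请修正以下句子的语序,不要加入句子中没有的内容。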
{input}