The language setting for segmenting in Question & Answer format doesn't work

langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.


Self Checks

[X] This is only for bug report, if you would like to ask a question, please head to Discussions.
[X] I have searched for existing issues, including closed ones.
[X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[X] [FOR CHINESE USERS] Please be sure to submit the issue in English, or it will be closed. Thank you! :)
[X] Please do not modify this template :) and fill in all the required fields.

Dify version

0.9.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

CleanShot 2024-10-18 at 15 47 19@2x

Even though the language is set to Simplified Chinese, the resulting index segments are in English.

CleanShot 2024-10-18 at 15 49 46@2x

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

There are similar issues and discussions related to the language setting in the Question & Answer segmentation format not working as expected:

An open issue titled "Q&A format choose Chinese. Shows 0 paragraphs. Other languages work normally." suggests that the problem might be caused by the _format_split_text and format_split_text functions. If the input text contains special characters like "*", these functions may return an empty list, resulting in an empty qa_preview. Additionally, if the output is not in the expected format, the qa_preview will also be empty.
A closed issue titled "Adopt Q&A segmentation mode, segmentation use option: simplified Chinese, or a large number of English segments will be generated." indicates that testing with OpenAI GPT-3.5 worked fine. The suggestions there were to state the target language explicitly at the top of the prompt and to try the latest code, which includes improvements.
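To illustrate the failure mode described in the open issue, here is a minimal sketch of a Q&A parser that returns an empty list when the model output is not in the expected shape. The function name `parse_qa_pairs` and the `Q1: ... A1: ...` pattern are assumptions made for this example; they are not copied from Dify's `_format_split_text`.

```python
import re

# Assumed output format: "Q1: ... A1: ..." pairs. This pattern is hypothetical
# and only illustrates how a strict regex can silently match nothing.
QA_PATTERN = re.compile(r"Q\d+:\s*(.*?)\s*A\d+:\s*([\s\S]*?)(?=Q\d+:|$)")


def parse_qa_pairs(text: str) -> list[dict[str, str]]:
    """Return question/answer dicts, or an empty list if nothing matches."""
    pairs = []
    for question, answer in QA_PATTERN.findall(text):
        question, answer = question.strip(), answer.strip()
        if question and answer:
            pairs.append({"question": question, "answer": answer})
    return pairs


# Output in the expected shape parses into two pairs.
well_formed = (
    "Q1: What is Dify?\nA1: An LLM app platform.\n"
    "Q2: Is it open source?\nA2: Yes."
)

# Markdown emphasis ("*") between the label and the colon breaks the match.
decorated = "**Q1**: What is Dify?\n**A1**: An LLM app platform."

# Localized labels with a full-width colon do not match either.
localized = "问：Dify 是什么？\n答：一个 LLM 应用平台。"
```

With a parser of this kind, both `decorated` and `localized` yield an empty list, which would leave a preview such as `qa_preview` empty even though the model did produce text.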

These references might provide insights or solutions to the issue you're experiencing with the language setting in the segmentation format ^[1]^[2].
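The suggestion from the closed issue, stating the target language ahead of the rest of the prompt, could look roughly like the sketch below. The function `build_qa_prompt` and the prompt wording are hypothetical and do not reflect Dify's actual prompt template.

```python
def build_qa_prompt(language: str, text: str) -> str:
    """Build a Q&A generation prompt that states the output language first.

    Hypothetical template: the wording is an assumption for illustration.
    """
    language_rule = f"Write every question and answer in {language}."
    task = (
        "Generate question-answer pairs from the text below, "
        "formatted as 'Q1: ... A1: ...'."
    )
    # The language rule comes before the task description and the source text.
    return "\n".join([language_rule, task, "", text])


prompt = build_qa_prompt("Simplified Chinese", "Dify is an open-source platform.")
```

Placing the language rule first is a design choice that follows the suggestion in the issue; whether it is enough to fix the behavior depends on the model in use.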
