Closed MitraSafarinejad closed 2 months ago
🤖
Hello again, @MitraSafarinejad! I'm glad to see you here again. I hope you're doing well.
Based on the information provided in the repository, it appears that Dify currently does not support languages other than English or Chinese for extracting keywords from documents. This is indicated in the api/constants/languages.py file, where Farsi (or Persian) is not listed among the supported languages.
The keyword extraction process in Dify seems to be handled by the Jieba library, as suggested by the presence of a file named stopwords.py within a directory path that includes keyword/jieba. This file is likely responsible for defining stopwords, which are commonly filtered out during the keyword extraction process. However, the Jieba library is primarily designed for processing Chinese text. You can find more details in the api/core/rag/datasource/keyword/jieba/stopwords.py file.
Unfortunately, without support for Farsi in the current implementation, keywords from Farsi documents cannot be extracted. If you have any further questions or need additional clarification, please feel free to ask.
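To make the reported behavior concrete, here is a minimal Python sketch. It is not Dify's or Jieba's actual code. It assumes a tokenizer that only recognizes Han characters and ASCII letters or digits; the names `RECOGNIZED` and `naive_keywords` are hypothetical and introduced only for illustration. Under that assumption, Persian words are dropped and only embedded English words or numbers survive, which matches what was reported.

```python
import re

# Assumption for illustration: the tokenizer only recognizes Han characters
# and ASCII letters/digits. Everything else (including Persian) is skipped.
RECOGNIZED = re.compile(r"[\u4e00-\u9fd5a-zA-Z0-9]+")


def naive_keywords(text, min_len=2):
    """Return recognized tokens of at least min_len characters, in order."""
    return [tok for tok in RECOGNIZED.findall(text) if len(tok) >= min_len]


# A Persian sentence containing one English word and one number.
print(naive_keywords("این سند درباره Dify نسخه 2024 است"))  # ['Dify', '2024']

# A purely Persian sentence yields no keywords at all.
print(naive_keywords("این یک سند فارسی است"))  # []
```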
Unfortunately, Jieba currently does not support Persian word segmentation. I think you can try the high-quality mode in the Knowledge module. It supports vector similarity matching, full-text retrieval, and Rerank, which will make your retrieval more accurate. We will also introduce ES and support Persian word segmentation in the future, so stay tuned. @MitraSafarinejad
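Until Persian segmentation is available, one possible stopgap is to compute keywords outside Dify. The following Python sketch is only an illustration under stated assumptions: it splits text into runs of Arabic-script letters (keeping the zero-width non-joiner inside words), filters a tiny hypothetical stopword list, and ranks the rest by frequency. The names `PERSIAN_TOKEN`, `STOPWORDS`, and `persian_keywords` are invented for this example and are not part of any Dify API. A real solution would need a proper Persian tokenizer and a much larger stopword list.

```python
import re
from collections import Counter

# Arabic-script letters used in Persian, plus the zero-width non-joiner
# (U+200C) that appears inside many Persian compound words.
PERSIAN_TOKEN = re.compile(
    r"[\u0621-\u064A\u067E\u0686\u0698\u06A9\u06AF\u06CC\u200c]+"
)

# Tiny illustrative stopword list (hypothetical; far from complete).
STOPWORDS = {"این", "در", "به", "از", "که", "است", "را", "با", "یک"}


def persian_keywords(text, top_k=5):
    """Return up to top_k of the most frequent non-stopword Persian tokens."""
    # Strip stray non-joiners from token edges so variants compare equal.
    tokens = [t.strip("\u200c") for t in PERSIAN_TOKEN.findall(text)]
    counts = Counter(
        t for t in tokens if len(t) >= 2 and t not in STOPWORDS
    )
    return [word for word, _ in counts.most_common(top_k)]


print(persian_keywords("اسناد و اسناد در متن", top_k=1))
```

Frequency ranking is the simplest choice here; a TF-IDF weighting over the whole knowledge base would likely give better keywords but needs corpus-level statistics.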
Hi, @MitraSafarinejad,
I'm helping the team manage their backlog and am marking this issue as stale. From what I understand, you reported that Dify does not extract keywords from Farsi documents unless there are English words or numbers present. It was mentioned that Dify currently does not support Farsi for keyword extraction, and it was suggested to use the high-quality mode in the Knowledge module as an alternative.
Could you please confirm if this issue is still relevant to the latest version of the repository? If it is, please let the team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!
Hello, this issue is still present. Dify does not support Farsi for keyword extraction. I tried the high-quality mode in the Knowledge module as an alternative, but it did not solve my problem. I'd appreciate it if you could add support for the Farsi language.
Hi there, we will try to fix this.
Provide a description of the help you need
Hello, I work with Farsi documents. After uploading them, I check each document, but I cannot see any keywords at the bottom of each chunk (see the picture below) unless the document contains some English words or numbers. Does Dify support any language other than English or Chinese for extracting keywords?