Closed: cht-k closed this issue 1 week ago
Hello @cht-k! I'm here to help you with any bugs, questions, or contributions. Let's work together to solve the issue you're facing.
The error you're encountering indicates that the NLTK resource punkt_tab is missing and needs to be downloaded. The error message provides instructions on how to download the resource using the NLTK Downloader.
To resolve this issue, you can add the following code snippet to your initialization script or setup file to ensure that the punkt_tab resource is downloaded:
import nltk
nltk.download('punkt_tab')
This downloads the punkt_tab tokenizer data directly. Note that punkt_tab is a separate resource from the older punkt package, so downloading 'punkt' alone does not provide it.
If you have access to the codebase, you can add this snippet to the appropriate initialization section. If not, you may need to contact the maintainers or support team to have this change implemented.
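For a self-hosted setup, the download can be made idempotent so it only runs when the resource is actually missing. The following is a minimal sketch, not Dify's own code: the helper name ensure_nltk_resource and its find/download parameters are hypothetical and exist only so the logic can be exercised without network access. It relies on nltk.data.find raising LookupError when a resource is absent.

```python
def ensure_nltk_resource(resource_path="tokenizers/punkt_tab",
                         package="punkt_tab",
                         find=None,
                         download=None):
    """Download an NLTK package only if its resource cannot be found.

    Returns True if a download was triggered, False if the resource
    was already available. `find` and `download` default to
    nltk.data.find and nltk.download; they are injectable for testing.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be tested without NLTK
        find = find or nltk.data.find
        download = download or nltk.download

    try:
        find(resource_path)  # raises LookupError when the resource is missing
        return False
    except LookupError:
        download(package, quiet=True)
        return True


if __name__ == "__main__":
    fetched = ensure_nltk_resource()
    print("downloaded punkt_tab" if fetched else "punkt_tab already present")
```

Calling this once at application startup avoids repeated downloads on every run. On Dify Cloud this cannot be applied by users, since the backend environment is managed by the Dify team.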
Currently, there is no specific information in the Dify codebase regarding the handling or downloading of NLTK resources, including punkt_tab [1][2].
Which LLM did you choose for system processing?
I'm not entirely sure what specific information you're requesting here, but I can provide some details that might be relevant:
text-embedding-3-large.
From my observation, this issue mainly affects the process when the chunk settings are set to "automatic".
It's worth noting that I'm using the Dify Cloud version, and these error messages appear to be generated by Dify Cloud's backend. Therefore, I don't believe my local system information is relevant to this issue.
Could you please clarify if you need any other specific information about the LLM or system settings? I'd be happy to provide more details if I can locate them in the Dify Cloud interface.
I am also facing the same error. I use Dify Cloud. Whenever I create an empty knowledge base, Dify raises this error.
I was able to avoid the punkt_tab error as follows:
If you use the HTTP API to create a document, choose the "custom" mode for process_rule.
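As an illustration of that workaround, here is a rough sketch of a request body with process_rule set to "custom" rather than "automatic". Apart from process_rule and its "custom" mode, which come from the comment above, the field names (indexing_technique, pre_processing_rules, segmentation) and the rule ids reflect my understanding of Dify's Knowledge API and should be treated as assumptions to check against the current API documentation. The helper build_document_payload is hypothetical, and the endpoint and authentication are deliberately left out.

```python
import json


def build_document_payload(name, text, separator="\n", max_tokens=500):
    """Build a create-document request body that uses custom chunking.

    Field names other than process_rule/mode are assumptions about the
    Dify Knowledge API; verify them against the official docs.
    """
    return {
        "name": name,
        "text": text,
        "indexing_technique": "high_quality",  # assumed field and value
        "process_rule": {
            "mode": "custom",  # the workaround: avoid "automatic"
            "rules": {
                # assumed rule ids
                "pre_processing_rules": [
                    {"id": "remove_extra_spaces", "enabled": True},
                    {"id": "remove_urls_emails", "enabled": False},
                ],
                "segmentation": {
                    "separator": separator,
                    "max_tokens": max_tokens,
                },
            },
        },
    }


if __name__ == "__main__":
    payload = build_document_payload("example.txt", "Hello world.")
    print(json.dumps(payload, indent=2))
```

The resulting JSON would be sent as the body of the document-creation request for the target Knowledge.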
This should be resolved in #7582
Self Checks
Dify version
0.7.1
Cloud or Self Hosted
Cloud
Steps to reproduce
When attempting to add additional documents to an existing Knowledge in Dify Cloud, an error occurs during the "Text Preprocessing and Cleaning" stage. The error message suggests that there's an issue with NLTK (Natural Language Toolkit) being unable to locate a resource named "punkt_tab".
Steps to Reproduce:
✔️ Expected Behavior
Ability to embed new documents and add them to an existing Knowledge base.
❌ Actual Behavior
Error message appears when entering the "Text Preprocessing and Cleaning" page.