langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

[BUG] Batch import failed #2300

Closed: luzhouuu closed this issue 8 months ago

luzhouuu commented 8 months ago

Dify version

0.5.2

Cloud or Self Hosted

Cloud

Steps to reproduce

When I try to batch-import annotations from a CSV file, the import fails. I tried both the cloud and the self-hosted (source) deployments. The log in the self-hosted deployment is:

DeprecationWarning: SelectableGroups dict interface is deprecated. Use select.
  for ep in entry_points().get(namespace, []):

✔️ Expected Behavior

The annotation was uploaded

❌ Actual Behavior

Upload Failed

dosubot[bot] commented 8 months ago

🤖

Hey there @luzhouuu! Good to see you again. Hope your coding journey is going well. 😊

Thank you for reporting this issue. The error message you're seeing is a DeprecationWarning, which is typically not the cause of a program failure but rather a warning that a certain Python feature is being phased out.

The warning says that the dict-style interface of SelectableGroups is deprecated and that select should be used instead. It most likely comes from a third-party library that Dify depends on, not from Dify itself.

As for the batch import failure, it's handled by the batch_import_annotations_task function in the dify codebase. This function is defined in the file api/tasks/annotation/batch_import_annotations_task.py. If there's an exception at any point during the batch import process, it rolls back the database session, sets the status of the indexing process in the Redis cache to 'error', logs the error message in the Redis cache, and logs the exception.
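The error-handling pattern described above can be illustrated with a minimal sketch. This is not Dify's code: `import_rows`, the cache key names, and the `annotations` table are hypothetical, SQLite stands in for the real database, and a plain dict stands in for the Redis cache.

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)


def import_rows(conn, cache, job_id, rows):
    """Insert all rows in one transaction and record the outcome in `cache`.

    `cache` is any dict-like object; the key names below are hypothetical.
    """
    status_key = f"batch_import_status:{job_id}"
    error_key = f"batch_import_error:{job_id}"
    try:
        conn.executemany(
            "INSERT INTO annotations (question, answer) VALUES (?, ?)", rows
        )
        conn.commit()
        cache[status_key] = "completed"
    except Exception as exc:
        # Undo any partial inserts, then expose the failure to the caller.
        conn.rollback()
        cache[status_key] = "error"
        cache[error_key] = str(exc)
        logger.exception("Batch import failed")
```

With this shape, a single bad row leaves the table unchanged, and whoever polls the status key sees 'error' together with the stored message.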

The CSV file for a batch import of annotations is expected to have two columns: the first holds the question and the second holds the answer. The first row contains the headers, which are 'question' and 'answer' in English or '问题' and '答案' in Chinese. Each subsequent row contains one question and answer pair.
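To check a file against this format before uploading, a small local script can help. The sketch below is an illustration only: `check_annotation_csv` is a hypothetical helper, not part of Dify, and treating the header match as case-insensitive is an assumption.

```python
import csv
import io

# Header pairs taken from the format described above.
ACCEPTED_HEADERS = [("question", "answer"), ("问题", "答案")]


def check_annotation_csv(text):
    """Return a list of problems found in the CSV text (empty if none)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    problems = []
    # Assumption: header comparison ignores case and surrounding spaces.
    header = tuple(cell.strip().lower() for cell in rows[0])
    if header not in ACCEPTED_HEADERS:
        problems.append(f"unexpected header: {rows[0]}")
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != 2 or not all(cell.strip() for cell in row):
            problems.append(f"row {number}: expected a non-empty question and answer")
    return problems
```

When reading the text from disk, opening the file with `encoding="utf-8-sig"` avoids a byte-order mark ending up in the first header cell.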

If the CSV file you're trying to import matches this format and you're still experiencing issues, it would be helpful if you could provide more information about the CSV file and the exact steps you're taking to import it. This will help us better understand the issue and find a solution.

I hope this helps! If you have any other questions or need further clarification, please let me know.

JohnJyong commented 8 months ago

If possible, please provide your test files. We cannot reproduce the problem you mentioned. @luzhouuu

luzhouuu commented 8 months ago

@JohnJyong Thanks for your reply. The upload failed because I attempted to upload more than 5000 annotations at once.
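Given that explanation, one possible workaround is to split a large file into smaller ones and import them one at a time. The sketch below is an assumption-laden illustration: `split_csv` is a hypothetical helper, and the thread only establishes that more than 5000 rows failed, so the exact limit to pass as `max_rows` is not confirmed here.

```python
import csv
import io


def split_csv(text, max_rows):
    """Split CSV text into chunks, each with the header and at most max_rows data rows.

    Returns an empty list if the input has no data rows.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    chunks = []
    for start in range(0, len(data), max_rows):
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(data[start:start + max_rows])
        chunks.append(out.getvalue())
    return chunks
```

Each returned string can be written to its own file and uploaded separately; choosing a value for `max_rows` comfortably below 5000 leaves some margin.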