Doc Extractor only extracts first sheet from XLS

langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.


Self Checks

[X] This is only for bug reports; if you would like to ask a question, please head to Discussions.
[X] I have searched for existing issues, including closed ones.
[X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[X] [FOR CHINESE USERS] Please be sure to submit your issue in English, otherwise it will be closed. Thank you! :)
[X] Please do not modify this template :) and fill in all the required fields.

Dify version

0.10.0

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Upload an XLS file that contains multiple sheets (tabs) and run Doc Extractor.

✔️ Expected Behavior

Doc Extractor should extract all sheets (tabs) within the XLS.

❌ Actual Behavior

Doc Extractor only extracts the first sheet (tab) in the XLS file.

The Doc Extractor returning only the first sheet of an XLS file is likely related to how the extraction is implemented. The current implementation reads XLS files with the pandas library and the xlrd engine and iterates over each sheet in the file, so in principle it should handle multiple sheets ^[1].

If only the first sheet is extracted, the cause may be a bug or a condition specific to your setup. The extraction code raises a TextExtractionError when an exception occurs, and that error may point to the underlying problem ^[2].

To investigate further, check the logs for errors or exceptions raised during extraction. It may also help to review the code and confirm that the iteration over sheets is implemented correctly and is not being skipped.
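For reference while reviewing the sheet iteration, here is a minimal sketch of reading every sheet with pandas. It is not Dify's actual code: the function names (`read_all_sheets`, `join_sheet_texts`, `extract_xls_text`), the `## <sheet name>` labels, and the CSV text output are assumptions made for illustration. The key detail is that `pd.read_excel` returns only the first sheet by default (`sheet_name=0`), while `sheet_name=None` returns a dict with one DataFrame per sheet.

```python
from typing import Mapping


def read_all_sheets(path: str) -> dict:
    """Read every sheet of an .xls file into {sheet name: DataFrame}."""
    # Imported here so the formatting helper below works without pandas installed.
    import pandas as pd

    # sheet_name=None requests all sheets; the default (0) returns only the first.
    return pd.read_excel(path, sheet_name=None, engine="xlrd")


def join_sheet_texts(sheet_texts: Mapping[str, str]) -> str:
    """Combine per-sheet text into one string, labelling each sheet (hypothetical format)."""
    parts = [
        f"## {name}\n{text.strip()}"
        for name, text in sheet_texts.items()
        if text.strip()  # skip sheets that produced no text
    ]
    return "\n\n".join(parts)


def extract_xls_text(path: str) -> str:
    """Extract text from all sheets of an .xls file (illustrative only)."""
    sheets = read_all_sheets(path)
    # Drop fully empty rows, then render each sheet as CSV text.
    texts = {
        name: df.dropna(how="all").to_csv(index=False)
        for name, df in sheets.items()
    }
    return join_sheet_texts(texts)
```

Calling `extract_xls_text("workbook.xls")` on a multi-sheet file (the file name is a placeholder) should yield one labelled section per non-empty sheet. If the extractor's output contains only one section, comparing it against this result can help show whether the problem is in reading the file or in a later step.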

