langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

document extractor with unstructured io for pptx does not function as expected #10956

Open fdb02983rhy opened 22 hours ago

fdb02983rhy commented 22 hours ago

Self Checks

Dify version

0.11.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Use the doc extractor to process a .pptx file with Unstructured.io.

✔️ Expected Behavior

The file is processed successfully.

❌ Actual Behavior


api-1         |   File "/app/api/.venv/lib/python3.10/site-packages/gunicorn/workers/base_async.py", line 115, in handle_request
api-1         |     for item in respiter:
api-1         |   File "/app/api/.venv/lib/python3.10/site-packages/werkzeug/wsgi.py", line 256, in __next__
api-1         |     return self._next()
api-1         |   File "/app/api/.venv/lib/python3.10/site-packages/werkzeug/wrappers/response.py", line 32, in _iter_encoded
api-1         |     for item in iterable:
api-1         |   File "/app/api/.venv/lib/python3.10/site-packages/flask/helpers.py", line 113, in generator
api-1         |     yield from gen
api-1         |   File "/app/api/libs/helper.py", line 186, in generate
api-1         |     yield from response
api-1         |   File "/app/api/core/app/features/rate_limiting/rate_limit.py", line 115, in __next__
api-1         |     return next(self.generator)
api-1         |   File "/app/api/core/app/apps/base_app_generate_response_converter.py", line 25, in _generate_full_response
api-1         |     for chunk in cls.convert_stream_full_response(response):
api-1         |   File "/app/api/core/app/apps/advanced_chat/generate_response_converter.py", line 67, in convert_stream_full_response
api-1         |     for chunk in stream_response:
api-1         |   File "/app/api/core/app/apps/advanced_chat/generate_task_pipeline.py", line 187, in _to_stream_response
api-1         |     for stream_response in generator:
api-1         |   File "/app/api/core/app/apps/advanced_chat/generate_task_pipeline.py", line 218, in _wrapper_process_stream_response
api-1         |     for response in self._process_stream_response(tts_publisher=tts_publisher, trace_manager=trace_manager):
api-1         |   File "/app/api/core/app/apps/advanced_chat/generate_task_pipeline.py", line 319, in _process_stream_response
api-1         |     workflow_node_execution = self._handle_workflow_node_execution_failed(event)
api-1         |   File "/app/api/core/app/task_pipeline/workflow_cycle_manage.py", line 339, in _handle_workflow_node_execution_failed
api-1         |     WorkflowNodeExecution.process_data: json.dumps(event.process_data) if event.process_data else None,
api-1         |   File "/usr/local/lib/python3.10/json/__init__.py", line 231, in dumps
api-1         |     return _default_encoder.encode(obj)
api-1         |   File "/usr/local/lib/python3.10/json/encoder.py", line 199, in encode
api-1         |     chunks = self.iterencode(o, _one_shot=True)
api-1         |   File "/usr/local/lib/python3.10/json/encoder.py", line 257, in iterencode
api-1         |     return _iterencode(o, 0)
api-1         |   File "/app/api/.venv/lib/python3.10/site-packages/frozendict/__init__.py", line 32, in default
api-1         |     return BaseJsonEncoder.default(
api-1         |   File "/usr/local/lib/python3.10/json/encoder.py", line 179, in default
api-1         |     raise TypeError(f'Object of type {o.__class__.__name__} '
api-1         | TypeError: Object of type File is not JSON serializable

dosubot[bot] commented 22 hours ago

The error message "TypeError: Object of type File is not JSON serializable" indicates that there is an attempt to serialize a File object into JSON, which is not directly possible. This typically happens when a File object is included in a data structure that is being converted to JSON, such as when using json.dumps().
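As a minimal sketch of how this error arises, the snippet below reproduces the same TypeError outside Dify. The File class here is a hypothetical stand-in with made-up fields, not Dify's actual class, and the process_data dictionary is an assumed shape used only for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class File:
    """Hypothetical stand-in for the File object named in the traceback."""
    filename: str
    extension: str


# Assumed shape: a dict that happens to hold a File instance.
process_data = {"documents": [File(filename="slides.pptx", extension=".pptx")]}

try:
    json.dumps(process_data)
    error_message = None
except TypeError as exc:
    # The default encoder has no rule for arbitrary objects, so it raises.
    error_message = str(exc)

print(error_message)
```

Any object the standard encoder does not know how to handle triggers the same failure, regardless of how deeply it is nested in the structure.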

To resolve this issue, you need to ensure that any File objects are converted to a JSON-serializable format before attempting to serialize them. This could involve extracting relevant information from the File object, such as its path, name, or content, and including only that information in the data structure being serialized.

In Dify, the traceback points to _handle_workflow_node_execution_failed in /app/api/core/app/task_pipeline/workflow_cycle_manage.py (line 339), which calls json.dumps(event.process_data). Check where process_data is prepared for that call and make sure it contains no non-serializable objects such as File. Convert them first to a format that can be serialized, such as a string or a dictionary containing the necessary details.
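One possible way to do that conversion is the default hook of json.dumps, sketched below. This is an illustration under assumptions, not Dify's actual fix: the File dataclass, its fields, and the to_jsonable helper are all hypothetical names introduced for the example.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass
from typing import Any


@dataclass
class File:
    """Hypothetical stand-in for Dify's File object; fields are assumptions."""
    filename: str
    extension: str
    size: int


def to_jsonable(obj: Any) -> Any:
    """Fallback for json.dumps, called only for objects it cannot encode."""
    if is_dataclass(obj) and not isinstance(obj, type):
        # Turn the dataclass instance into a plain dict of its fields.
        return asdict(obj)
    # Keep the standard behaviour for anything else we do not recognise.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


process_data = {"documents": [File("slides.pptx", ".pptx", 1024)]}

# The hook is applied wherever a File appears, however deeply nested.
serialized = json.dumps(process_data, default=to_jsonable)
print(serialized)
```

The hook leaves already-serializable values untouched, so it can be added to an existing json.dumps call without changing its output for ordinary data.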
