langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Import PDF failed when using HUAWEI Kunpeng 920 CPU and aarch64 Linux #7370

Open linuxliker opened 3 weeks ago

linuxliker commented 3 weeks ago

Self Checks

Dify version

Version 0.6.16

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Importing a Word document works and is quick, but importing a PDF fails: the split chunks of the PDF cannot be previewed. How can this be fixed? Help needed, thanks.

This is on a HUAWEI Kunpeng 920 CPU running aarch64 Linux.

uname -a
Linux localhost.localdomain 4.19.90-52.39.v2207.ky10.aarch64 #4 SMP Wed Jun 5 15:52:50 CST 2024 aarch64 aarch64 aarch64 GNU/Linux
[root@localhost ~]# cat /proc/cpuinfo 
processor       : 0
model name      : HUAWEI,Kunpeng 920
BogoMIPS        : 200.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm ssbs
CPU implementer : 0x48
CPU architecture: 8
CPU variant     : 0x1
CPU part        : 0xd01
CPU revision    : 0

processor       : 1
model name      : HUAWEI,Kunpeng 920
BogoMIPS        : 200.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm ssbs
CPU implementer : 0x48
CPU architecture: 8
CPU variant     : 0x1
CPU part        : 0xd01
CPU revision    : 0

✔️ Expected Behavior

PDF import succeeds, and preview and embedding work.

❌ Actual Behavior

PDF import fails: the chunks cannot be previewed, and embedding stays in processing indefinitely.

dosubot[bot] commented 3 weeks ago

Hello @linuxliker! I'm Dosu, a bot that can help you solve bugs, answer questions, and become a contributor. I'll help you while you wait for a human maintainer.

There are no specific known issues or limitations documented for running Dify on the aarch64 architecture, specifically with HUAWEI Kunpeng 920 CPUs [1]. However, here are some steps you can take to troubleshoot and potentially resolve the issue:

  1. Check Logs: Review the logs for any error messages related to PDF processing. This can provide insight into what might be going wrong. Look for logs from the embedding process and any warnings or errors generated during the import [[2]](https://github.com/langgenius/dify/blob/main/third-party/chrome%20plug-in/content.js)[3].

  2. Environment Variables: Adjust the environment variables to optimize performance. For example, you can enable Unstructured ETL by setting the appropriate environment variables, which might offer better performance or different capabilities for handling PDFs [4][5]; a sketch of these settings follows this list.

  3. Dependencies: Ensure that all dependencies, especially those related to PDF processing like sharp, are correctly installed and compatible with your architecture [1].

  4. Debug Mode: Enable debug mode by setting the DEBUG environment variable to true. This can help in identifying issues during the import process [5].

  5. Update Dify: Make sure you are using the latest version of Dify, as updates may include fixes and improvements for PDF processing.
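
For steps 2 and 4, here is a minimal sketch of the relevant settings in docker/.env. The variable names ETL_TYPE, UNSTRUCTURED_API_URL, and DEBUG follow Dify's docker environment template, and the endpoint shown is the default path of the unstructured-api service; treat both as assumptions if your version differs:

    # docker/.env — switch the ETL pipeline to Unstructured for PDF parsing
    # (assumed variable names; check your own .env.example)
    ETL_TYPE=Unstructured
    UNSTRUCTURED_API_URL=http://unstructured:8000/general/v0/general
    # verbose logging while debugging the import
    DEBUG=true

After editing the file, recreate the containers (docker-compose up -d) so the api and worker pick up the new values.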

If the issue persists, you might need to look more closely at the compatibility of the sharp library or other dependencies with the aarch64 architecture.

To continue talking to Dosu, mention @dosu.

linuxliker commented 3 weeks ago

1. Check Logs: When importing a PDF, /console/api/workspaces/current/default-model?model_type=rerank and /console/api/datasets/indexing-estimate return "502 Bad Gateway". But it works when I import a Word document in the same Dify environment.

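One way to confirm the 502s line up with a crashing worker (a sketch; substitute your own api container id for <api-container-id>) is to tail the api container while re-importing the PDF:

    # a 502 from the gateway should coincide with a FATAL/worker-exit
    # pair in this output
    docker logs -f <api-container-id> 2>&1 | grep -E "FATAL|ERROR"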

crazywoola commented 3 weeks ago

Please provide the container logs of the worker and api.

linuxliker commented 3 weeks ago

Please provide the container logs of the worker and api.

docker logs -f 2cfa9aae4768

Running migrations
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
INFO:matplotlib.font_manager:generated new fontManager
Preparing database migration...
Start database migration.
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
Database migration successful!
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
[2024-08-17 08:41:11 +0000] [1] [INFO] Starting gunicorn 22.0.0
[2024-08-17 08:41:11 +0000] [1] [INFO] Listening at: http://0.0.0.0:5001 (1)
[2024-08-17 08:41:11 +0000] [1] [INFO] Using worker: gevent
[2024-08-17 08:41:11 +0000] [158] [INFO] Booting worker with pid: 158
[FATAL:partition_root.cc(863)] Check failed: (internal::SystemPageSize() == (size_t{1} << 12)) || (internal::SystemPageSize() == (size_t{1} << 14)). 
[2024-08-17 08:48:44 +0000] [1] [ERROR] Worker (pid:156) exited with code 1
[2024-08-17 08:48:44 +0000] [1] [ERROR] Worker (pid:156) exited with code 1.
[2024-08-17 08:48:44 +0000] [1] [ERROR] Worker (pid:158) was sent code 133!
[2024-08-17 08:48:44 +0000] [161] [INFO] Booting worker with pid: 161
[FATAL:partition_root.cc(863)] Check failed: (internal::SystemPageSize() == (size_t{1} << 12)) || (internal::SystemPageSize() == (size_t{1} << 14)). 
[2024-08-17 09:05:08 +0000] [1] [ERROR] Worker (pid:161) was sent code 133!
[2024-08-17 09:05:08 +0000] [164] [INFO] Booting worker with pid: 164

### When I import a PDF, this log shows up:

[FATAL:partition_root.cc(863)] Check failed: (internal::SystemPageSize() == (size_t{1} << 12)) || (internal::SystemPageSize() == (size_t{1} << 14)). 
[2024-08-19 02:28:31 +0000] [1] [ERROR] Worker (pid:164) was sent code 133!
[2024-08-19 02:28:31 +0000] [166] [INFO] Booting worker with pid: 166
docker logs -f 2183491c0c0c
Running migrations
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
INFO:matplotlib.font_manager:generated new fontManager
Preparing database migration...
Start database migration.
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade eeb2e349e6ac -> 1787fbae959a, update tools original_url length
INFO  [alembic.runtime.migration] Running upgrade 1787fbae959a -> 63a83fcf12ba, support conversation variables
Database migration successful!
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
/app/api/.venv/lib/python3.10/site-packages/celery/platforms.py:829: SecurityWarning: You're running the worker with superuser privileges: this is
absolutely not recommended!

Please specify a different user using the --uid option.

User information: uid=0 euid=0 gid=0 egid=0

  warnings.warn(SecurityWarning(ROOT_DISCOURAGED.format(

 -------------- celery@2183491c0c0c v5.3.6 (emerald-rush)
--- ***** ----- 
-- ******* ---- Linux-4.19.90-52.39.v2207.ky10.aarch64-aarch64-with-glibc2.39 2024-08-17 08:41:11
- *** --- * --- 
- ** ---------- [config]
- ** ---------- .> app:         app:0xfffb3cd97250
- ** ---------- .> transport:   redis://:**@redis:6379/1
- ** ---------- .> results:     postgresql://postgres:**@db:5432/dify
- *** --- * --- .> concurrency: 1 (gevent)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** ----- 
 -------------- [queues]
                .> app_deletion     exchange=app_deletion(direct) key=app_deletion
                .> dataset          exchange=dataset(direct) key=dataset
                .> generation       exchange=generation(direct) key=generation
                .> mail             exchange=mail(direct) key=mail
                .> ops_trace        exchange=ops_trace(direct) key=ops_trace

[tasks]
  . schedule.clean_embedding_cache_task.clean_embedding_cache_task
  . schedule.clean_unused_datasets_task.clean_unused_datasets_task
  . tasks.add_document_to_index_task.add_document_to_index_task
  . tasks.annotation.add_annotation_to_index_task.add_annotation_to_index_task
  . tasks.annotation.batch_import_annotations_task.batch_import_annotations_task
  . tasks.annotation.delete_annotation_index_task.delete_annotation_index_task
  . tasks.annotation.disable_annotation_reply_task.disable_annotation_reply_task
  . tasks.annotation.enable_annotation_reply_task.enable_annotation_reply_task
  . tasks.annotation.update_annotation_to_index_task.update_annotation_to_index_task
  . tasks.batch_create_segment_to_index_task.batch_create_segment_to_index_task
  . tasks.clean_dataset_task.clean_dataset_task
  . tasks.clean_document_task.clean_document_task
  . tasks.clean_notion_document_task.clean_notion_document_task
  . tasks.deal_dataset_vector_index_task.deal_dataset_vector_index_task
  . tasks.delete_segment_from_index_task.delete_segment_from_index_task
  . tasks.disable_segment_from_index_task.disable_segment_from_index_task
  . tasks.document_indexing_sync_task.document_indexing_sync_task
  . tasks.document_indexing_task.document_indexing_task
  . tasks.document_indexing_update_task.document_indexing_update_task
  . tasks.duplicate_document_indexing_task.duplicate_document_indexing_task
  . tasks.enable_segment_to_index_task.enable_segment_to_index_task
  . tasks.mail_invite_member_task.send_invite_member_mail_task
  . tasks.mail_reset_password_task.send_reset_password_mail_task
  . tasks.ops_trace_task.process_trace_tasks
  . tasks.recover_document_indexing_task.recover_document_indexing_task
  . tasks.remove_app_and_related_data_task.remove_app_and_related_data_task
  . tasks.remove_document_from_index_task.remove_document_from_index_task
  . tasks.retry_document_indexing_task.retry_document_indexing_task
  . tasks.sync_website_document_indexing_task.sync_website_document_indexing_task

[2024-08-17 08:41:11,349: INFO/MainProcess] Connected to redis://:**@redis:6379/1
[2024-08-17 08:41:11,353: INFO/MainProcess] mingle: searching for neighbors
[2024-08-17 08:41:12,365: INFO/MainProcess] mingle: all alone
[2024-08-17 08:41:12,385: INFO/MainProcess] celery@2183491c0c0c ready.
[2024-08-17 08:41:12,387: INFO/MainProcess] pidbox: Connected to redis://:**@redis:6379/1.
[2024-08-17 08:41:12,390: INFO/MainProcess] Task tasks.document_indexing_task.document_indexing_task[b51f10a9-f1e6-4460-93a7-5c9ef3236c58] received
[2024-08-17 08:41:12,491: INFO/MainProcess] Start process document: 1673471d-5f27-4d3c-9090-1e8aa5975e63
[FATAL:partition_root.cc(863)] Check failed: (internal::SystemPageSize() == (size_t{1} << 12)) || (internal::SystemPageSize() == (size_t{1} << 14)). 
Running migrations
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Preparing database migration...
Start database migration.
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
Database migration successful!
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
/app/api/.venv/lib/python3.10/site-packages/celery/platforms.py:829: SecurityWarning: You're running the worker with superuser privileges: this is
absolutely not recommended!

Please specify a different user using the --uid option.

User information: uid=0 euid=0 gid=0 egid=0

  warnings.warn(SecurityWarning(ROOT_DISCOURAGED.format(

 -------------- celery@2183491c0c0c v5.3.6 (emerald-rush)
--- ***** ----- 
-- ******* ---- Linux-4.19.90-52.39.v2207.ky10.aarch64-aarch64-with-glibc2.39 2024-08-17 08:41:50
- *** --- * --- 
- ** ---------- [config]
- ** ---------- .> app:         app:0xfffba7e9e8f0
- ** ---------- .> transport:   redis://:**@redis:6379/1
- ** ---------- .> results:     postgresql://postgres:**@db:5432/dify
- *** --- * --- .> concurrency: 1 (gevent)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** ----- 
 -------------- [queues]
                .> app_deletion     exchange=app_deletion(direct) key=app_deletion
                .> dataset          exchange=dataset(direct) key=dataset
                .> generation       exchange=generation(direct) key=generation
                .> mail             exchange=mail(direct) key=mail
                .> ops_trace        exchange=ops_trace(direct) key=ops_trace

[tasks]
  . schedule.clean_embedding_cache_task.clean_embedding_cache_task
  . schedule.clean_unused_datasets_task.clean_unused_datasets_task
  . tasks.add_document_to_index_task.add_document_to_index_task
  . tasks.annotation.add_annotation_to_index_task.add_annotation_to_index_task
  . tasks.annotation.batch_import_annotations_task.batch_import_annotations_task
  . tasks.annotation.delete_annotation_index_task.delete_annotation_index_task
  . tasks.annotation.disable_annotation_reply_task.disable_annotation_reply_task
  . tasks.annotation.enable_annotation_reply_task.enable_annotation_reply_task
  . tasks.annotation.update_annotation_to_index_task.update_annotation_to_index_task
  . tasks.batch_create_segment_to_index_task.batch_create_segment_to_index_task
  . tasks.clean_dataset_task.clean_dataset_task
  . tasks.clean_document_task.clean_document_task
  . tasks.clean_notion_document_task.clean_notion_document_task
  . tasks.deal_dataset_vector_index_task.deal_dataset_vector_index_task
  . tasks.delete_segment_from_index_task.delete_segment_from_index_task
  . tasks.disable_segment_from_index_task.disable_segment_from_index_task
  . tasks.document_indexing_sync_task.document_indexing_sync_task
  . tasks.document_indexing_task.document_indexing_task
  . tasks.document_indexing_update_task.document_indexing_update_task
  . tasks.duplicate_document_indexing_task.duplicate_document_indexing_task
  . tasks.enable_segment_to_index_task.enable_segment_to_index_task
  . tasks.mail_invite_member_task.send_invite_member_mail_task
  . tasks.mail_reset_password_task.send_reset_password_mail_task
  . tasks.ops_trace_task.process_trace_tasks
  . tasks.recover_document_indexing_task.recover_document_indexing_task
  . tasks.remove_app_and_related_data_task.remove_app_and_related_data_task
  . tasks.remove_document_from_index_task.remove_document_from_index_task
  . tasks.retry_document_indexing_task.retry_document_indexing_task
  . tasks.sync_website_document_indexing_task.sync_website_document_indexing_task

[2024-08-17 08:41:50,656: INFO/MainProcess] Connected to redis://:**@redis:6379/1
[2024-08-17 08:41:50,662: INFO/MainProcess] mingle: searching for neighbors
[2024-08-17 08:41:51,672: INFO/MainProcess] mingle: all alone
[2024-08-17 08:41:51,688: INFO/MainProcess] pidbox: Connected to redis://:**@redis:6379/1.
[2024-08-17 08:41:51,691: INFO/MainProcess] celery@2183491c0c0c ready.
[2024-08-17 08:48:24,830: INFO/MainProcess] Task tasks.clean_dataset_task.clean_dataset_task[f71fcbac-7bdf-4ba8-97f2-be876ba3610b] received
[2024-08-17 08:48:24,831: INFO/MainProcess] Start clean dataset when dataset deleted: c5d697f1-9b67-4154-82d9-23488324c545
[2024-08-17 08:48:24,937: INFO/MainProcess] Cleaning documents for dataset: c5d697f1-9b67-4154-82d9-23488324c545
[2024-08-17 08:48:25,225: INFO/MainProcess] Cleaned dataset when dataset deleted: c5d697f1-9b67-4154-82d9-23488324c545 latency: 0.3934991300338879
[2024-08-17 08:48:25,245: INFO/MainProcess] Task tasks.clean_dataset_task.clean_dataset_task[f71fcbac-7bdf-4ba8-97f2-be876ba3610b] succeeded in 0.4135903869755566s: None
[2024-08-17 08:48:29,859: INFO/MainProcess] Task tasks.clean_dataset_task.clean_dataset_task[d6653f96-1b7e-4232-a5de-68fedfc1d046] received
[2024-08-17 08:48:29,860: INFO/MainProcess] Start clean dataset when dataset deleted: ce6cdb39-8c3c-4371-905e-a20e791067d3
[2024-08-17 08:48:29,870: INFO/MainProcess] Cleaning documents for dataset: ce6cdb39-8c3c-4371-905e-a20e791067d3
[2024-08-17 08:48:30,045: INFO/MainProcess] Cleaned dataset when dataset deleted: ce6cdb39-8c3c-4371-905e-a20e791067d3 latency: 0.18452405591960996
[2024-08-17 08:48:30,065: INFO/MainProcess] Task tasks.clean_dataset_task.clean_dataset_task[d6653f96-1b7e-4232-a5de-68fedfc1d046] succeeded in 0.20479077310301363s: None
crazywoola commented 3 weeks ago

(quoted the worker and api container logs from the previous comment)

@dosu Generate a fix for this.

dosubot[bot] commented 3 weeks ago

The error message [FATAL:partition_root.cc(863)] Check failed: (internal::SystemPageSize() == (size_t{1} << 12)) || (internal::SystemPageSize() == (size_t{1} << 14)) indicates that the system page size is neither of the two values the allocator expects (4 KB or 16 KB). The worker process dies on this check, which is likely what produces the 502 Bad Gateway errors when importing PDFs.

Here are some steps to address this issue:

  1. Check System Page Size: Ensure that the system page size is either 4KB (1 << 12) or 16KB (1 << 14). You can check the page size with the following command (see the sketch after this list):

    getconf PAGE_SIZE
  2. Modify Gunicorn Configuration: Adjust the Gunicorn worker settings to ensure compatibility with your architecture. You can try switching the worker class from gevent to sync in your Gunicorn configuration. Update the SERVER_WORKER_CLASS environment variable accordingly:

    export SERVER_WORKER_CLASS=sync
  3. Update Dependencies: Ensure all dependencies are up-to-date and compatible with aarch64 architecture. Specifically, check for updates to libraries related to PDF processing and Gunicorn.

  4. Rebuild Docker Images: Rebuild your Docker images to ensure all changes and updates are applied:

    docker-compose build
    docker-compose up -d
  5. Debug Mode: Enable debug mode to gather more detailed logs. Set the DEBUG environment variable to true:

    export DEBUG=true
  6. Check for Known Issues: Look for any known issues or patches related to the partition_root.cc error in the libraries you are using.
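
Putting steps 1, 2, and 4 together, a sketch (SERVER_WORKER_CLASS follows the Dify docker environment template; whether exporting it on the host reaches the container depends on how your compose file wires environment variables, so setting it in docker/.env is the safer assumption). Note that many aarch64 server kernels, including common Kylin V10 builds, use 64 KB pages (65536), which fails this check:

    # 1) verify the kernel page size; the PartitionAlloc check in the log
    #    only accepts 4096 (1 << 12) or 16384 (1 << 14)
    page_size=$(getconf PAGE_SIZE)
    echo "page size: ${page_size}"
    if [ "${page_size}" -ne 4096 ] && [ "${page_size}" -ne 16384 ]; then
      echo "unsupported page size (${page_size}); expect the FATAL check to keep firing"
    fi

    # 2) if the page size is supported but workers still die, try the sync
    #    worker class and rebuild (steps 2 and 4 above)
    export SERVER_WORKER_CLASS=sync
    docker-compose build && docker-compose up -d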

If the issue persists, you may need to look more closely at the compatibility of specific libraries with the aarch64 architecture.

To continue talking to Dosu, mention @dosu.