jonaswinkler / paperless-ng

A supercharged version of paperless: scan, index and archive all your physical documents
https://paperless-ng.readthedocs.io/en/latest/
GNU General Public License v3.0

[BUG] Unable to add documents anymore #1359

Open stepanov1975 opened 2 years ago

stepanov1975 commented 2 years ago

Describe the bug

Everything worked fine for a while, but today I am unable to add new documents. Documents in the consume directory are not consumed, and documents uploaded via the GUI are not added either.

To Reproduce

Documents added to the consume directory are not consumed. Documents uploaded via "Upload new documents" are not added; the upload stays stuck at "Upload complete, waiting...".

Screenshots

Webserver logs

[2021-10-02 14:19:07,879] [DEBUG] [paperless.tasks] Training data unchanged.
[2021-10-02 15:31:15,250] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/src/../consume/BWprinter_004634.pdf to the task queue.
[2021-10-02 15:50:00,994] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/src/../consume/BWprinter_004634.pdf to the task queue.
[2021-10-02 15:50:01,003] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /usr/src/paperless/src/../consume
[2021-10-02 15:51:55,461] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/src/../consume/BWprinter_004635.pdf to the task queue.
[2021-10-02 16:20:09,628] [DEBUG] [paperless.classifier] Gathering data from database...
[2021-10-02 16:20:10,244] [DEBUG] [paperless.tasks] Training data unchanged.
2021-10-02 12:49:59,570 INFO spawned: 'scheduler' with pid 225
[2021-10-02 15:50:00 +0300] [224] [INFO] Starting gunicorn 20.1.0
[2021-10-02 15:50:00 +0300] [224] [INFO] Listening at: http://0.0.0.0:8000 (224)
[2021-10-02 15:50:00 +0300] [224] [INFO] Using worker: paperless.workers.ConfigurableWorker
[2021-10-02 15:50:00 +0300] [224] [INFO] Server is ready. Spawning workers
[2021-10-02 15:50:00,994] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/src/../consume/BWprinter_004634.pdf to the task queue.
15:50:01 [Q] INFO Enqueued 380
2021-10-02 12:50:00,995 INFO success: consumer entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-10-02 12:50:00,996 INFO success: gunicorn entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-10-02 12:50:00,997 INFO success: scheduler entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
[2021-10-02 15:50:01,003] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /usr/src/paperless/src/../consume
15:50:01 [Q] INFO Q Cluster princess-xray-eleven-coffee starting.
15:50:01 [Q] INFO Process-1 guarding cluster princess-xray-eleven-coffee
15:50:01 [Q] INFO Q Cluster princess-xray-eleven-coffee running.
15:50:01 [Q] INFO Process-1:1 ready for work at 253
15:50:01 [Q] INFO Process-1:2 ready for work at 254
15:50:01 [Q] INFO Process-1:3 monitoring at 255
15:50:01 [Q] INFO Process-1:4 pushing tasks at 256
15:50:01 [Q] INFO Process-1:1 processing [jig-network-diet-single]
15:50:01 [Q] INFO Process-1:2 processing [georgia-black-mars-item]
[2021-10-02 15:51:55,461] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/src/../consume/BWprinter_004635.pdf to the task queue.
15:51:55 [Q] INFO Enqueued 374
15:53:31 [Q] INFO Enqueued 375
15:53:31 [Q] INFO Process-1 created a task from schedule [Check all e-mail accounts]
16:03:33 [Q] INFO Enqueued 376
16:03:33 [Q] INFO Process-1 created a task from schedule [Check all e-mail accounts]
16:13:05 [Q] INFO Enqueued 377
16:13:05 [Q] INFO Process-1 created a task from schedule [Check all e-mail accounts]
16:20:07 [Q] WARNING reincarnated worker Process-1:1 after timeout
16:20:07 [Q] INFO Process-1:5 ready for work at 1192
16:20:07 [Q] INFO Process-1:5 processing [mobile-uranus-sodium-alanine]
16:20:08 [Q] WARNING reincarnated worker Process-1:2 after timeout
16:20:08 [Q] INFO Process-1:6 ready for work at 1200
16:20:08 [Q] INFO Process-1:6 processing [mirror-stream-delaware-tango]
16:20:10 [Q] INFO Process-1:5 stopped doing work
16:20:10 [Q] INFO recycled worker Process-1:5
16:20:10 [Q] INFO Process-1:7 ready for work at 1210
16:20:10 [Q] INFO Process-1:7 processing [fish-neptune-paris-mockingbird]
16:20:11 [Q] INFO Processed [mobile-uranus-sodium-alanine]
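
To make a log like the one above easier to read, here is a minimal Python sketch that pairs each "processing [task]" line with its "Processed [task]" line and collects the workers that were "reincarnated" after a timeout. The regular expressions are assumptions derived only from the line formats visible in this log, and the function name `summarize_q_log` is made up for illustration; it is not part of paperless or django-q.

```python
import re

# Patterns for the "[Q]" lines as they appear in the log above (assumed format).
STARTED = re.compile(r"\[Q\] INFO (Process-\S+) processing \[([^\]]+)\]")
FINISHED = re.compile(r"\[Q\] INFO Processed \[([^\]]+)\]")
REINCARNATED = re.compile(
    r"\[Q\] WARNING reincarnated worker (Process-\S+) after timeout"
)


def summarize_q_log(lines):
    """Return (unfinished, timed_out).

    unfinished maps task name -> worker for tasks that started but never
    logged "Processed"; timed_out lists workers restarted after a timeout.
    """
    running = {}
    timed_out = []
    for line in lines:
        match = STARTED.search(line)
        if match:
            running[match.group(2)] = match.group(1)
            continue
        match = FINISHED.search(line)
        if match:
            running.pop(match.group(1), None)
            continue
        match = REINCARNATED.search(line)
        if match:
            timed_out.append(match.group(1))
    return running, timed_out


if __name__ == "__main__":
    import sys

    # Usage: docker-compose logs --no-color webserver | python3 this_script.py
    unfinished, timed_out = summarize_q_log(sys.stdin)
    print("started but never finished:", unfinished)
    print("workers restarted after timeout:", timed_out)
```

Tasks that show up as started but never finished, together with workers restarted after a timeout, would point at the tasks that were hanging at the time. The sketch does not say why they hang.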

Relevant information

# docker-compose file for running paperless from the Docker Hub.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
#
# All compose files of paperless configure paperless in the following way:
#
# - Paperless is (re)started on system boot, if it was running before shutdown.
# - Docker volumes for storing data are managed by Docker.
# - Folders for importing and exporting files are created in the same directory
#   as this file and mounted to the correct folders inside the container.
# - Paperless listens on port 8010.
#
# In addition to that, this docker-compose file adds the following optional
# configurations:
#
# - Instead of SQLite (default), PostgreSQL is used as the database server.
#
# To install and update paperless with this file, do the following:
#
# - Open portainer Stacks list and click 'Add stack'
# - Paste the contents of this file and assign a name, e.g. 'Paperless'
# - Click 'Deploy the stack' and wait for it to be deployed
# - Open the list of containers, select paperless_webserver_1
# - Click 'Console' and then 'Connect' to open the command line inside the container
# - Run 'python3 manage.py createsuperuser' to create a user
# - Exit the console
#
# For more extensive installation and update instructions, refer to the
# documentation.

version: "3.4"
services:
  broker:
    image: redis:6.0
    restart: unless-stopped

  db:
    image: postgres:13
    restart: unless-stopped
    volumes:
      - /volume1/paperless_data/pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless

  webserver:
    image: jonaswinkler/paperless-ng:latest

    restart: unless-stopped
    depends_on:
      - db
      - broker
    ports:
      - 8008:8000
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000"]
      interval: 30s
      timeout: 10s
      retries: 5
    volumes:
      - /volume1/paperless_data/data:/usr/src/paperless/data
      - /volume1/paperless_data/media:/usr/src/paperless/media
      - /volume1/paperless_data/export:/usr/src/paperless/export
      - /volume1/paperless_data/consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
# The UID and GID of the user used to run paperless in the container. Set this
# to your UID and GID on the host so that you have write access to the
# consumption directory.
      USERMAP_UID: 1034
      USERMAP_GID: 100
# Additional languages to install for text recognition, separated by a
# whitespace. Note that this is
# different from PAPERLESS_OCR_LANGUAGE (default=eng), which defines the
# language used for OCR.
# The container installs English, German, Italian, Spanish and French by
# default.
# See https://packages.debian.org/search?keywords=tesseract-ocr-&searchon=names&suite=buster
# for available languages.
      PAPERLESS_OCR_LANGUAGES: heb
# Adjust this key if you plan to make paperless available publicly. It should
# be a very long sequence of random characters. You don't need to remember it.
      PAPERLESS_SECRET_KEY: ca72910de95145eb97b3f05ff57bd480ee73457e1085397bd4efe242bd87169ed03b07a22fae052b3e860b120dcc963cc1324aad089de01c03ec378ba6c8727ec92870531cd0173be4b679fbf46547b0
# Use this variable to set a timezone for the Paperless Docker containers. If not specified, defaults to UTC.
      PAPERLESS_TIME_ZONE: Asia/Jerusalem
# The default language to use for OCR. Set this to the language most of your
# documents are written in.
      PAPERLESS_OCR_LANGUAGE: eng+heb

volumes:
  data:
  media:
  pgdata:

ikaruswill commented 2 years ago

+1, I have the same issue. I was wondering whether it was a deployment-specific issue, since no one else seemed to have the problem.

I'm running paperless on a hybrid arm64/amd64 k3s cluster.

Another observation that may be related: unless paperless is restarted, it does not consume any more documents.

Also, after paperless restarts, the workers seem to work through all historical jobs (even completed ones), which is possibly related to this issue.

00:04:08 [Q] INFO recycled worker Process-1:119
00:04:08 [Q] INFO Process-1:121 ready for work at 436
00:04:08 [Q] INFO Process-1:121 processing [double-lima-island-vermont]
00:04:08 [Q] INFO Process-1:121 stopped doing work
00:04:08 [Q] INFO Processed [double-lima-island-vermont]
00:04:08 [Q] INFO recycled worker Process-1:120
00:04:08 [Q] INFO Process-1:122 ready for work at 438
00:04:08 [Q] INFO Process-1:122 processing [november-princess-magnesium-four]
00:04:08 [Q] INFO Process-1:122 stopped doing work

skorvek commented 2 years ago

I'm seeing the same thing. If I restart the container, all is well, but it is dead within the day (maybe within hours), with no notification I can find that the consumer isn't working. Documents are added to the queue, and that's the end of the log. If I upload a document via the web interface/dashboard it is uploaded, but then just sits at "waiting." I'm not sure how to check for running processes in a Docker environment (no top or ps). To add: I'm running Docker with the latest image (1.5.0), modified to use my existing postgres and redis servers and persistent storage via SMB, using portainer. It's not a permissions issue; again, after a restart everything works as expected.
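
On checking for running processes without top or ps: on Linux the same information can be read from /proc, and the container already ships python3 (the compose file comments use it for `manage.py`). The following is a minimal sketch under those assumptions; `list_processes` is a made-up helper name, not part of paperless.

```python
import os


def list_processes(proc_root="/proc"):
    """Return a sorted list of (pid, command line) read from a Linux /proc tree."""
    processes = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only the numeric directories describe processes
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                raw = fh.read()
        except OSError:
            continue  # the process exited or is not readable
        # Arguments are NUL-separated; kernel threads have an empty cmdline.
        cmdline = raw.replace(b"\x00", b" ").decode(errors="replace").strip()
        processes.append((int(entry), cmdline))
    return sorted(processes)


if __name__ == "__main__":
    for pid, cmdline in list_processes():
        print(pid, cmdline or "[no cmdline]")
```

Saved on the host as, say, `list_procs.py` (a hypothetical file name), it could be run inside the container with `docker exec -i <container> python3 - < list_procs.py`, where `<container>` is whatever name your deployment uses.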

drlobo commented 2 years ago

I have the same issue with version 1.5.0 on a Docker installation on Synology. It looks like it's related to the job queue processing. After a few hours, new documents are not processed anymore, and restarting the container restarts the job queues. I just restarted the container (after 2 months) and hundreds of jobs are running (sequentially).


[...]
16:07:37 [Q] INFO Process-1:794 processing [sixteen-louisiana-bacon-louisiana]
16:07:38 [Q] INFO Processed [king-winner-aspen-romeo]
16:07:42 [Q] INFO Process-1:794 stopped doing work
16:07:42 [Q] INFO Process-1:793 stopped doing work
16:07:42 [Q] INFO recycled worker Process-1:793
16:07:42 [Q] INFO Process-1:795 ready for work at 2429
16:07:42 [Q] INFO Process-1:795 processing [sad-bacon-jig-network]
16:07:42 [Q] INFO recycled worker Process-1:794
16:07:42 [Q] INFO Process-1:796 ready for work at 2430
16:07:42 [Q] INFO Process-1:796 processing [arkansas-quebec-early-skylark]
16:07:43 [Q] INFO Processed [sixteen-louisiana-bacon-louisiana]
16:07:44 [Q] INFO Processed [salami-november-eleven-lemon]
16:07:47 [Q] INFO Process-1:796 stopped doing work
16:07:47 [Q] INFO recycled worker Process-1:796
16:07:47 [Q] INFO Process-1:797 ready for work at 2432
16:07:47 [Q] INFO Process-1:797 processing [tango-pennsylvania-neptune-montana]
16:07:47 [Q] INFO Process-1:795 stopped doing work
16:07:48 [Q] INFO recycled worker Process-1:795
16:07:48 [Q] INFO Process-1:798 ready for work at 2434
16:07:48 [Q] INFO Process-1:798 processing [shade-delta-nebraska-sierra]
16:07:48 [Q] INFO Processed [arkansas-quebec-early-skylark]
16:07:48 [Q] INFO Processed [sad-bacon-jig-network]
[...]

siancu commented 2 years ago

I am also running paperless-ng in Docker on a Synology NAS. I've noticed that when I add documents via the UI, it seems to be stuck in the "Upload complete, waiting ..." state. For me, though, this happens only at the UI level. If I look at the logs (with docker-compose logs -f), it is processing the documents, and eventually, if I refresh the browser, I see them. But the UI stays in that bad state.

nico89-exp commented 2 years ago

I can confirm this for a Raspi 4 installation. Consuming doesn't start again after a while. I scheduled a cron job to restart the paperless container every hour until this is fixed.
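
As a possible refinement of the hourly restart, here is a minimal Python sketch that restarts the container only when files have been sitting in the consume directory for too long. It is a workaround, not a fix. The assumptions: files are removed from the consume directory once they have been processed, the host path and container name match the compose file posted above (adjust both for your setup), and the 30-minute threshold is an arbitrary choice. `stale_files` is a made-up helper name.

```python
import os
import subprocess
import time

# Assumptions: taken from the compose file posted above; adjust for your setup.
CONSUME_DIR = "/volume1/paperless_data/consume"  # host path of the consume folder
CONTAINER = "paperless_webserver_1"              # container name
MAX_AGE_SECONDS = 30 * 60                        # arbitrary "stuck" threshold


def stale_files(directory, max_age, now=None):
    """Return regular files in `directory` last modified more than `max_age` seconds ago."""
    now = time.time() if now is None else now
    stale = []
    for entry in os.scandir(directory):
        if entry.is_file() and now - entry.stat().st_mtime > max_age:
            stale.append(entry.path)
    return sorted(stale)


def main():
    stuck = stale_files(CONSUME_DIR, MAX_AGE_SECONDS)
    if stuck:
        print(f"{len(stuck)} file(s) waiting too long, restarting {CONTAINER}")
        # `docker restart` is the standard Docker CLI command for this.
        subprocess.run(["docker", "restart", CONTAINER], check=True)


if __name__ == "__main__":
    main()
```

Run from cron on the host every few minutes, this avoids restarting a container that is still working normally. It only notices files dropped into the consume directory, not uploads made through the web UI.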