Open noonesaid opened 2 years ago
I am not 100% sure this is always the case, but I also have no explicit task listed to check the consumer directory. It scans by default but might have trouble depending on the filesystem.
So the scan is either done automatically using inotify or set up manually.
Since automatic mode did not work reliably for me, I decided to switch on polling every 60 seconds by adding
PAPERLESS_CONSUMER_POLLING: 60
to the environment variables in docker.
See here:
https://paperless-ng.readthedocs.io/en/latest/configuration.html#configuration-polling
Thank you woessmich, I have added that environment variable and redeployed the stack (I'm using portainer) but it still does not automatically consume the files in the folder. I don't even know how to get it to manually consume what's in the folder. I can only drag and drop files to upload them to paperless-ng.
I had a similar scenario with the original paperless. If I remember correctly, for the consume directory paperless relies on inotify to recognize file changes.
SMB/CIFS, like most other network filesystems, does not generate those events correctly, or at all.
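To illustrate why polling still works where inotify events are missing, here is a minimal sketch in Python. It is not paperless-ng's actual consumer code; the function names `snapshot` and `new_or_changed` are made up for this example. Polling only needs to list the directory and compare the result with the previous listing, so it does not depend on the filesystem delivering change events.

```python
import os
import tempfile


def snapshot(directory):
    """Map each regular file in `directory` to its (size, mtime) pair."""
    state = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                info = entry.stat()
                state[entry.name] = (info.st_size, info.st_mtime)
    return state


def new_or_changed(previous, current):
    """Return the names in `current` whose metadata differs from `previous`."""
    return sorted(
        name for name, meta in current.items() if previous.get(name) != meta
    )


if __name__ == "__main__":
    # Hypothetical stand-in for the consume directory.
    with tempfile.TemporaryDirectory() as consume_dir:
        before = snapshot(consume_dir)
        with open(os.path.join(consume_dir, "scan.pdf"), "wb") as handle:
            handle.write(b"%PDF-1.4 example")
        after = snapshot(consume_dir)
        print(new_or_changed(before, after))  # prints ['scan.pdf']
```

A real poller would repeat the snapshot in a loop at a fixed interval, which is roughly what setting PAPERLESS_CONSUMER_POLLING to 60 asks for.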
@noonesaid This is what my docker-compose.yml looks like. I am using paperless-ng in Docker on a Synology, using Portainer to deploy the stack. Note: I also use custom file naming, plus Gotenberg and Tika for Office documents, but that is not required.
# docker-compose file for running paperless from the Docker Hub.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
#
# All compose files of paperless configure paperless in the following way:
#
# - Paperless is (re)started on system boot, if it was running before shutdown.
# - Docker volumes for storing data are managed by Docker.
# - Folders for importing and exporting files are created in the same directory
#   as this file and mounted to the correct folders inside the container.
# - Paperless listens on port 8010.
#
# In addition to that, this docker-compose file adds the following optional
# configurations:
#
# - Instead of SQLite (default), PostgreSQL is used as the database server.
#
# To install and update paperless with this file, do the following:
#
# - Open portainer Stacks list and click 'Add stack'
# - Paste the contents of this file and assign a name, e.g. 'Paperless'
# - Click 'Deploy the stack' and wait for it to be deployed
# - Open the list of containers, select paperless_webserver_1
# - Click 'Console' and then 'Connect' to open the command line inside the container
# - Run 'python3 manage.py createsuperuser' to create a user
# - Exit the console
#
# For more extensive installation and update instructions, refer to the
# documentation.
version: "3.4"
services:
  broker:
    image: redis:6.0
    restart: unless-stopped

  db:
    image: postgres:13
    restart: unless-stopped
    volumes:
      - /volume1/docker/paperless-ng/pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless

  webserver:
    image: jonaswinkler/paperless-ng:latest
    restart: unless-stopped
    depends_on:
      - db
      - broker
    ports:
      - 8010:8000
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000"]
      interval: 30s
      timeout: 10s
      retries: 5
    volumes:
      - /volume1/docker/paperless-ng/data:/usr/src/paperless/data
      - /volume1/docker/paperless-ng/media:/usr/src/paperless/media
      - /volume1/docker/paperless-ng/export:/usr/src/paperless/export
      - /volume1/scratch/INCOMING:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      # The UID and GID of the user used to run paperless in the container. Set this
      # to your UID and GID on the host so that you have write access to the
      # consumption directory.
      USERMAP_UID: 1000
      USERMAP_GID: 100
      # Additional languages to install for text recognition, separated by a
      # whitespace. Note that this is
      # different from PAPERLESS_OCR_LANGUAGE (default=eng), which defines the
      # language used for OCR.
      # The container installs English, German, Italian, Spanish and French by
      # default.
      # See https://packages.debian.org/search?keywords=tesseract-ocr-&searchon=names&suite=buster
      # for available languages.
      #PAPERLESS_OCR_LANGUAGES: tur ces
      # Adjust this key if you plan to make paperless available publicly. It should
      # be a very long sequence of random characters. You don't need to remember it.
      #PAPERLESS_SECRET_KEY: change-me
      # Use this variable to set a timezone for the Paperless Docker containers. If not specified, defaults to UTC.
      PAPERLESS_TIME_ZONE: Europe/Berlin
      # The default language to use for OCR. Set this to the language most of your
      # documents are written in.
      PAPERLESS_OCR_LANGUAGES: "eng deu"
      PAPERLESS_OCR_LANGUAGE: "deu" # most documents have this language
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998
      PAPERLESS_FILENAME_FORMAT: "{created_year}/{correspondent}/{document_type}_{title}_{created}"
      PAPERLESS_CONSUMER_POLLING: 60
      PAPERLESS_CONSUMER_DELETE_DUPLICATES: 1
      PAPERLESS_CONSUMER_RECURSIVE: 1
      PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS: 1
      PAPERLESS_OCR_MODE: skip
      PAPERLESS_CONSUMER_IGNORE_PATTERNS: '[".DS_STORE/*", "._*", ".stfolder/*","@eaDir/*"]'

  gotenberg:
    image: thecodingmachine/gotenberg:6
    restart: unless-stopped
    environment:
      DISABLE_GOOGLE_CHROME: 1
      DEFAULT_WAIT_TIMEOUT: 30

  tika:
    image: apache/tika:1.27
    restart: unless-stopped

volumes:
  data:
  media:
  pgdata:
Oh I see! Very interesting... I will try to change to NFS. I actually tried NFS first but couldn't get the permissions correct for some reason and paperless couldn't get write access. I'll play around with it again.
@noonesaid Did you get a chance to try NFS? I am experiencing the same issue and was wondering if switching to NFS for the consume directory worked before giving this a try.
The listed compose file includes "PAPERLESS_CONSUMER_POLLING: 60", which means that paperless is not using inotify to schedule consumption, but is instead polling the directory for changes. My system is set up on CIFS with polling, because inotify caused multiple consumption failures as it kept trying to grab partially-uploaded files.
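The partially-uploaded file problem can be sketched as follows. This is only an illustration, not how paperless-ng itself decides when a file is ready; the helper `is_stable` and its parameters are hypothetical. The idea is to look at a file twice, some time apart, and only treat it as complete if its size and modification time did not change in between.

```python
import os
import tempfile
import time


def is_stable(path, wait_seconds=1.0, sleep=time.sleep):
    """Return True if size and mtime of `path` are unchanged after waiting.

    `sleep` is injectable so the wait can be replaced in tests.
    """
    first = os.stat(path)
    sleep(wait_seconds)
    second = os.stat(path)
    return (first.st_size, first.st_mtime_ns) == (second.st_size, second.st_mtime_ns)


if __name__ == "__main__":
    # Hypothetical stand-in for a file arriving in the consume directory.
    with tempfile.TemporaryDirectory() as consume_dir:
        path = os.path.join(consume_dir, "scan.pdf")
        with open(path, "wb") as handle:
            handle.write(b"%PDF-1.4 example")
        # Nothing writes to the file during the wait, so it counts as stable.
        print(is_stable(path, wait_seconds=0.1))
```

A file that a scanner is still writing to would grow between the two checks and be skipped until a later pass.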
I just installed paperless-ng using portainer on my Ubuntu server. I have an SMB share mounted to a folder that is set as the paperless-ng consume directory. My scanner automatically sends scanned files to that SMB folder.
All the previous PDFs in the consume directory have been scanned. I went to scan in more files but noticed they were never consumed by paperless-ng. I checked the admin panel and there are only 4 tasks: check all email accounts, train the classifier, optimize the index, and perform sanity check.
When I added documents.tasks.consume_file (I didn't change any other parameters in this new task besides the function) I get this error:
Does anyone know how to solve this? And is paperless-ng supposed to be scanning the folder automatically with the default settings?