the-paperless-project / paperless

Scan, index, and archive all of your paper documents
GNU General Public License v3.0

optipng just sits there idling #582

Open SebastianSemper opened 4 years ago

SebastianSemper commented 4 years ago

I encountered an issue where, during the optipng step, the consumer just sits idle without any CPU usage and makes no progress at all. Restarting it makes it continue with the file, but it only comes to a halt again at a later file.

This is what systemd is telling me:

Every 2.0s: systemctl status paperless-consumer                                                                  prometheus: Thu Nov 21 18:33:15 2019

● paperless-consumer.service - Paperless consumer
   Loaded: loaded (/etc/systemd/system/paperless-consumer.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-11-21 18:26:11 CET; 7min ago
 Main PID: 12455 (python)
    Tasks: 2 (limit: 4915)
   Memory: 160.9M
   CGroup: /system.slice/paperless-consumer.service
           ├─12455 /usr/bin/python /mnt/hdd/paperless/bin/src/manage.py document_consumer
           └─14113 optipng -o5 /tmp/paperless/paperless-gii0wv8n/convert.png -out /tmp/paperless/paperless-gii0wv8n/optipng.png

I am running paperless as the user paperless, which is in the users group, with the following package versions on Manjaro Linux:

extra/imagemagick 7.0.9.2-3 [installed]
    An image viewing/manipulation program
extra/libmagick6 6.9.10.71-1 [installed]
    An image viewing/manipulation program (version 6; library)
community/optipng 0.7.7-1 [installed]
    Compresses PNG files to a smaller size, without losing any information.
extra/ghostscript 9.50-1 [installed]
    An interpreter for the PostScript language
community/pdftricks 0.2.7-1 [installed]
    Simple, efficient application for small manipulations in PDF files using Ghostscript

Can anybody confirm this or help me? What am I missing?

MasterofJOKers commented 4 years ago

Where does /tmp/paperless live? Maybe the filesystem is slow? You could try setting a different temp directory, e.g.

# This will be created if it doesn't exist
PAPERLESS_SCRATCH_DIR="/dev/shm/paperless"
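The comment above says the scratch directory is created on demand, and the systemd status output shows per-document folders such as /tmp/paperless/paperless-gii0wv8n inside it. A minimal Python sketch of that pattern, assuming the setting is read from the environment and each document gets its own `tempfile.mkdtemp()` folder; `make_scratch_dir` is a hypothetical helper, not paperless code:

```python
import os
import tempfile


def make_scratch_dir(default="/tmp/paperless"):
    """Hypothetical helper: return a fresh per-document scratch folder.

    Assumes the base directory comes from PAPERLESS_SCRATCH_DIR and falls
    back to `default` when the variable is unset.
    """
    base = os.environ.get("PAPERLESS_SCRATCH_DIR", default)
    # Create the base directory if it doesn't exist yet (no error if it does).
    os.makedirs(base, exist_ok=True)
    # mkdtemp() creates a uniquely named folder like "paperless-gii0wv8n".
    return tempfile.mkdtemp(prefix="paperless-", dir=base)
```

With the variable pointed at /dev/shm/paperless, the intermediate files (convert.png, optipng.png) would land on RAM-backed tmpfs, which on most Linux systems rules out a slow disk as the cause.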

SebastianSemper commented 4 years ago

Do you mean:

# Similar to the memory limit, if you've got a small system and your OS mounts
# /tmp as tmpfs, you should set this to a path that's on a physical disk, like
# /home/your_user/tmp or something.  ImageMagick will use this as scratch space
# when crunching through very large documents.
#
# For more information on how to use this value, you should probably search
# the web for "MAGICK_TMPDIR".
#PAPERLESS_CONVERT_TMPDIR=/var/tmp/paperless

this setting? The other one does not seem to be present in my config. However, trying both settings one after the other did not resolve the issue. Moreover, I discovered that my setup mounts /dev/shm to /tmp anyway.

EDIT: I still have no idea why it just hangs, or why, after I restart the consumer, it simply continues processing the file it was stuck on.

MasterofJOKers commented 4 years ago

Sorry, yes, I probably meant that one. My paperless seems to be a little older :/

You could look into what optipng does while idling with strace -p $pid, i.e. whether it makes any system calls at all. Other than that, I don't know... did you check the standard system metrics, e.g. with vmstat 1, while it's hanging? Is the CPU really idle, or is it busy with other processes or waiting on I/O?
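Before reaching for strace, the kernel already reports whether a process is running or sleeping. A minimal Python sketch, assuming Linux with a readable /proc: it returns the state letter from /proc/<pid>/stat and the contents of /proc/<pid>/wchan (the kernel function the process is waiting in). `process_state` is a hypothetical helper name:

```python
def process_state(pid):
    """Return (state, wchan) for a Linux process, read from /proc.

    state is the one-letter code from /proc/<pid>/stat, e.g. "R" (running),
    "S" (interruptible sleep) or "D" (uninterruptible sleep, usually I/O).
    wchan may be "0", empty, or unreadable depending on kernel configuration
    and permissions; an unreadable wchan is returned as "".
    """
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The command name sits in parentheses and may itself contain spaces or
    # ")", so skip past the LAST ")" and the following space to the state.
    state = stat[stat.rindex(")") + 2]
    try:
        with open(f"/proc/{pid}/wchan") as f:
            wchan = f.read().strip()
    except OSError:
        wchan = ""
    return state, wchan
```

Calling it with the optipng PID from the status output (14113 there) would distinguish a process sleeping on some event ("S") from one stuck on disk I/O ("D"), which complements the vmstat view of the whole system.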

SebastianSemper commented 4 years ago

strace delivers the following output, which I am unable to interpret:

select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=928560}) = 0 (Timeout)
openat(AT_FDCWD, "/mnt/hdd/paperless/scans", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
fstat(4, {st_mode=S_IFDIR|S_ISGID|0777, st_size=228, ...}) = 0
getdents64(4, /* 5 entries */, 32768)   = 240
lstat("/mnt/hdd/paperless/scans/Neues Dokument 2019-11-22 11.35.18.pdf", {st_mode=S_IFREG|0755, st_size=365211, ...}) = 0
lstat("/mnt/hdd/paperless/scans/Neues Dokument 2019-11-22 12.19.57.pdf", {st_mode=S_IFREG|0755, st_size=658268, ...}) = 0
lstat("/mnt/hdd/paperless/scans/Neues Dokument 2019-11-22 13.21.46.pdf", {st_mode=S_IFREG|0755, st_size=1174773, ...}) = 0
getdents64(4, /* 0 entries */, 32768)   = 0
close(4)                                = 0
select(0, NULL, NULL, NULL, {tv_sec=9, tv_usec=999581}) = 0 (Timeout)
openat(AT_FDCWD, "/mnt/hdd/paperless/scans", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
fstat(4, {st_mode=S_IFDIR|S_ISGID|0777, st_size=228, ...}) = 0
getdents64(4, /* 5 entries */, 32768)   = 240
lstat("/mnt/hdd/paperless/scans/Neues Dokument 2019-11-22 11.35.18.pdf", {st_mode=S_IFREG|0755, st_size=365211, ...}) = 0
lstat("/mnt/hdd/paperless/scans/Neues Dokument 2019-11-22 12.19.57.pdf", {st_mode=S_IFREG|0755, st_size=658268, ...}) = 0
lstat("/mnt/hdd/paperless/scans/Neues Dokument 2019-11-22 13.21.46.pdf", {st_mode=S_IFREG|0755, st_size=1174773, ...}) = 0
getdents64(4, /* 0 entries */, 32768)   = 0
close(4)                                = 0
select(0, NULL, NULL, NULL, {tv_sec=9, tv_usec=999609}) = 0 (Timeout)
openat(AT_FDCWD, "/mnt/hdd/paperless/scans", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
fstat(4, {st_mode=S_IFDIR|S_ISGID|0777, st_size=228, ...}) = 0
getdents64(4, /* 5 entries */, 32768)   = 240
lstat("/mnt/hdd/paperless/scans/Neues Dokument 2019-11-22 11.35.18.pdf", {st_mode=S_IFREG|0755, st_size=365211, ...}) = 0
lstat("/mnt/hdd/paperless/scans/Neues Dokument 2019-11-22 12.19.57.pdf", {st_mode=S_IFREG|0755, st_size=658268, ...}) = 0
lstat("/mnt/hdd/paperless/scans/Neues Dokument 2019-11-22 13.21.46.pdf", {st_mode=S_IFREG|0755, st_size=1174773, ...}) = 0
getdents64(4, /* 0 entries */, 32768)   = 0
close(4)                                = 0
select(0, NULL, NULL, NULL, {tv_sec=9, tv_usec=999591}

MasterofJOKers commented 4 years ago

Did you attach the strace to optipng or to paperless' document_consumer? To me it looks more like the document_consumer scanning the directory for files.
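That reading matches the trace: each cycle opens /mnt/hdd/paperless/scans, lists it with getdents64, checks each file with lstat, then waits roughly 10 seconds in select(). A polling loop with that shape can be sketched in Python as follows. It illustrates the pattern only and is not paperless' actual consumer code; `list_candidates` and `poll` are hypothetical names:

```python
import os
import time


def list_candidates(directory):
    """Return sorted paths of the regular files directly inside `directory`."""
    with os.scandir(directory) as entries:
        return sorted(
            entry.path
            for entry in entries
            # Skip subdirectories and do not follow symlinks.
            if entry.is_file(follow_symlinks=False)
        )


def poll(directory, handle, interval=10.0):
    """Hypothetical consumer loop: scan, hand off new files, sleep, repeat."""
    seen = set()
    while True:
        for path in list_candidates(directory):
            if path not in seen:
                seen.add(path)
                handle(path)
        time.sleep(interval)
```

A trace of such a loop only ever shows the scan-and-sleep cycle, so to see what the stalled tool is doing, strace has to target the optipng PID itself.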

SebastianSemper commented 4 years ago

I attached it to the document_consumer. Now attaching it to optipng I get:

seb@prometheus:~[master]$ sudo strace -p 28505
strace: Process 28505 attached
write(2, "\r", 1)                       = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=1, si_uid=0} ---
+++ killed by SIGTERM +++

which results in optipng just quitting and paperless not continuing at all.
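In this trace optipng is blocked in write(2, "\r", 1), a one-byte progress write to stderr, and only returns when a SIGTERM from PID 1 (systemd) interrupts it. A write to a pipe or stream socket blocks once its buffer is full, i.e. when whatever is on the other end of fd 2 is not reading. The thread does not show how paperless launches optipng, so the following is only a sketch, assuming the tool is started via Python's subprocess module: letting subprocess drain both streams while the child runs rules out this kind of stall. `run_tool` is a hypothetical helper:

```python
import subprocess
import sys


def run_tool(cmd, timeout=600):
    """Hypothetical helper: run an external tool and collect its output.

    subprocess.run(capture_output=True) uses communicate(), which keeps
    reading stdout and stderr while the child runs, so the child cannot
    stall on a full pipe. By contrast, Popen(cmd, stderr=subprocess.PIPE)
    followed by .wait() never reads the pipe and can hang once it fills.
    """
    result = subprocess.run(cmd, capture_output=True, timeout=timeout)
    return result.returncode, result.stderr


if __name__ == "__main__":
    # A child that writes far more to stderr than a default 64 KiB
    # Linux pipe buffer can hold.
    noisy = [sys.executable, "-c",
             "import sys; sys.stderr.buffer.write(b'.' * 1_000_000)"]
    code, err = run_tool(noisy, timeout=60)
    print(code, len(err))  # prints: 0 1000000
```

If stderr is instead inherited unchanged from the service, it points wherever systemd connected it, so the same reasoning applies to that endpoint.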

SebastianSemper commented 4 years ago

New insight. If I run the consumer via

sudo -u paperless ./manage.py document_consumer

it runs smoothly without any issues. However, the following service unit just hangs on the first document:

[Unit]
Description=Paperless consumer

[Service]
User=paperless
Group=users
ExecStart=/usr/bin/python /mnt/hdd/paperless/bin/src/manage.py document_consumer

[Install]
WantedBy=multi-user.target
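One concrete difference between the two launch methods is where the consumer's standard streams point, and every child such as optipng inherits them. Started via sudo in a terminal they are typically a pseudo-terminal (/dev/pts/N); in a systemd service, by default stdin is /dev/null and stdout/stderr are a stream socket to the journal. A minimal Python sketch, assuming Linux /proc, that reports this from inside the process; `describe_std_streams` is a hypothetical helper that could be called once in each setup to compare:

```python
import os


def describe_std_streams():
    """Return {fd: (target, is_tty)} for stdin, stdout and stderr.

    target is the symlink text of /proc/self/fd/<fd>: a path such as
    "/dev/pts/3" or "/dev/null", or something like "socket:[12345]" or
    "pipe:[67890]". Closed descriptors are reported as "<closed>".
    """
    info = {}
    for fd in (0, 1, 2):
        try:
            target = os.readlink(f"/proc/self/fd/{fd}")
        except OSError:
            target = "<closed>"
        # os.isatty() returns False for closed or non-terminal descriptors.
        info[fd] = (target, os.isatty(fd))
    return info


if __name__ == "__main__":
    for fd, (target, tty) in describe_std_streams().items():
        print(fd, target, "tty" if tty else "not a tty")
```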

MasterofJOKers commented 4 years ago

Just a wild guess, but maybe its outputs need to be closed? Does it produce output in the journal? Something like this...

StandardOutput=null

in the [Service] section.