the-paperless-project / paperless

Scan, index, and archive all of your paper documents
GNU General Public License v3.0

Consumer uses 100% CPU when idle #700

shtrom closed this issue 4 years ago

shtrom commented 4 years ago

I'm testing Paperless, using the Docker method on Linux. It all works reasonably well, except that the consumer container sits at 100% of one CPU core all the time, even when it is not processing documents.

I suspected the polling period at first, but I assume the consumer is using inotify, so that is unlikely to be the cause.

At this stage I'm essentially after suggestions about how to debug it further.

Sblop commented 4 years ago

Check whether you have a document in your consumption folder that Paperless can't consume.

shtrom commented 4 years ago

Yup, that seems to have been the case. Looking at the logs, it kept trying to process the same file and failing: PARSE FAILURE for /consume/doc.jpg: Language detection failed. Set PAPERLESS_FORGIVING_OCR in config file to continue anyway.

What I'm uncertain about is that, seeing the suggestion in the log message, I set PAPERLESS_FORGIVING_OCR=true in the env_file (docker-compose.yml). It does show up when I check the env in the consumer container:

$ docker-compose run consumer /usr/bin/env
Starting paperless_webserver_1 ... done
HOSTNAME=c1133e84f7ad
PAPERLESS_CONSUMPTION_DIR=/consume
PWD=/usr/src/paperless/src
TZ=Australia/Hobart
HOME=/root
PAPERLESS_FORGIVING_OCR=true
PAPERLESS_EXPORT_DIR=/export
TERM=xterm
SHLVL=0
PAPERLESS_OCR_LANGUAGES=eng fre
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PAPERLESS_DISABLE_LOGIN=true

but the consumer continues to choke on it, and keeps suggesting the same fix.

Note that I also have PAPERLESS_DISABLE_LOGIN=true, which seems to be ignored by the webserver as well, so maybe I'm not passing the env properly through docker-compose.env:

$ grep ^PAPERLESS docker-compose.env
PAPERLESS_DISABLE_LOGIN=true
PAPERLESS_FORGIVING_OCR=true
PAPERLESS_OCR_LANGUAGES=eng fre

Am I setting those properly?
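
One way to narrow this down is to compare what docker-compose.env declares with what a process inside the container actually sees. The following is only a rough sketch in Python, assuming the file uses plain KEY=VALUE lines with # comments; parse_env_file and diff_env are hypothetical helper names, not part of Paperless or Docker Compose. It only checks that the values arrive, not how Paperless interprets them.

```python
import os


def parse_env_file(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments.

    Assumption: values are taken verbatim (no quote handling).
    """
    declared = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        declared[key.strip()] = value
    return declared


def diff_env(expected, actual):
    """Return {key: (expected_value, actual_value_or_None)} for each mismatch."""
    return {
        key: (value, actual.get(key))
        for key, value in expected.items()
        if actual.get(key) != value
    }


if __name__ == "__main__":
    # Sample content copied from the grep output above.
    sample = (
        "PAPERLESS_DISABLE_LOGIN=true\n"
        "PAPERLESS_FORGIVING_OCR=true\n"
        "PAPERLESS_OCR_LANGUAGES=eng fre\n"
    )
    mismatches = diff_env(parse_env_file(sample), dict(os.environ))
    print(mismatches or "all declared variables match")
```

Run inside the consumer container, an empty result would suggest the variables are being passed through and that the problem lies in how they are read afterwards.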

shtrom commented 4 years ago

Yeah, there were a few pathological documents. When I got them out of the way, Paperless finished processing everything else, and the consumer went back to 0%.

Still uncertain about how to pass configuration options through the env, but that's a different issue.

Thanks!
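
For anyone hitting the same symptom, the retry loop described above can be spotted by counting how often each file appears in a PARSE FAILURE message. This is a minimal sketch in Python, assuming the log lines contain the message text quoted earlier in this thread; the function name, the threshold, and the rest of the log line format are assumptions, not Paperless behaviour.

```python
import re
from collections import Counter

# Matches the message quoted in this thread, e.g.
# "PARSE FAILURE for /consume/doc.jpg: Language detection failed. ..."
# Assumption: the file path itself does not contain ": ".
PARSE_FAILURE = re.compile(r"PARSE FAILURE for (?P<path>.+?): ")


def repeated_failures(log_lines, threshold=2):
    """Return {path: count} for files that failed at least `threshold` times."""
    counts = Counter()
    for line in log_lines:
        match = PARSE_FAILURE.search(line)
        if match:
            counts[match.group("path")] += 1
    return {path: n for path, n in counts.items() if n >= threshold}
```

The log could be fed in with something like docker-compose logs consumer piped to a small script that passes sys.stdin to repeated_failures. Files that show up repeatedly are candidates to move out of the consumption directory.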