linuxserver / docker-paperless-ngx

GNU General Public License v3.0

Files (pdf) in the consume folder are not processed automatically #34

Closed queen4me closed 2 years ago

queen4me commented 2 years ago

Expected Behavior

Files in consume folder should be automatically processed by paperless-ngx

Current Behavior

Nothing happens after saving PDF files into the consume folder.

Steps to Reproduce

1. 2. 3. 4.

Environment

OS: Ubuntu 20.04
CPU architecture: x86_64
How docker service was installed: Using the provided docker-compose.yml with some changes regarding the port and paths.

Command used to create docker container (run/create/compose/screenshot)

docker-compose up -d

Docker logs

github-actions[bot] commented 2 years ago

Thanks for opening your first issue here! Be sure to follow the bug or feature issue templates!

j0nnymoe commented 2 years ago

Marked as invalid as you haven't provided enough information for us to help with this issue.

queen4me commented 2 years ago

Sorry, but what further information do you require? Unfortunately I didn't find any hints in the paperless-ngx documentation.

j0nnymoe commented 2 years ago

As is requested in the template you filled out, your docker-compose and logs.

queen4me commented 2 years ago

Thanks. All right then. Here you have my docker-compose.yml:

version: "2.1"
services:
  paperless-ngx:
    image: lscr.io/linuxserver/paperless-ngx:latest
    container_name: paperless-ngx
    environment:

11:29:04 [Q] INFO recycled worker Process-1:326
11:29:04 [Q] INFO Process-1:327 ready for work at 1449
11:39:00 [Q] INFO Enqueued 1
11:39:00 [Q] INFO Process-1 created a task from schedule [Check all e-mail accounts]
11:39:00 [Q] INFO Process-1:327 processing [minnesota-blossom-quiet-leopard]
11:39:00 [Q] INFO Process-1:327 stopped doing work
11:39:00 [Q] INFO Processed [minnesota-blossom-quiet-leopard]
11:39:05 [Q] INFO recycled worker Process-1:327
11:39:05 [Q] INFO Process-1:328 ready for work at 1451
11:49:01 [Q] INFO Enqueued 1
11:49:01 [Q] INFO Process-1:328 processing [lactose-summer-april-six]
11:49:01 [Q] INFO Process-1 created a task from schedule [Check all e-mail accounts]
11:49:01 [Q] INFO Process-1:328 stopped doing work
11:49:01 [Q] INFO Processed [lactose-summer-april-six]
11:49:06 [Q] INFO recycled worker Process-1:328
11:49:06 [Q] INFO Process-1:329 ready for work at 1453
11:59:02 [Q] INFO Enqueued 1
11:59:02 [Q] INFO Process-1 created a task from schedule [Check all e-mail accounts]
11:59:02 [Q] INFO Process-1:329 processing [steak-freddie-papa-thirteen]
11:59:02 [Q] INFO Process-1:329 stopped doing work
11:59:02 [Q] INFO Processed [steak-freddie-papa-thirteen]
11:59:07 [Q] INFO recycled worker Process-1:329
11:59:07 [Q] INFO Process-1:330 ready for work at 1455
12:09:03 [Q] INFO Enqueued 1
12:09:03 [Q] INFO Process-1:330 processing [friend-vermont-georgia-salami]
12:09:03 [Q] INFO Process-1 created a task from schedule [Check all e-mail accounts]
12:09:03 [Q] INFO Process-1:330 stopped doing work
12:09:03 [Q] INFO Processed [friend-vermont-georgia-salami]
12:09:08 [Q] INFO recycled worker Process-1:330
12:09:08 [Q] INFO Process-1:331 ready for work at 1457
12:19:04 [Q] INFO Enqueued 1
12:19:04 [Q] INFO Process-1 created a task from schedule [Train the classifier]
12:19:04 [Q] INFO Process-1:331 processing [mars-robert-four-uncle]
12:19:04 [Q] INFO Enqueued 1
12:19:04 [Q] INFO Process-1 created a task from schedule [Check all e-mail accounts]
12:19:04 [Q] INFO Process-1:331 stopped doing work
12:19:04 [Q] INFO Processed [mars-robert-four-uncle]
12:19:09 [Q] INFO recycled worker Process-1:331
12:19:09 [Q] INFO Process-1:332 ready for work at 1460
12:19:09 [Q] INFO Process-1:332 processing [vegan-victor-seven-moon]
12:19:09 [Q] INFO Process-1:332 stopped doing work
12:19:09 [Q] INFO Processed [vegan-victor-seven-moon]
12:19:14 [Q] INFO recycled worker Process-1:332
12:19:14 [Q] INFO Process-1:333 ready for work at 1462
12:29:04 [Q] INFO Enqueued 1
12:29:04 [Q] INFO Process-1 created a task from schedule [Check all e-mail accounts]
12:29:04 [Q] INFO Process-1:333 processing [sodium-papa-march-juliet]
12:29:05 [Q] INFO Process-1:333 stopped doing work
12:29:05 [Q] INFO Processed [sodium-papa-march-juliet]
12:29:09 [Q] INFO recycled worker Process-1:333
12:29:10 [Q] INFO Process-1:334 ready for work at 1464
12:38:35 [Q] INFO Enqueued 1
12:38:35 [Q] INFO Process-1 created a task from schedule [Check all e-mail accounts]
12:38:35 [Q] INFO Process-1:334 processing [nevada-eight-fanta-indigo]
12:38:35 [Q] INFO Process-1:334 stopped doing work
12:38:35 [Q] INFO Processed [nevada-eight-fanta-indigo]
12:38:40 [Q] INFO recycled worker Process-1:334
12:38:40 [Q] INFO Process-1:335 ready for work at 1466
12:48:36 [Q] INFO Enqueued 1
12:48:36 [Q] INFO Process-1 created a task from schedule [Check all e-mail accounts]
12:48:36 [Q] INFO Process-1:335 processing [fix-sierra-arizona-violet]
12:48:36 [Q] INFO Process-1:335 stopped doing work
12:48:36 [Q] INFO Processed [fix-sierra-arizona-violet]
12:48:41 [Q] INFO recycled worker Process-1:335
12:48:41 [Q] INFO Process-1:336 ready for work at 1468
12:58:37 [Q] INFO Enqueued 1
12:58:37 [Q] INFO Process-1 created a task from schedule [Check all e-mail accounts]
12:58:37 [Q] INFO Process-1:336 processing [rugby-cup-leopard-nuts]
12:58:37 [Q] INFO Process-1:336 stopped doing work
12:58:37 [Q] INFO Processed [rugby-cup-leopard-nuts]
12:58:42 [Q] INFO recycled worker Process-1:336
12:58:42 [Q] INFO Process-1:337 ready for work at 1470
13:08:38 [Q] INFO Enqueued 1
13:08:38 [Q] INFO Process-1 created a task from schedule [Check all e-mail accounts]
13:08:38 [Q] INFO Process-1:337 processing [skylark-cat-oklahoma-social]
13:08:38 [Q] INFO Process-1:337 stopped doing work
13:08:38 [Q] INFO Processed [skylark-cat-oklahoma-social]
13:08:43 [Q] INFO recycled worker Process-1:337
13:08:43 [Q] INFO Process-1:338 ready for work at 1497
13:18:39 [Q] INFO Enqueued 1
13:18:39 [Q] INFO Process-1 created a task from schedule [Train the classifier]
13:18:39 [Q] INFO Process-1:338 processing [may-zebra-music-william]
13:18:39 [Q] INFO Enqueued 1
13:18:39 [Q] INFO Process-1 created a task from schedule [Check all e-mail accounts]
13:18:39 [Q] INFO Process-1:338 stopped doing work
13:18:39 [Q] INFO Processed [may-zebra-music-william]
13:18:44 [Q] INFO recycled worker Process-1:338
13:18:44 [Q] INFO Process-1:339 ready for work at 1500
13:18:44 [Q] INFO Process-1:339 processing [potato-spring-fanta-utah]
13:18:44 [Q] INFO Process-1:339 stopped doing work
13:18:44 [Q] INFO Processed [potato-spring-fanta-utah]
13:18:49 [Q] INFO recycled worker Process-1:339
13:18:49 [Q] INFO Process-1:340 ready for work at 1502
13:28:40 [Q] INFO Enqueued 1
13:28:40 [Q] INFO Process-1 created a task from schedule [Check all e-mail accounts]
13:28:40 [Q] INFO Process-1:340 processing [apart-magazine-early-march]
13:28:40 [Q] INFO Process-1:340 stopped doing work
13:28:40 [Q] INFO Processed [apart-magazine-early-march]
13:28:45 [Q] INFO recycled worker Process-1:340
13:28:45 [Q] INFO Process-1:341 ready for work at 1504

queen4me commented 2 years ago

On my NAS, which is mounted at /mnt/nas/paperless-ngx, the following two folders were created after starting the container:

- consume
- media

Under media, the application created a documents folder after I uploaded three PDF files via the web interface. Those files were processed without any problem.

Only the automatic processing of files saved in the consume folder is not working and any help is appreciated.

queen4me commented 2 years ago

I now have a docker-compose.env file with the following content, but it seems that the settings in the .env file are ignored when applying the docker-compose.yml:

PAPERLESS_OCR_LANGUAGE=deu+eng
PAPERLESS_CONSUMER_POLLING=60

j0nnymoe commented 2 years ago

I suspect the issue might be permissions related: paperless-ngx may not be able to monitor the consume folder because it's on a remote host. Could you test on a local file system?
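For readers with the same setup, a minimal sketch of the polling alternative, assuming the consume folder stays on the network mount. File changes made on a remote host generally do not raise inotify events on the machine that mounts the share, so the `PAPERLESS_CONSUMER_POLLING` variable mentioned earlier in this thread is set inline here. The volume path is taken from this thread and the rest of the service definition is left out:

```yaml
services:
  paperless-ngx:
    image: lscr.io/linuxserver/paperless-ngx:latest
    environment:
      # Check the consume folder every 60 seconds instead of relying on inotify
      - PAPERLESS_CONSUMER_POLLING=60
    volumes:
      # Network mount from this thread; adjust to your own path
      - /mnt/nas/paperless-ngx:/data
```

The interval of 60 seconds is simply the value the reporter used, not a recommendation.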

queen4me commented 2 years ago

I'll give it a try. But I don't understand why the OCR language is still set to eng although it is defined as deu+eng in the .env file. This file is saved in the same folder as the docker-compose.yml but seems to be completely ignored when starting the container. Is it not possible to set these and other variables via the web interface?

j0nnymoe commented 2 years ago

Have you defined the .env file within the compose?

queen4me commented 2 years ago

No. Is it described anywhere what exact syntax I have to use in the docker-compose.yml to reference the docker-compose.env?

queen4me commented 2 years ago

How do I have to configure the local and dedicated consume folder in the docker-compose.yml file if I want to store the processed files on my NAS:

volumes:

j0nnymoe commented 2 years ago

How do I have to configure the local and dedicated consume folder in the docker-compose.yml file if I want to store the processed files on my NAS:

volumes:

* /home/john/docker/paperless-ngx:/config

* /mnt/nas/paperless-ngx:/data

I wanted you to just test using your local filesystem for the /data mount so that we can figure out where your issue is.
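A sketch of what such a local test mount could look like. The `/config` path is the one quoted above; the local `/data` path is an assumption inferred from the consume folder location the reporter mentions later in this thread, so treat it as illustrative only:

```yaml
services:
  paperless-ngx:
    image: lscr.io/linuxserver/paperless-ngx:latest
    volumes:
      - /home/john/docker/paperless-ngx:/config
      # Assumed local replacement for the NAS mount, for testing only
      - /home/john/docker/paperless-ngx/data:/data
```

With this mapping, files dropped into `/home/john/docker/paperless-ngx/data/consume` on the host would appear as `/data/consume` inside the container.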

queen4me commented 2 years ago

I now ran the compose file again with the following volumes:

But again nothing is processed when I drop some PDF files into the now-local consume folder:

/home/john/docker/paperless-ngx/data/consume

j0nnymoe commented 2 years ago

OK, and do the paperless-ngx logs show anything when you drop something into it?

queen4me commented 2 years ago

No, absolutely nothing.

j0nnymoe commented 2 years ago

Ok, will need to test internally.

queen4me commented 2 years ago

Thanks a lot.

queen4me commented 2 years ago

This is the log shown in the web console after restarting:

[2022-08-23 15:32:19,261] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /data/consume

[2022-08-23 15:32:55,015] [INFO] [paperless.sanity_checker] Sanity checker detected no issues.

queen4me commented 2 years ago

I got it working now by just adding the following to the docker-compose.yml:

env_file: docker-compose.env
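For context, a sketch of where that line sits in the compose file. The image, container name, and volume paths are the ones quoted elsewhere in this thread; the list form of `env_file` is equivalent to the single-string form above, and anything else in the reporter's actual file is not shown here:

```yaml
version: "2.1"
services:
  paperless-ngx:
    image: lscr.io/linuxserver/paperless-ngx:latest
    container_name: paperless-ngx
    # Variables in this file are passed into the container at startup
    env_file:
      - docker-compose.env
    volumes:
      - /home/john/docker/paperless-ngx:/config
      - /mnt/nas/paperless-ngx:/data
```

Compose only injects variables from a file into the container when `env_file` names it for that service, which would explain why the settings were ignored before this line was added. A relative path is resolved against the directory containing the compose file.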

Is it possible that German OCR is missing from the image? I get an error when using:

PAPERLESS_OCR_LANGUAGE=deu

in the docker-compose.env.

How can I enable German OCR?

aptalca commented 2 years ago

https://mods.linuxserver.io/?mod=paperless-ngx

queen4me commented 2 years ago

Yeah, thanks. I just found it, but with the following lines in docker-compose.env:

DOCKER_MODS=linuxserver/mods:papermerge-multilangocr
OCRLANG=deu

German OCR is installed according to the container log, but the log in the web interface confuses me. It says:

[2022-08-28 09:35:47,865] [DEBUG] [paperless.parsing.tesseract] Calling OCRmyPDF with args: {'input_file': '/data/consume/scan_310722121453.pdf', 'output_file': '/tmp/paperless/paperless-o74ao9mu/archive.pdf', 'use_threads': True, 'jobs': 2, 'language': 'eng', 'output_type': 'pdfa', 'progress_bar': False, 'skip_text': True, 'clean': True, 'deskew': True, 'rotate_pages': True, 'rotate_pages_threshold': 12.0, 'sidecar': '/tmp/paperless/paperless-o74ao9mu/sidecar.txt'}

It looks like only eng OCR is still being used.

queen4me commented 2 years ago

I think I got it now with the following lines in docker-compose.env for German and English OCR:

DOCKER_MODS=linuxserver/mods:papermerge-multilangocr
OCRLANG=deu
PAPERLESS_OCR_LANGUAGE=deu+eng
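The same three settings could also be written inline in the compose file rather than in a separate env file. This is only an equivalent sketch, not what the reporter used. Going by the logs in this thread, `OCRLANG` appears to control which language data the mod installs, while `PAPERLESS_OCR_LANGUAGE` selects what paperless-ngx passes to OCRmyPDF:

```yaml
services:
  paperless-ngx:
    image: lscr.io/linuxserver/paperless-ngx:latest
    environment:
      # Mod and language values copied from the env file in this thread
      - DOCKER_MODS=linuxserver/mods:papermerge-multilangocr
      - OCRLANG=deu
      # Languages paperless-ngx requests for OCR
      - PAPERLESS_OCR_LANGUAGE=deu+eng
```

Either form should work; keeping the values in docker-compose.env avoids editing the compose file when settings change.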

queen4me commented 2 years ago

Thanks a lot for helping, folks!

queen4me commented 2 years ago

OCR in deu+eng is now working, as is the processing of files in the consume folder.