Closed jonaswinkler closed 3 years ago
One thing I've already spotted with OCRmyPDF
WARNING 2021-01-11 12:49:04,913 tesseract [tesseract] took too long to OCR - skipping
Nice! :-)
It starts, no obvious problems. Logging in works fine.
My workflow mostly uses the paperless Android app. On occasion I add a PDF that I got by email, but not often.
An initial attempt to scan a document using the app results in a Python PIL related error on paperless-ng.
cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.7/site-packages/PIL/__init__.py)
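A quick way to check whether the compiled module named in this traceback can be imported inside the container is a small script like the one below. This is only a sketch: `can_import` is a hypothetical helper name, and the check reports whether the import works, not why it fails.

```python
import importlib


def can_import(module_name):
    """Return True if module_name imports cleanly, False on ImportError."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # package and a missing compiled extension both end up here.
        return False
    return True


if __name__ == "__main__":
    # PIL._imagingcms is the compiled module named in the error message.
    for name in ("PIL", "PIL._imagingcms"):
        print(name, "importable:", can_import(name))
```

If `PIL` is importable but `PIL._imagingcms` is not, the Pillow build in the image is probably missing that extension.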
Thank you. Is that the 32-bit or 64-bit variant of Ubuntu?
64-bit
@mvdkleijn new build is up on the hub, does that resolve the issue?
Hi Jonas,
I tested the new ARM64 image on a Raspberry Pi 4 (4 GB) running the latest 64-bit Raspbian (now called Raspberry Pi OS).
I encountered the following:
ERROR Error while consuming document PDFWithImage.pdf: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.7/site-packages/PIL/__init__.py)
WARNING Error while getting DPI from image
By the way, I've been using an arm64 image in "production" for about a week now. I extended the Dockerfile with another stage that downloads the sources (so I don't have to check out the sources every time a new release is ready) and used python:3.9-slim as the base image. It has worked without any errors so far.
Best regards, Tobi
@jonaswinkler The latest image consumes the document just fine. I do get the same DPI warning that @niarbx got, but other than that it looks fine.
- An Image with Text threw the following Warning:
WARNING Error while getting DPI from image
That shouldn't be a warning, I'll lower the severity. Some images have DPI information in their metadata, and paperless uses that. That's important for PDF generation (how big should the pages be?). If none is available, paperless will produce A4-sized PDF documents.
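To illustrate why the DPI matters for page size, here is a minimal sketch of the calculation, not paperless's actual code. The function name `page_size_points` and the exact fallback rule are assumptions made for this example. With Pillow, the DPI value would typically come from `Image.info.get("dpi")`, which is absent when the image carries no such metadata.

```python
# A4 is 210 mm x 297 mm; PDF pages are measured in points (1 pt = 1/72 inch).
A4_POINTS = (210 / 25.4 * 72, 297 / 25.4 * 72)


def page_size_points(width_px, height_px, dpi=None):
    """Return the (width, height) of the PDF page in points.

    dpi is an (x_dpi, y_dpi) pair, or None when the image has no DPI
    metadata. Without usable DPI there is no way to know the physical
    size, so this sketch falls back to A4.
    """
    if not dpi or dpi[0] <= 0 or dpi[1] <= 0:
        return A4_POINTS
    return (width_px / dpi[0] * 72, height_px / dpi[1] * 72)
```

For example, a 2480 x 3508 pixel scan at 300 DPI comes out at roughly A4 size, while the same pixels at 150 DPI would produce a page twice as wide and twice as tall.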
- While consuming a PDF which consists of one big image, the following message was logged:
Should be fixed in the image from a couple minutes ago
- The classifier also works: training and auto-matching worked (although the results weren't accurate because the training data set was too small).
Thank you, good to know.
Testing the docker build with Unraid and consumed a couple of files with no problem.
I threw a selection of 15 scanned PDFs at it and I've lost the GUI: it is not reachable. There are no obvious errors in the log; in fact, it appears to be still consuming.
Only just found your NG version of paperless today so will take a better look at my config tomorrow to see if it needs tuning.
Cool NG version btw :)
What platform? It might take up all resources while consuming (this takes a long time on a Pi), and the web server might not get enough CPU time to respond in time.
Consider the options TASK_WORKERS and THREADS_PER_WORKER (https://paperless-ng.readthedocs.io/en/latest/configuration.html#software-tweaks). The Pi 3 and Pi 4 have a quad-core CPU, so setting TASK_WORKERS=2 and THREADS_PER_WORKER=1 will always leave some resources available for other tasks.
See also https://paperless-ng.readthedocs.io/en/latest/setup.html#considerations-for-less-powerful-devices
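As a rough rule of thumb for picking these two values, the sketch below checks whether workers times threads still leaves some cores free. The helper name `fits_cpu_budget` and the `reserved` parameter are assumptions made for this example; paperless itself does not perform this check.

```python
import os


def fits_cpu_budget(task_workers, threads_per_worker, cores=None, reserved=1):
    """Return True if the settings leave at least `reserved` cores free.

    cores defaults to the number of CPUs reported by the operating system.
    """
    if cores is None:
        cores = os.cpu_count() or 1
    return task_workers * threads_per_worker <= cores - reserved


if __name__ == "__main__":
    # Quad-core Pi: 2 workers with 1 thread each leaves 2 cores free.
    print(fits_cpu_budget(2, 1, cores=4))
    # 4 workers with 1 thread each would occupy every core.
    print(fits_cpu_budget(4, 1, cores=4))
```

On a quad core, 2 workers with 1 thread each passes the check, while 4 workers with 1 thread each does not.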
Thanks for the info. The Unraid machine is a Xeon with 32 GB of memory running multiple Docker containers.
Uhm, yeah. That should not have any issues running this.
It's running on armv7 (Banana Pi M2U)
OS: Armbian (Ubuntu 20.04.1 LTS)
Kernel: Linux dms 5.9.14-sunxi #20.11.3 SMP Fri Dec 11 20:31:12 CET 2020 armv7l armv7l armv7l GNU/Linux
Docker: 19.03.12
docker-compose: 1.27.4
No errors so far. Consuming email with attachments works, consuming scanned PDFs works, full text search works, and training of the classifier starts and works.
Alright, thank you very much. Multi arch images are coming soon.
I've got the CI pipeline (see #151) pretty much ready, and it has successfully built docker images for amd64, armhf and aarch64.
Image is available at Docker Hub. For anyone interested, the workflow that produced these images is here: https://github.com/jonaswinkler/paperless-ng/actions/runs/476333808.
I don't have aarch64 hardware and would love to hear from people who do if this works. Feedback on the arm/v7 image is also welcome.
These images are based on the latest dev branch, which is identical to the current release plus a couple of bug fixes. But as with all pre-release things, I wouldn't advise running it with your actual database. These images can be used with any of the docker-compose files in the docker/hub/ folder. Just replace the version and pull.
Things I'd like to see tested:
Consume digital PDF documents with embedded text
Consume scanned PDF documents without embedded text
Consume JPG documents
Add some "Auto" matching metadata to documents and inspect whether the "Train the classifier" scheduled task executes successfully. You can schedule it to run immediately by going into the admin, editing that scheduled task, clicking Today / Now further down, and then saving. To make sure that it's working, you should eventually see something like this in the logs with the filter set to DEBUG:
Try to find some documents with the full text search.
Thank you!