jonaswinkler / paperless-ng

A supercharged version of paperless: scan, index and archive all your physical documents
https://paperless-ng.readthedocs.io/en/latest/
GNU General Public License v3.0

Please help me test the new multiarch docker images #322

Closed jonaswinkler closed 3 years ago

jonaswinkler commented 3 years ago

I've got the CI pipeline (see #151) pretty much ready, and it has successfully built docker images for amd64, armhf and aarch64.

Image is available at Docker Hub. For anyone interested, the workflow that produced these images is here: https://github.com/jonaswinkler/paperless-ng/actions/runs/476333808.

I don't have aarch64 hardware and would love to hear from people who do if this works. Feedback on the arm/v7 image is also welcome.

These images are based on the latest dev branch, which is identical to the current release plus a couple of bug fixes. But as with all pre-release things, I wouldn't advise running them against your actual database.

These images can be used with any of the docker-compose files in the docker/hub/ folder. Just replace the version, and pull.
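As a rough illustration of the "replace the version" step, here is a minimal Python sketch that rewrites the tag on the `image:` line of a compose file. The helper name `set_image_tag`, the image name `jonaswinkler/paperless-ng`, and the tag `some-test-tag` are assumptions for illustration only; use whatever image and tag are actually published on Docker Hub.

```python
import re


def set_image_tag(compose_text: str, image: str, new_tag: str) -> str:
    """Return compose_text with the tag of every `image: <image>[:tag]` line replaced.

    Hypothetical helper: it only touches lines whose image name matches
    `image` exactly and leaves all other services untouched.
    """
    pattern = re.compile(
        r"^(\s*image:\s*" + re.escape(image) + r")(?::[^\s#]+)?(?=\s|$)",
        re.MULTILINE,
    )
    return pattern.sub(lambda m: f"{m.group(1)}:{new_tag}", compose_text)


# Example compose fragment (made up for this sketch).
example = (
    "services:\n"
    "  broker:\n"
    "    image: redis:6.0\n"
    "  webserver:\n"
    "    image: jonaswinkler/paperless-ng:latest\n"
)

updated = set_image_tag(example, "jonaswinkler/paperless-ng", "some-test-tag")
print(updated)
```

After writing the result back to the compose file, `docker-compose pull` followed by `docker-compose up -d` would fetch and start the new image.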

Things I'd like to see tested:

Thank you!

jonaswinkler commented 3 years ago

One thing I've already spotted with OCRmyPDF:

WARNING 2021-01-11 12:49:04,913 tesseract [tesseract] took too long to OCR - skipping

mvdkleijn commented 3 years ago

Nice! :-)

It starts, no obvious problems. Logging in works fine.

My workflow is using the paperless android app mostly. On occasion I get a pdf by email that I add, but not often.

An initial attempt to scan a document using the app results in a Python PIL related error on paperless-ng.

cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.7/site-packages/PIL/__init__.py)

jonaswinkler commented 3 years ago

Thank you. Is that the 32bit or 64bit variant of ubuntu?

jonaswinkler commented 3 years ago

See https://github.com/python-pillow/Pillow/issues/5202

mvdkleijn commented 3 years ago

> Thank you. Is that the 32bit or 64bit variant of ubuntu?

64-bit

jonaswinkler commented 3 years ago

@mvdkleijn new build is up on the hub, does that resolve the issue?

niarbx commented 3 years ago

Hi Jonas,

I tested the new ARM64 image on a Raspberry Pi 4 (4 GB) running the latest Raspbian (now called Raspberry Pi OS), 64-bit.

I encountered the following:

By the way, I've been using an arm64 image in "production" for about a week now. I extended the Dockerfile with another stage that downloads the sources (so I don't have to check out the sources every time a new release is ready) and used python:3.9-slim as the base image. Works without any errors so far.

Best regards, Tobi

mvdkleijn commented 3 years ago

@jonaswinkler The latest image consumes the document just fine. I do get the same DPI warning that @niarbx got, but other than that it looks fine.

jonaswinkler commented 3 years ago

  • An Image with Text threw the following Warning: WARNING Error while getting DPI from image

That shouldn't be a warning, I'll lower the severity. Some images have DPI information in their metadata, and paperless uses that. That's important for PDF generation (how big should the pages be?). If none is available, paperless will produce A4-sized PDF documents.

  • while consuming a PDF which consists of a big image the following message was logged

Should be fixed in the image from a couple minutes ago

  • Classifier also works, training and auto-matching worked (although training didn't give accurate results because the training data set was too small).

Thank you, good to know.
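To make the DPI fallback described earlier in this comment concrete, here is a minimal sketch of how pixel dimensions and DPI metadata could translate into a PDF page size, with A4 as the default when no DPI is available. This is not paperless-ng's actual code; `page_size_points` and its parameters are hypothetical names. With Pillow, the DPI would typically come from something like `img.info.get("dpi")`, which is left out here to keep the example dependency-free.

```python
from typing import Optional, Tuple

POINTS_PER_INCH = 72.0   # PDF user-space units per inch
MM_PER_INCH = 25.4
A4_MM = (210.0, 297.0)   # A4 portrait, width x height in millimetres


def page_size_points(
    width_px: int,
    height_px: int,
    dpi: Optional[Tuple[float, float]] = None,
) -> Tuple[float, float]:
    """Return (width, height) of the PDF page in points.

    If usable DPI metadata is present, the page is as large as the image
    physically is. Otherwise fall back to A4, as described in the comment.
    """
    if dpi is None or dpi[0] <= 0 or dpi[1] <= 0:
        return (
            A4_MM[0] / MM_PER_INCH * POINTS_PER_INCH,
            A4_MM[1] / MM_PER_INCH * POINTS_PER_INCH,
        )
    return (
        width_px / dpi[0] * POINTS_PER_INCH,
        height_px / dpi[1] * POINTS_PER_INCH,
    )


# A 2480x3508 pixel scan at 300 DPI is roughly A4-sized.
with_dpi = page_size_points(2480, 3508, (300, 300))
# The same image without DPI metadata falls back to exactly A4.
without_dpi = page_size_points(2480, 3508)
print(with_dpi, without_dpi)
```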

mannp commented 3 years ago

Testing the docker build with Unraid and consumed a couple of files with no problem.

I threw a selection of 15 scanned PDFs at it and lost the GUI; it's not reachable. There are no obvious errors in the log; in fact, it appears to still be consuming.

Only just found your NG version of paperless today so will take a better look at my config tomorrow to see if it needs tuning.

Cool NG version btw :)

jonaswinkler commented 3 years ago

What platform? It might take up all resources while consuming (this takes a long time on a Pi), and the web server might not get enough CPU time to respond in time.

Consider the options TASK_WORKERS and THREADS_PER_WORKER (https://paperless-ng.readthedocs.io/en/latest/configuration.html#software-tweaks). The Pi 3 and Pi 4 have a quad-core CPU, so setting TASK_WORKERS=2 and THREADS_PER_WORKER=1 will always leave some resources available for other tasks.

See also https://paperless-ng.readthedocs.io/en/latest/setup.html#considerations-for-less-powerful-devices
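The sizing rule above (keep workers × threads below the core count) can be sketched in a few lines of Python. The helper `suggest_settings` and the `reserved_cores` parameter are hypothetical; the variable names are assumed to carry the `PAPERLESS_` prefix as in the linked configuration page.

```python
from typing import Dict


def suggest_settings(
    cpu_cores: int,
    reserved_cores: int = 2,
    threads_per_worker: int = 1,
) -> Dict[str, int]:
    """Suggest worker settings that leave `reserved_cores` free for the web server.

    Hypothetical helper: the product workers * threads is kept at or below
    the number of cores that remain after the reservation.
    """
    usable = max(1, cpu_cores - reserved_cores)
    workers = max(1, usable // threads_per_worker)
    return {
        "PAPERLESS_TASK_WORKERS": workers,
        "PAPERLESS_THREADS_PER_WORKER": threads_per_worker,
    }


# Quad-core Pi 3/4: two workers with one thread each, as suggested above.
pi_settings = suggest_settings(cpu_cores=4)
for name, value in pi_settings.items():
    print(f"{name}={value}")
```

The printed lines could be pasted into the environment file used by the compose setup.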

mannp commented 3 years ago

Thanks for the info. The Unraid machine is a Xeon with 32 GB of memory running multiple Docker containers.

jonaswinkler commented 3 years ago

Uhm, yeah. That should not have any issues running this.

sisao commented 3 years ago

It's running on armv7 (Banana Pi M2U)

OS: Armbian (Ubuntu 20.04.1 LTS)
Kernel: Linux dms 5.9.14-sunxi #20.11.3 SMP Fri Dec 11 20:31:12 CET 2020 armv7l armv7l armv7l GNU/Linux
Docker: 19.03.12
docker-compose: 1.27.4

No errors so far. Consuming email with attachments works, consuming scanned PDFs works, full-text search works, and training of the classifier starts and works.

jonaswinkler commented 3 years ago

Alright, thank you very much. Multi arch images are coming soon.