l4rm4nd closed this issue 2 years ago.
I'm experiencing the same issue on Ubuntu 20.04.3 (aarch64).
I'm also having the same issue, using this image on a Raspberry Pi 4B (4 GB) with the default Raspberry Pi OS (32-bit).
Same on a Raspberry Pi 4B (8 GB) with Ubuntu 21.04 64-bit.
I had the same issue on Ubuntu 20.04.1 (aarch64) and fixed it by connecting to the container and running this command:
apt update && apt install python3-dev build-essential -y && pip install pikepdf==2.16.1 --force-reinstall
It also works with newer versions of pikepdf, but I'm trying to stay on the same version as the one provided.
Edit: the paperless-ng version is version-ng-1.5.0.
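To confirm that a reinstall like this actually fixed things, it can help to try the import with the container's own interpreter. The tracebacks in this thread show pikepdf re-raising the real loader error via `raise ImportError(_msg) from _e`, so the useful message (for example an undefined symbol or a missing shared library) sits in the exception chain. A minimal sketch that walks that chain; `import_error_chain` is a hypothetical helper, not part of pikepdf or paperless-ng.

```python
import importlib
from typing import List


def import_error_chain(module_name: str) -> List[str]:
    """Hypothetical helper: import a module, return [] on success,
    otherwise every error in the chain, outermost first.

    Follows __cause__ (explicit "raise ... from ...") and falls back to
    __context__ (implicit chaining).
    """
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        chain: List[str] = []
        seen = set()  # guard against (unlikely) cycles in the chain
        err = exc
        while err is not None and id(err) not in seen:
            seen.add(id(err))
            chain.append(f"{type(err).__name__}: {err}")
            err = err.__cause__ or err.__context__
        return chain
    return []


if __name__ == "__main__":
    problems = import_error_chain("pikepdf")
    if problems:
        print("pikepdf failed to import:")
        for line in problems:
            print("  " + line)
    else:
        print("pikepdf imported OK")
```

Run it inside the container, e.g. with `docker exec -it <container> python3 <script>.py`; an empty chain means the extension loaded.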
@Darlekesh Yup, that worked. Thank you!
You might want to leave this issue open until it's fixed in the actual image.
@Darlekesh Unfortunately that did not work for me on Raspberry Pi 4B (4GB) with default Raspberry Pi OS (32bit).
Since this issue is taking so long to resolve, I went back to the paperless-ng Docker image offered by the official project. Luckily, they have an image for the RPi.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This is still an issue. Building the pikepdf wheel does not work on Docker for Mac (M1, aarch64):
Collecting pikepdf==2.16.1
Downloading pikepdf-2.16.1.tar.gz (2.3 MB)
|████████████████████████████████| 2.3 MB 1.7 MB/s
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting Pillow>=6.0
Downloading Pillow-8.4.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.0 MB)
|████████████████████████████████| 3.0 MB 37.0 MB/s
Collecting lxml>=4.0
Downloading lxml-4.6.4-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.manylinux_2_24_aarch64.whl (6.5 MB)
|████████████████████████████████| 6.5 MB 6.7 MB/s
Building wheels for collected packages: pikepdf
Building wheel for pikepdf (pyproject.toml) ... error
ERROR: Command errored out with exit status 1:
command: /usr/bin/python3 /usr/local/lib/python3.8/dist-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /tmp/tmpfayft38f
cwd: /tmp/pip-install-0zsq3vrr/pikepdf_1d6b213d99e149d1a2c51e6a7d22ff7a
Complete output (46 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-aarch64-3.8
creating build/lib.linux-aarch64-3.8/pikepdf
copying src/pikepdf/_cpphelpers.py -> build/lib.linux-aarch64-3.8/pikepdf
copying src/pikepdf/_version.py -> build/lib.linux-aarch64-3.8/pikepdf
copying src/pikepdf/jbig2.py -> build/lib.linux-aarch64-3.8/pikepdf
copying src/pikepdf/objects.py -> build/lib.linux-aarch64-3.8/pikepdf
copying src/pikepdf/codec.py -> build/lib.linux-aarch64-3.8/pikepdf
copying src/pikepdf/_methods.py -> build/lib.linux-aarch64-3.8/pikepdf
copying src/pikepdf/_xml.py -> build/lib.linux-aarch64-3.8/pikepdf
copying src/pikepdf/__init__.py -> build/lib.linux-aarch64-3.8/pikepdf
creating build/lib.linux-aarch64-3.8/pikepdf/models
copying src/pikepdf/models/encryption.py -> build/lib.linux-aarch64-3.8/pikepdf/models
copying src/pikepdf/models/outlines.py -> build/lib.linux-aarch64-3.8/pikepdf/models
copying src/pikepdf/models/metadata.py -> build/lib.linux-aarch64-3.8/pikepdf/models
copying src/pikepdf/models/image.py -> build/lib.linux-aarch64-3.8/pikepdf/models
copying src/pikepdf/models/matrix.py -> build/lib.linux-aarch64-3.8/pikepdf/models
copying src/pikepdf/models/__init__.py -> build/lib.linux-aarch64-3.8/pikepdf/models
running egg_info
writing src/pikepdf.egg-info/PKG-INFO
writing dependency_links to src/pikepdf.egg-info/dependency_links.txt
writing requirements to src/pikepdf.egg-info/requires.txt
writing top-level names to src/pikepdf.egg-info/top_level.txt
reading manifest file 'src/pikepdf.egg-info/SOURCES.txt'
adding license file 'LICENSE.txt'
adding license file 'licenses/license.wheel.txt'
writing manifest file 'src/pikepdf.egg-info/SOURCES.txt'
copying src/pikepdf/_qpdf.pyi -> build/lib.linux-aarch64-3.8/pikepdf
copying src/pikepdf/py.typed -> build/lib.linux-aarch64-3.8/pikepdf
running build_ext
building 'pikepdf._qpdf' extension
creating build/temp.linux-aarch64-3.8
creating build/temp.linux-aarch64-3.8/src
creating build/temp.linux-aarch64-3.8/src/qpdf
aarch64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/tmp/pip-build-env-1ierem4r/overlay/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c src/qpdf/object.cpp -o build/temp.linux-aarch64-3.8/src/qpdf/object.o -fvisibility=hidden -g0 -std=c++14
aarch64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/tmp/pip-build-env-1ierem4r/overlay/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c src/qpdf/annotation.cpp -o build/temp.linux-aarch64-3.8/src/qpdf/annotation.o -fvisibility=hidden -g0 -std=c++14
aarch64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/tmp/pip-build-env-1ierem4r/overlay/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c src/qpdf/object_convert.cpp -o build/temp.linux-aarch64-3.8/src/qpdf/object_convert.o -fvisibility=hidden -g0 -std=c++14
aarch64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/tmp/pip-build-env-1ierem4r/overlay/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c src/qpdf/object_repr.cpp -o build/temp.linux-aarch64-3.8/src/qpdf/object_repr.o -fvisibility=hidden -g0 -std=c++14
aarch64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/tmp/pip-build-env-1ierem4r/overlay/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c src/qpdf/page.cpp -o build/temp.linux-aarch64-3.8/src/qpdf/page.o -fvisibility=hidden -g0 -std=c++14
aarch64-linux-gnu-gcc: fatal error: Killed signal terminated program cc1plus
compilation terminated.
aarch64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/tmp/pip-build-env-1ierem4r/overlay/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c src/qpdf/parsers.cpp -o build/temp.linux-aarch64-3.8/src/qpdf/parsers.o -fvisibility=hidden -g0 -std=c++14
error: command 'aarch64-linux-gnu-gcc' failed with exit status 1
----------------------------------------
ERROR: Failed building wheel for pikepdf
Failed to build pikepdf
ERROR: Could not build wheels for pikepdf, which is required to install pyproject.toml-based projects
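In this log the compiler dies while building `page.cpp` with `Killed signal terminated program cc1plus`. That message typically means `cc1plus` received SIGKILL, most often from the kernel's out-of-memory killer, rather than hitting a compile error. A minimal sketch, assuming a Linux-style `/proc/meminfo`, that flags this message in a saved build log and reports available memory before retrying; the function names are hypothetical and not part of pip or the image.

```python
import re
from typing import Dict, Optional

# gcc prints this when a compiler subprocess is killed by a signal;
# on Linux that is most often the OOM killer (an assumption, not proof).
KILLED_PATTERN = re.compile(r"Killed signal terminated program \S+")


def looks_like_oom(build_log: str) -> bool:
    """Hypothetical helper: True if a compiler process in the log was killed."""
    return KILLED_PATTERN.search(build_log) is not None


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: first numeric value}."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])  # usually kB
    return info


def mem_available_mib(meminfo_path: str = "/proc/meminfo") -> Optional[int]:
    """Return MemAvailable in MiB, or None if it cannot be read."""
    try:
        with open(meminfo_path) as fh:
            info = parse_meminfo(fh.read())
    except OSError:
        return None
    kb = info.get("MemAvailable")
    return None if kb is None else kb // 1024


if __name__ == "__main__":
    print("MemAvailable (MiB):", mem_available_mib())
```

If memory turns out to be the limit, common workarounds are adding swap on the host or, on Docker for Mac, raising the memory allocated to the Docker Desktop VM.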
Same error here, still.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This is still an issue; I tried installing it last night on a Raspberry Pi 4 (4 GB).
I'm also seeing the same issue. It looks like a task fails due to a memory issue on first boot. After relaunching fresh, it didn't crash on first boot. (I removed all storage before restarting.)
07:21:01 [Q] INFO Process-1:8 ready for work at 428
[2021-12-25 07:22:27 +0000] [364] [CRITICAL] WORKER TIMEOUT (pid:394)
[2021-12-25 07:22:27 +0000] [364] [WARNING] Worker with pid 394 was terminated due to signal 6
07:23:53 [Q] INFO Enqueued 1
Attempting to import a .pdf and getting errors:
07:24:01 [Q] INFO Process-1:7 stopped doing work
07:24:01 [Q] ERROR Failed [2020_TaxReturn.pdf] - pikepdf's extension library failed to import : Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 13, in <module>
Getting this issue on arm64 as of today:
2021-12-26T17:50:20.023669087Z 17:50:20 [Q] INFO Process-1:12 stopped doing work
2021-12-26T17:50:20.036110554Z Traceback (most recent call last):
2021-12-26T17:50:20.036112954Z File "/usr/local/lib/python3.8/dist-packages/django_q/cluster.py", line 432, in worker
2021-12-26T17:50:20.036115514Z res = f(*task["args"], **task["kwargs"])
2021-12-26T17:50:20.036117994Z File "/app/paperless/src/documents/tasks.py", line 74, in consume_file
2021-12-26T17:50:20.036120474Z document = Consumer().try_consume_file(
2021-12-26T17:50:20.036122834Z File "/app/paperless/src/documents/consumer.py", line 248, in try_consume_file
2021-12-26T17:50:20.036125394Z document_parser.parse(self.path, mime_type, self.filename)
2021-12-26T17:50:20.036127834Z File "/app/paperless/src/paperless_tesseract/parsers.py", line 230, in parse
2021-12-26T17:50:20.036130314Z import ocrmypdf
2021-12-26T17:50:20.036132634Z File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/__init__.py", line 10, in <module>
2021-12-26T17:50:20.036135234Z from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
2021-12-26T17:50:20.036137714Z File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/helpers.py", line 22, in <module>
2021-12-26T17:50:20.036140474Z import pikepdf
2021-12-26T17:50:20.036142674Z File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 16, in <module>
2021-12-26T17:50:20.036145274Z raise ImportError(_msg) from _e
2021-12-26T17:50:20.036147514Z ImportError: pikepdf's extension library failed to import
2021-12-26T17:50:20.036149874Z
Same issue here:
18:05:05 [Q] INFO recycled worker Process-1:1
18:05:05 [Q] INFO Process-1:5 ready for work at 453
[2021-12-26 18:06:56,329] [INFO] [paperless.management.consumer] Adding /data/consume/201905.pdf to the task queue.
18:06:56 [Q] INFO Enqueued 1
18:06:56 [Q] INFO Process-1:2 processing [201905.pdf]
[2021-12-26 18:06:56,630] [INFO] [paperless.consumer] Consuming 201905.pdf
18:06:57 [Q] INFO Process-1:2 stopped doing work
18:06:57 [Q] ERROR Failed [201905.pdf] - pikepdf's extension library failed to import : Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 13, in <module>
from . import _qpdf
ImportError: /usr/local/lib/python3.8/dist-packages/pikepdf/_qpdf.cpython-38-arm-linux-gnueabihf.so: undefined symbol: _ZN20QPDFPageObjectHelper16placeFormXObjectE16QPDFObjectHandleRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS0_9RectangleEbbb
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/django_q/cluster.py", line 432, in worker
res = f(*task["args"], **task["kwargs"])
File "/app/paperless/src/documents/tasks.py", line 74, in consume_file
document = Consumer().try_consume_file(
File "/app/paperless/src/documents/consumer.py", line 248, in try_consume_file
document_parser.parse(self.path, mime_type, self.filename)
File "/app/paperless/src/paperless_tesseract/parsers.py", line 230, in parse
import ocrmypdf
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/__init__.py", line 10, in <module>
from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/helpers.py", line 22, in <module>
import pikepdf
File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 16, in <module>
raise ImportError(_msg) from _e
ImportError: pikepdf's extension library failed to import
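An `undefined symbol` raised while loading `_qpdf...so` most likely means the pikepdf extension was compiled against a different libqpdf than the one the dynamic loader finds at runtime; the missing symbol here is an overload of `QPDFPageObjectHelper::placeFormXObject`. A minimal sketch, using only the standard library's `ctypes`, that checks whether a shared library loads and exports a given (mangled) symbol. `check_symbol` is a hypothetical helper, and locating libqpdf via `ctypes.util.find_library("qpdf")` is an assumption about how it is installed in the container.

```python
import ctypes
import ctypes.util
from typing import Optional


def check_symbol(library: Optional[str], symbol: str) -> str:
    """Hypothetical helper: report whether `library` loads and exports `symbol`.

    Returns "ok", "missing-symbol", or "missing-library: <error>".
    Passing None checks symbols already visible in the current process.
    """
    try:
        lib = ctypes.CDLL(library)
    except OSError as exc:  # e.g. "cannot open shared object file"
        return f"missing-library: {exc}"
    try:
        lib[symbol]  # dlsym() lookup; raises AttributeError if absent
    except AttributeError:
        return "missing-symbol"
    return "ok"


if __name__ == "__main__":
    # Assumption: libqpdf is discoverable by find_library inside the container.
    qpdf = ctypes.util.find_library("qpdf")
    sym = ("_ZN20QPDFPageObjectHelper16placeFormXObjectE16QPDFObjectHandleRKNSt7"
           "__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS0_9RectangleEbbb")
    if qpdf is None:
        print("libqpdf not found by find_library")
    else:
        print(qpdf, "->", check_symbol(qpdf, sym))
```

A `missing-symbol` result for the installed libqpdf would support the version-mismatch explanation; rebuilding pikepdf against that libqpdf (or installing a matching libqpdf) is then the usual direction.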
I had this same issue, and after some testing and with the help of previous comments here, the following commands worked for me on an RPi 4 (4 GB):
sudo apt-get install libxml2-dev libxslt-dev python-dev
sudo apt-get install libjpeg-dev zlib1g-dev
pip install wheel
apt install python3-dev build-essential -y && pip install pikepdf==2.16.1 --force-reinstall
As @vemek stated, it is not building on Apple Silicon (M1).
@vemek @darkmattercoder
Could you try the following:
apt-get install libxml2-dev libxslt-dev python3-dev libjpeg-dev zlib1g-dev build-essential cython3 -y
python3 -m pip install wheel
python3 -m pip install lxml --force-reinstall
python3 -m pip install Pillow --force-reinstall
python3 -m pip install pikepdf --force-reinstall
I was getting similar issues on my Raspberry Pi 3. The above steps fixed it for me.
I just tried to import another PDF scanned from the Swift Scanner iOS app and got an error I can't solve on my RPi 4:
[2022-01-21 20:42:26,286] [ERROR] [paperless.consumer] Error while consuming document Consulta otorrino António.pdf: ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py)
Traceback (most recent call last):
File "/app/paperless/src/paperless_tesseract/parsers.py", line 241, in parse
ocrmypdf.ocr(**args)
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/api.py", line 340, in ocr
return run_pipeline(options=options, plugin_manager=plugin_manager, api=True)
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py", line 359, in run_pipeline
pdfinfo = get_pdfinfo(
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_pipeline.py", line 157, in get_pdfinfo
return PdfInfo(
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 860, in __init__
self._pages = _pdf_pageinfo_concurrent(
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 644, in _pdf_pageinfo_concurrent
executor(
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_concurrent.py", line 82, in __call__
self._execute(
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/builtin_plugins/concurrency.py", line 132, in _execute
for result in results:
File "/usr/lib/python3.8/multiprocessing/pool.py", line 868, in next
raise value
File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 601, in _pdf_pageinfo_sync
page = PageInfo(pdf, pageno, infile, check_pages, detailed_analysis)
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 675, in __init__
self._gather_pageinfo(pdf, pageno, infile, check_pages, detailed_analysis)
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 721, in _gather_pageinfo
for ci in _process_content_streams(
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 528, in _process_content_streams
yield from _find_regular_images(container, contentsinfo)
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 446, in _find_regular_images
yield ImageInfo(name=draw.name, pdfimage=pdfimage, shorthand=draw.shorthand)
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 319, in __init__
pim_icc = pim.icc
File "/usr/local/lib/python3.8/dist-packages/pikepdf/models/image.py", line 480, in icc
self._icc = ImageCms.ImageCmsProfile(iccbytesio)
File "/usr/local/lib/python3.8/dist-packages/PIL/ImageCms.py", line 172, in __init__
self._set(core.profile_frombytes(profile.read()))
File "/usr/local/lib/python3.8/dist-packages/PIL/_util.py", line 19, in __getattr__
raise self.ex
File "/usr/local/lib/python3.8/dist-packages/PIL/ImageCms.py", line 23, in <module>
from PIL import _imagingcms
ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/paperless/src/documents/consumer.py", line 248, in try_consume_file
document_parser.parse(self.path, mime_type, self.filename)
File "/app/paperless/src/paperless_tesseract/parsers.py", line 290, in parse
raise ParseError(f"{e.__class__.__name__}: {str(e)}")
documents.parsers.ParseError: ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py)
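This traceback passes through pikepdf's `icc` property into `PIL.ImageCms`, so it is probably only reached for PDFs whose images carry an embedded ICC profile, which could explain why some files fail and others don't. `_imagingcms` is Pillow's optional LittleCMS extension; it is typically absent when Pillow was built from source (for example by `pip install Pillow --force-reinstall` without a prebuilt wheel) on a system lacking the lcms2 development headers, which Debian/Ubuntu usually package as `liblcms2-dev` (worth verifying for the image's base OS). A minimal sketch that checks whether the compiled submodule exists without executing it; `has_submodule` is a hypothetical helper.

```python
import importlib.util


def has_submodule(name: str) -> bool:
    """Hypothetical helper: True if the (sub)module `name` can be located.

    find_spec() imports parent packages (e.g. PIL) but does not execute
    the target module itself; a parent that fails to import counts as False.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:  # includes ModuleNotFoundError for a missing parent
        return False


if __name__ == "__main__":
    found = has_submodule("PIL._imagingcms")
    print("PIL._imagingcms:", "found" if found else "MISSING (Pillow built without LittleCMS?)")
```

If Pillow itself imports, `PIL.features.check("littlecms2")` reports the same information from Pillow's own feature table.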
I am facing this issue.
@kurosch Do the commands need to be run inside the paperless container? Please confirm.
I had this same issue, and after some testing and with the help of previous comments here, the following commands worked for me on rpi 4GB
sudo apt-get install libxml2-dev libxslt-dev python-dev
sudo apt-get install libjpeg-dev zlib1g-dev
pip install wheel
apt install python3-dev build-essential -y && pip install pikepdf==2.16.1 --force-reinstall
Confirming this worked for me on an RPi 4B with Raspberry Pi OS (32-bit).
I'm on an RPi 4 as well, running Buster, and the last command directly above fails because it can't find version 2.16.1 of pikepdf. See below:
pi@JMADS01:~/paperless-ng $ uname -a
Linux JMADS01 5.10.63-v7l+ #1496 SMP Wed Dec 1 15:58:56 GMT 2021 armv7l GNU/Linux
root@JMADS01:~# apt install python3-dev build-essential -y && pip install pikepdf==2.16.1 --force-reinstall
Reading package lists... Done
Building dependency tree... 50%
Building dependency tree
Reading state information... Done
build-essential is already the newest version (12.6).
python3-dev is already the newest version (3.7.3-1).
0 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Collecting pikepdf==2.16.1
Could not find a version that satisfies the requirement pikepdf==2.16.1 (from versions: 0.1rc1, 0.1rc3, 0.1rc4, 0.1rc5, 0.1.0.post1, 0.1.1, 0.1.2, 0.1.3)
No matching distribution found for pikepdf==2.16.1
Maybe I need to adjust my apt repos to something else?
Here's the error I'm seeing: similar but slightly different, though still pikepdf:
pikepdf's extension library failed to import : Traceback (most recent call last):
File "/opt/paperless/.local/lib/python3.7/site-packages/pikepdf/__init__.py", line 13, in <module>
from . import _qpdf
ImportError: /opt/paperless/.local/lib/python3.7/site-packages/pikepdf/_qpdf.cpython-37m-arm-linux-gnueabihf.so: undefined symbol: _ZTIN16QPDFObjectHandle15ParserCallbacksE
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/paperless/.local/lib/python3.7/site-packages/django_q/cluster.py", line 432, in worker
res = f(*task["args"], **task["kwargs"])
File "/opt/paperless/src/documents/tasks.py", line 81, in consume_file
task_id=task_id
File "/opt/paperless/src/documents/consumer.py", line 248, in try_consume_file
document_parser.parse(self.path, mime_type, self.filename)
File "/opt/paperless/src/paperless_tesseract/parsers.py", line 230, in parse
import ocrmypdf
File "/opt/paperless/.local/lib/python3.7/site-packages/ocrmypdf/__init__.py", line 10, in <module>
from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
File "/opt/paperless/.local/lib/python3.7/site-packages/ocrmypdf/helpers.py", line 22, in <module>
import pikepdf
File "/opt/paperless/.local/lib/python3.7/site-packages/pikepdf/__init__.py", line 16, in <module>
raise ImportError(_msg) from _e
ImportError: pikepdf's extension library failed to import
Has anyone built pikepdf from source? I just checked and I haven't, but I'm curious whether that might help compared with pulling it from a repo.
@quaintdev Yes, all commands need to be run in the container.
@ps23Rick I can see that you are running the commands outside the container. Try running it inside the container.
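Because the failing pip run above appears to have been executed on the host rather than in the container, a quick check of which interpreter and environment a command actually runs in can save time. A minimal sketch; `describe_environment` is a hypothetical helper, and the `/.dockerenv` marker file is a common heuristic for Docker containers, not a guarantee.

```python
import os
import platform
import sys
from typing import Dict, Union


def describe_environment(root: str = "/") -> Dict[str, Union[str, bool]]:
    """Hypothetical helper: summarize the interpreter and platform in use."""
    return {
        "python": sys.executable,
        "version": platform.python_version(),
        "machine": platform.machine(),  # e.g. aarch64, armv7l, x86_64
        # Heuristic only: Docker usually creates this empty marker file.
        "in_docker": os.path.exists(os.path.join(root, ".dockerenv")),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Run through `docker exec` against the paperless container, it should report `in_docker: True` and the container's Python 3; on the host it shows which interpreter a bare `pip` would be tied to.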
@Darlekesh Thanks.
If I bring these containers down, I believe I'll have to run these commands again, because the changes we made don't persist, right? This should be fixed in the image, I think.
@quaintdev Certainly. Seeing how this issue has been open since August '21, I moved to the official paperless-ng Docker image (and docker-compose.yml). It runs well enough, though I'm having trouble with OCR in languages other than English.
Can confirm that this issue still exists when trying to run the image on an RPi 4B. Can also confirm that @jemelo's solution works; you need to run the commands INSIDE the container:
I had this same issue, and after some testing and with the help of previous comments here, the following commands worked for me on rpi 4GB
sudo apt-get install libxml2-dev libxslt-dev python-dev
sudo apt-get install libjpeg-dev zlib1g-dev
pip install wheel
apt install python3-dev build-essential -y && pip install pikepdf==2.16.1 --force-reinstall
You then need to save the changes made to the container into a new image. For example:
docker ps -l
CONTAINER ID IMAGE
4bde937dfcc7 lscr.io/linuxserver/paperless-ng
sudo docker commit 4bde937dfcc7 paperless-ng_fixed
And then use this new fixed image in your docker-compose.yml instead of the original one until this issue is resolved.
I just tried cloning the repo and building the image on an aarch64 machine, following the "Build Locally" guide:
git clone https://github.com/linuxserver/docker-paperless-ng.git
cd docker-paperless-ng
docker build \
--no-cache \
--pull \
-f Dockerfile.aarch64 \
-t lscr.io/linuxserver/paperless-ng:latest .
Don't forget the -f Dockerfile.aarch64
And it works flawlessly.
When I tried to build it on an x86_64 machine with multiarch/qemu-user-static, it didn't work.
I managed to reproduce this. Can you test lsiodev/paperless-ng:1.5.0-pikepdf to confirm?
I just tried to clone the repo and build the image on aarch64 machine following "Build Locally" guide
git clone https://github.com/linuxserver/docker-paperless-ng.git
cd docker-paperless-ng
docker build \
--no-cache \
--pull \
-f Dockerfile.aarch64 \
-t lscr.io/linuxserver/paperless-ng:latest .
Don't forget the -f Dockerfile.aarch64
And it works flawlessly.
I just tried this suggestion. I deleted all of my existing Docker images on my RPi 4B after switching to the 64-bit kernel and all that junk.
When I run the above, it appears to go OK through all 14 steps and leaves me with two Docker images:
pi@JMADS01:~/paperless-ng/docker-paperless-ng $ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
<none> <none> d788193e8fe9 2 minutes ago 128MB
ghcr.io/linuxserver/baseimage-ubuntu arm64v8-focal ff377f4ead39 19 hours ago 128MB
I'm sure I'm missing something; I don't use Docker regularly, and I'll admit it's been a while since I last did. Maybe I'm just a bit dense today. I might just ditch Docker and go the bare-metal route. Sorry for being off-topic.
OK, just a quick follow-up, in case anyone else hasn't tried this: I had read recently that the 64-bit kernel and OS were now out of beta for Raspberry Pis. So I backed up what needed backing up on my Pi, installed Ubuntu 21 Server (fully 64-bit), and all is well for me now. For the past few weeks I had mostly been running a 64-bit kernel with a 32-bit userland, but that didn't help. YMMV!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I think this issue should stay open until a fix is implemented in the Docker image.
Roxedus asked for people to test a fix, but no one answered, so the change didn't get merged.
Hey, I'm facing an issue that I suspect is at least related to this one, if not the same.
My Setup:
  image: lscr.io/linuxserver/paperless-ng
  container_name: paperless
  environment:
    - DOCKER_MODS=linuxserver/mods:papermerge-multilangocr
    - PUID=1000
    - PGID=1000
    - PAPERLESS_FILENAME_FORMAT={created_year}/{correspondent}/{title}
    - OCRLANG=deu
    - PAPERLESS_OCR_LANGUAGE=deu
    - PAPERLESS_TASK_WORKERS=2
    - PAPERLESS_THREADS_PER_WORKER=1
  volumes:
    - /etc/localtime:/etc/localtime:ro
    - ./paperless/config:/config
    - ./paperless/data:/data
  ports:
    - 8000:8000
What I'm seeing is that when I upload a file, the whole processing of that file stops because of an error.
The corresponding log shows:
[2022-04-03 12:05:39,603] [INFO] [paperless.consumer] Consuming some.pdf
paperless | 12:05:40 [Q] INFO Process-1:7 stopped doing work
paperless | 12:05:40 [Q] ERROR Failed [some.pdf] - libxslt.so.1: cannot open shared object file: No such file or directory : Traceback (most recent call last):
paperless | File "/usr/local/lib/python3.8/dist-packages/django_q/cluster.py", line 432, in worker
paperless | res = f(*task["args"], **task["kwargs"])
paperless | File "/app/paperless/src/documents/tasks.py", line 70, in consume_file
paperless | document = Consumer().try_consume_file(
paperless | File "/app/paperless/src/documents/consumer.py", line 245, in try_consume_file
paperless | document_parser.parse(self.path, mime_type, self.filename)
paperless | File "/app/paperless/src/paperless_tesseract/parsers.py", line 237, in parse
paperless | import ocrmypdf
paperless | File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/__init__.py", line 10, in <module>
paperless | from ocrmypdf import helpers, hocrtransform, pdfa, pdfinfo
paperless | File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/helpers.py", line 23, in <module>
paperless | import pikepdf
paperless | File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 55, in <module>
paperless | from .models import (
paperless | File "/usr/local/lib/python3.8/dist-packages/pikepdf/models/__init__.py", line 20, in <module>
paperless | from .metadata import PdfMetadata
paperless | File "/usr/local/lib/python3.8/dist-packages/pikepdf/models/metadata.py", line 29, in <module>
paperless | from lxml import etree
paperless | ImportError: libxslt.so.1: cannot open shared object file: No such file or directory
paperless |
paperless | 12:05:40 [Q] INFO recycled worker Process-1:7
paperless | 12:05:40 [Q] INFO Process-1:9 ready for work at 732
I suspect this is related because the import of pikepdf is on the failing code path (as the log shows). I couldn't find out whether a fix for the bug described in this issue has already shipped in a newer version, which could then have caused this different but similar error.
I tried resolving the problem the way @kurosch suggested but without any success.
@vemek @darkmattercoder
Could you try the following:
apt-get install libxml2-dev libxslt-dev python3-dev libjpeg-dev zlib1g-dev build-essential cython3 -y
python3 -m pip install wheel
python3 -m pip install lxml --force-reinstall
python3 -m pip install Pillow --force-reinstall
python3 -m pip install pikepdf --force-reinstall
I was getting similar issues on my Raspberry Pi 3. The above steps fixed it for me.
I also tried to build the image locally, but that didn't work either, due to other errors (not sure whether they're relevant to this issue).
I've been trying for over a week now and can't seem to find a solution. Does anybody have an idea what else I could try to get this working?
Why does this issue keep getting comments while no one tests the PR?
Sorry I forgot to address you, @Roxedus. At least in my failure scenario, your approach doesn't seem to solve the problem; I keep getting the following error:
paperless | 13:20:27 [Q] INFO Process-1:7 stopped doing work
paperless | 13:20:27 [Q] ERROR Failed [some.pdf] - libxslt.so.1: cannot open shared object file: No such file or directory : Traceback (most recent call last):
paperless | File "/usr/local/lib/python3.8/dist-packages/django_q/cluster.py", line 432, in worker
paperless | res = f(*task["args"], **task["kwargs"])
paperless | File "/app/paperless/src/documents/tasks.py", line 74, in consume_file
paperless | document = Consumer().try_consume_file(
paperless | File "/app/paperless/src/documents/consumer.py", line 248, in try_consume_file
paperless | document_parser.parse(self.path, mime_type, self.filename)
paperless | File "/app/paperless/src/paperless_tesseract/parsers.py", line 230, in parse
paperless | import ocrmypdf
paperless | File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/__init__.py", line 10, in <module>
paperless | from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
paperless | File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/helpers.py", line 22, in <module>
paperless | import pikepdf
paperless | File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 50, in <module>
paperless | from .models import (
paperless | File "/usr/local/lib/python3.8/dist-packages/pikepdf/models/__init__.py", line 14, in <module>
paperless | from .metadata import PdfMetadata
paperless | File "/usr/local/lib/python3.8/dist-packages/pikepdf/models/metadata.py", line 28, in <module>
paperless | from lxml import etree
paperless | ImportError: libxslt.so.1: cannot open shared object file: No such file or directory
paperless |
paperless | 13:20:28 [Q] INFO recycled worker Process-1:7
paperless | 13:20:28 [Q] INFO Process-1:9 ready for work at 756
That said, I'm not 100% sure the error I'm getting is the same one the others are seeing, so it may still solve their problem.
Can you try to exec into the container, run apk add libxslt, and try again?
@tobbenb apk doesn't seem to exist in the container, and I didn't manage to install it. Instead I used apt and tried apt-get install libxslt-dev (libxslt alone didn't seem to exist).
Yet the process failed with:
Processing triggers for libc-bin (2.31-0ubuntu9.2) ...
/usr/sbin/ldconfig: 16: exec: /sbin/ldconfig.real: not found
/usr/sbin/ldconfig: 16: exec: /sbin/ldconfig.real: not found
dpkg: error processing package libc-bin (--configure):
installed libc-bin package post-installation script subprocess returned error exit status 127
Errors were encountered while processing:
libc-bin
E: Sub-process /usr/bin/dpkg returned an error code (1)
Sorry, I don't know why I thought it was using Alpine as the base OS.
Try apt update first, then apt install libxslt1.1.
Quick and happy update!
Thanks to @tobbenb, it now seems to work seamlessly. Note, though, that after apt update, installing libxslt1.1 did not succeed until I also ran apt upgrade. In case this helps anyone in the future.
I'm going to try to figure out whether it's just the reinstallation of libxslt1.1 that fixes the problem or the combination with the image provided by @Roxedus.
I managed to reproduce this. Can you test lsiodev/paperless-ng:1.5.0-pikepdf to confirm?
When I get any new information on this topic, I'll post it here! In the meantime, thanks for the support, guys :)
So, after a few more tests, it seems to work on some files but not on others. I tried to find a property that the working files share and the failing files lack, but I didn't find any correlation.
The error now occurring on some files is:
[2022-04-03 16:00:34,313] [INFO] [paperless.consumer] Consuming some.pdf
[2022-04-03 16:00:37,179] [ERROR] [paperless.consumer] Error while consuming document some.pdf: ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py)
Traceback (most recent call last):
File "/app/paperless/src/paperless_tesseract/parsers.py", line 241, in parse
ocrmypdf.ocr(**args)
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/api.py", line 340, in ocr
return run_pipeline(options=options, plugin_manager=plugin_manager, api=True)
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py", line 359, in run_pipeline
pdfinfo = get_pdfinfo(
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_pipeline.py", line 157, in get_pdfinfo
return PdfInfo(
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 860, in __init__
self._pages = _pdf_pageinfo_concurrent(
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 644, in _pdf_pageinfo_concurrent
executor(
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_concurrent.py", line 82, in __call__
self._execute(
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/builtin_plugins/concurrency.py", line 132, in _execute
for result in results:
File "/usr/lib/python3.8/multiprocessing/pool.py", line 868, in next
raise value
File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 601, in _pdf_pageinfo_sync
page = PageInfo(pdf, pageno, infile, check_pages, detailed_analysis)
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 675, in __init__
self._gather_pageinfo(pdf, pageno, infile, check_pages, detailed_analysis)
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 721, in _gather_pageinfo
for ci in _process_content_streams(
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 528, in _process_content_streams
yield from _find_regular_images(container, contentsinfo)
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 446, in _find_regular_images
yield ImageInfo(name=draw.name, pdfimage=pdfimage, shorthand=draw.shorthand)
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 319, in __init__
pim_icc = pim.icc
File "/usr/local/lib/python3.8/dist-packages/pikepdf/models/image.py", line 394, in icc
self._icc = ImageCms.ImageCmsProfile(iccbytesio)
File "/usr/local/lib/python3.8/dist-packages/PIL/ImageCms.py", line 172, in __init__
self._set(core.profile_frombytes(profile.read()))
File "/usr/local/lib/python3.8/dist-packages/PIL/_util.py", line 19, in __getattr__
raise self.ex
File "/usr/local/lib/python3.8/dist-packages/PIL/ImageCms.py", line 23, in <module>
from PIL import _imagingcms
ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/paperless/src/documents/consumer.py", line 248, in try_consume_file
document_parser.parse(self.path, mime_type, self.filename)
File "/app/paperless/src/paperless_tesseract/parsers.py", line 290, in parse
raise ParseError(f"{e.__class__.__name__}: {str(e)}")
documents.parsers.ParseError: ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py)
16:00:37 [Q] INFO Process-1:28 stopped doing work
16:00:37 [Q] ERROR Failed [some.pdf] - some.pdf: Error while consuming document some.pdf: ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py) : Traceback (most recent call last):
File "/app/paperless/src/paperless_tesseract/parsers.py", line 241, in parse
ocrmypdf.ocr(**args)
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/api.py", line 340, in ocr
return run_pipeline(options=options, plugin_manager=plugin_manager, api=True)
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py", line 359, in run_pipeline
pdfinfo = get_pdfinfo(
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_pipeline.py", line 157, in get_pdfinfo
return PdfInfo(
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 860, in __init__
self._pages = _pdf_pageinfo_concurrent(
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 644, in _pdf_pageinfo_concurrent
executor(
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_concurrent.py", line 82, in __call__
self._execute(
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/builtin_plugins/concurrency.py", line 132, in _execute
for result in results:
File "/usr/lib/python3.8/multiprocessing/pool.py", line 868, in next
raise value
File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 601, in _pdf_pageinfo_sync
page = PageInfo(pdf, pageno, infile, check_pages, detailed_analysis)
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 675, in __init__
self._gather_pageinfo(pdf, pageno, infile, check_pages, detailed_analysis)
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 721, in _gather_pageinfo
for ci in _process_content_streams(
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 528, in _process_content_streams
yield from _find_regular_images(container, contentsinfo)
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 446, in _find_regular_images
yield ImageInfo(name=draw.name, pdfimage=pdfimage, shorthand=draw.shorthand)
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 319, in __init__
pim_icc = pim.icc
File "/usr/local/lib/python3.8/dist-packages/pikepdf/models/image.py", line 394, in icc
self._icc = ImageCms.ImageCmsProfile(iccbytesio)
File "/usr/local/lib/python3.8/dist-packages/PIL/ImageCms.py", line 172, in __init__
self._set(core.profile_frombytes(profile.read()))
File "/usr/local/lib/python3.8/dist-packages/PIL/_util.py", line 19, in __getattr__
raise self.ex
File "/usr/local/lib/python3.8/dist-packages/PIL/ImageCms.py", line 23, in <module>
from PIL import _imagingcms
ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/asgiref/sync.py", line 288, in main_wrap
raise exc_info[1]
File "/app/paperless/src/documents/consumer.py", line 248, in try_consume_file
document_parser.parse(self.path, mime_type, self.filename)
File "/app/paperless/src/paperless_tesseract/parsers.py", line 290, in parse
raise ParseError(f"{e.__class__.__name__}: {str(e)}")
documents.parsers.ParseError: ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/django_q/cluster.py", line 432, in worker
res = f(*task["args"], **task["kwargs"])
File "/app/paperless/src/documents/tasks.py", line 74, in consume_file
document = Consumer().try_consume_file(
File "/app/paperless/src/documents/consumer.py", line 266, in try_consume_file
self._fail(
File "/app/paperless/src/documents/consumer.py", line 70, in _fail
raise ConsumerError(f"{self.filename}: {log_message or message}")
documents.consumer.ConsumerError: some.pdf: Error while consuming document some.pdf: ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py)
16:00:37 [Q] INFO recycled worker Process-1:28
16:00:37 [Q] INFO Process-1:30 ready for work at 3007
Besides this probably being a different issue, I find it somewhat odd that other exceptions occur during exception handling. I guess that should not happen.
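The "During handling of the above exception, another exception occurred" lines are standard Python implicit exception chaining, not extra failures: the traceback shows parsers.py raising a ParseError from inside the block that caught the ImportError, and the consumer then wrapping that in a ConsumerError. The self-contained illustration below uses hypothetical stand-ins (`ParseError`, `parse`, `parse_explicit`, `chain_of`), not the actual paperless-ng code.

```python
from typing import Callable, Optional, Tuple


class ParseError(Exception):
    """Hypothetical stand-in for the parser's error type."""


def parse() -> None:
    try:
        raise ImportError("cannot import name '_imagingcms'")
    except ImportError as e:
        # Raising inside an except block sets __context__ implicitly,
        # printed as "During handling of the above exception..."
        raise ParseError(f"{e.__class__.__name__}: {e}")


def parse_explicit() -> None:
    try:
        raise ImportError("cannot import name '_imagingcms'")
    except ImportError as e:
        # "from e" sets __cause__ and suppresses the implicit context message,
        # printed as "The above exception was the direct cause of..."
        raise ParseError(f"{e.__class__.__name__}: {e}") from e


def chain_of(
    func: Callable[[], None],
) -> Tuple[Optional[BaseException], Optional[BaseException], bool]:
    """Run func and return the (context, cause, suppress_context) of its ParseError."""
    try:
        func()
    except ParseError as err:
        return err.__context__, err.__cause__, err.__suppress_context__
    raise AssertionError("expected ParseError to be raised")
```

Either way the original ImportError is the real problem; the chained output only records how it propagated. Using `raise ... from None` would hide it, which would make this kind of debugging harder.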
@Roxedus I tested it, and it still doesn't work. The error is identical to the one @AnomalieXB-6783746 posted.
After some more research, it seems this could be a bug in paperless-ng itself (see Issue).
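The ImportError in the log means Pillow's compiled LittleCMS module (PIL._imagingcms) is not available, and it is only reached when pikepdf reads an image's ICC profile. That may explain why only some PDFs fail, though this is an inference from the traceback, not something confirmed here. One plausible cause, also unconfirmed, is a Pillow built from source without the LittleCMS development headers (liblcms2-dev on Debian/Ubuntu). A minimal check, with `has_imagingcms` as a hypothetical helper name:

```python
import importlib


def has_imagingcms() -> bool:
    """Return True if Pillow's compiled LittleCMS module can be imported."""
    try:
        # The same import that fails in the traceback (PIL/ImageCms.py, line 23)
        importlib.import_module("PIL._imagingcms")
    except ImportError:
        # Also covers the case where Pillow itself is not installed
        return False
    return True


if __name__ == "__main__":
    if has_imagingcms():
        print("Pillow has LittleCMS support")
    else:
        print("PIL._imagingcms is missing: ICC profile handling will fail")
```

Pillow also exposes this through `PIL.features.check("littlecms2")`, which may be the more convenient one-liner inside the container.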
Still having this issue. I opened a dupe issue, which was closed.
Consider GitHub.com/Linux-Server/docker-paperless-ngx as an alternative. I think the maintenance effort for ng is unnecessary now that ngx has become established.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Same issue here with the latest image on Docker, running with Portainer. Here's the output of docker pull linuxserver/paperless-ng to confirm:
Using default tag: latest
latest: Pulling from linuxserver/paperless-ng
Digest: sha256:c5d2a1006be929edb9098532e0fee68f9588ff265e651fb7d1ab78c9ac350426
Status: Image is up to date for linuxserver/paperless-ng:latest
docker.io/linuxserver/paperless-ng:latest
@jemelo's set of commands worked for me after accessing the container with docker exec -it paperless su:
sudo apt-get install libxml2-dev libxslt-dev python-dev
sudo apt-get install libjpeg-dev zlib1g-dev
pip install wheel
apt install python3-dev build-essential -y && pip install pikepdf==2.16.1 --force-reinstall
Running on a Raspberry Pi 4B 4GB RAM with Raspberry Pi OS (64-bit).
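After the forced reinstall, it may be worth confirming which pikepdf version the container's Python actually sees, since a second pip or interpreter could shadow it. This is a minimal sketch using the standard-library `importlib.metadata` (Python 3.8+, matching the python3.8 paths in the tracebacks); `installed_version` and `matches_pin` are hypothetical helper names.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(dist: str, expected: str) -> bool:
    """True if the installed distribution exactly matches the pinned version."""
    return installed_version(dist) == expected


if __name__ == "__main__":
    # 2.16.1 is the version pinned in the reinstall command above
    version = installed_version("pikepdf")
    print(f"pikepdf installed: {version}; matches 2.16.1: {matches_pin('pikepdf', '2.16.1')}")
```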
works for me, thx
Expected Behavior
Uploading a document into the paperless-ng web application should trigger an OCR process and the file should be available afterwards.
Current Behavior
The upload process stops at 'Upload complete, waiting...' and nothing happens. The web application itself is fully functional, as is the Django admin backend; only the upload process does not seem to work at all.
Steps to Reproduce
Environment
OS: Debian GNU/Linux 9 (stretch) - Raspberry Pi 4
CPU architecture: arm32
How docker service was installed: Via portainer, official repo from DockerHub
Docker logs