linuxserver / docker-paperless-ng

GNU General Public License v3.0
ImportError: pikepdf's extension library failed to import #8

Closed l4rm4nd closed 2 years ago

l4rm4nd commented 3 years ago

Expected Behavior

Uploading a document into the paperless-ng web application should trigger an OCR process and the file should be available afterwards.

Current Behavior

The upload process stops at 'Upload complete, waiting...' and nothing happens. The web application itself, including the Django admin backend, is fully functional. Only the upload process does not work.

Steps to Reproduce

  1. Log into the paperless-ng web application
  2. Upload a sample PDF file using the "drop documents here" or "browse file" button
  3. Observe that the file is not successfully uploaded or processed

Environment

OS: Debian GNU/Linux 9 (stretch) - Raspberry Pi 4

CPU architecture: arm32

How docker service was installed: Via portainer, official repo from DockerHub

Docker logs

19:28:16 [Q] INFO recycled worker Process-1:1
19:28:16 [Q] INFO Process-1:5 ready for work at 381
19:28:19 [Q] INFO Enqueued 1
19:28:19 [Q] INFO Process-1:2 processing [example.pdf]
[pid: 347|app: 0|req: 7/7] 192.168.178.7 () {42 vars in 1669 bytes} [Fri Aug 27 19:28:19 2021] POST /api/documents/post_document/ => generated 4 bytes in 33 msecs (HTTP/1.1 200) 10 headers in 292 bytes (2 switches on core 0)
[2021-08-27 19:28:19,256] [INFO] [paperless.consumer] Consuming example.pdf
19:28:19 [Q] INFO Process-1:2 stopped doing work
19:28:19 [Q] ERROR Failed [example.pdf] - pikepdf's extension library failed to import : Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 13, in <module>
    from . import _qpdf
ImportError: /usr/local/lib/python3.8/dist-packages/pikepdf/_qpdf.cpython-38-arm-linux-gnueabihf.so: undefined symbol: _ZN20QPDFPageObjectHelper16placeFormXObjectE16QPDFObjectHandleRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS0_9RectangleEbbb
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/django_q/cluster.py", line 432, in worker
    res = f(*task["args"], **task["kwargs"])
  File "/app/paperless/src/documents/tasks.py", line 74, in consume_file
    document = Consumer().try_consume_file(
  File "/app/paperless/src/documents/consumer.py", line 248, in try_consume_file
    document_parser.parse(self.path, mime_type, self.filename)
  File "/app/paperless/src/paperless_tesseract/parsers.py", line 230, in parse
    import ocrmypdf
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/__init__.py", line 10, in <module>
    from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/helpers.py", line 22, in <module>
    import pikepdf
  File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 16, in <module>
    raise ImportError(_msg) from _e
ImportError: pikepdf's extension library failed to import
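
For readers decoding the traceback above: the undefined symbol is an Itanium C++ ABI mangled name. The minimal sketch below decodes only its leading nested-name part; `nested_name` is a hypothetical helper written for illustration, not part of pikepdf or qpdf. The result suggests, without proving it, that the `_qpdf` extension expects a libqpdf that exports `QPDFPageObjectHelper::placeFormXObject` with this signature, while the library loaded in the container does not provide it.

```python
import re


def nested_name(symbol: str) -> str:
    """Decode the leading `_ZN<len><id>...E` nested name of a mangled symbol.

    Hypothetical helper: it ignores argument types, templates and
    substitutions, so it only covers simple cases like the one in this log.
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("not a nested Itanium-mangled name")
    pos, parts = 3, []
    while pos < len(symbol) and symbol[pos] != "E":
        match = re.match(r"\d+", symbol[pos:])
        if match is None:
            raise ValueError(f"unsupported component at offset {pos}")
        length = int(match.group())
        start = pos + match.end()
        parts.append(symbol[start:start + length])
        pos = start + length
    return "::".join(parts)


# Symbol copied from the ImportError in the log above.
SYMBOL = (
    "_ZN20QPDFPageObjectHelper16placeFormXObjectE16QPDFObjectHandle"
    "RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS0_9RectangleEbbb"
)
print(nested_name(SYMBOL))
```

A full demangler such as `c++filt` also decodes the argument list; this sketch stops at the class and method name.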

github-actions[bot] commented 3 years ago

Thanks for opening your first issue here! Be sure to follow the bug or feature issue templates!

henrygd commented 3 years ago

I'm experiencing the same issue on Ubuntu 20.04.3 (aarch64)

garret commented 3 years ago

I am also having the same issue. Using this image on a raspberry pi 4B (4GB) with default raspberry pi os (32bit).

zahnp4sta commented 3 years ago

same on raspberry 4b 8GB with Ubuntu 21.04 64bit.

Darlekesh commented 3 years ago

I had the same issue on Ubuntu 20.04.1 aarch64, and I fixed it by connecting to the container and running this command:

apt update && apt install python3-dev build-essential -y && pip install pikepdf==2.16.1 --force-reinstall

It also works with newer versions of pikepdf, but I'm trying to stay on the same version as provided.

Edit: the version of paperless-ng is version-ng-1.5.0
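
One way to check whether a reinstall like the one in this comment took effect is to try the import from Python inside the container. This is a minimal sketch; `import_status` is a hypothetical helper name, and it relies only on the fact, visible in the traceback earlier in the thread, that pikepdf raises an `ImportError` chained to the underlying loader error.

```python
import importlib


def import_status(module_name: str) -> str:
    """Try to import a module and report success or the root cause."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # pikepdf re-raises with `from _e`, so the loader error is the cause.
        cause = exc.__cause__ or exc
        return f"FAILED: {cause}"
    return f"OK: {getattr(module, '__version__', 'unknown version')}"


print("pikepdf", import_status("pikepdf"))
```

If the output still starts with `FAILED` and names an undefined symbol, the rebuilt extension and the installed libqpdf probably still disagree.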

zahnp4sta commented 3 years ago

@Darlekesh yup, that worked. thank you!

henrygd commented 3 years ago

Might want to leave this issue open until it's fixed in the actual image

ficusdragoneta commented 3 years ago

@Darlekesh Unfortunately that did not work for me on Raspberry Pi 4B (4GB) with default Raspberry Pi OS (32bit).

garret commented 3 years ago

Since it has been taking so much time to resolve this issue, I reverted to the paperless-ng docker image offered by the official project. They luckily have an image for the rpi.

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

vemek commented 3 years ago

This is still an issue. Building the pikepdf wheel does not work on Docker for Mac (M1, aarch64):

Collecting pikepdf==2.16.1
  Downloading pikepdf-2.16.1.tar.gz (2.3 MB)
     |████████████████████████████████| 2.3 MB 1.7 MB/s
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting Pillow>=6.0
  Downloading Pillow-8.4.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.0 MB)
     |████████████████████████████████| 3.0 MB 37.0 MB/s
Collecting lxml>=4.0
  Downloading lxml-4.6.4-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.manylinux_2_24_aarch64.whl (6.5 MB)
     |████████████████████████████████| 6.5 MB 6.7 MB/s
Building wheels for collected packages: pikepdf
  Building wheel for pikepdf (pyproject.toml) ... error
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python3 /usr/local/lib/python3.8/dist-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /tmp/tmpfayft38f
       cwd: /tmp/pip-install-0zsq3vrr/pikepdf_1d6b213d99e149d1a2c51e6a7d22ff7a
  Complete output (46 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-aarch64-3.8
  creating build/lib.linux-aarch64-3.8/pikepdf
  copying src/pikepdf/_cpphelpers.py -> build/lib.linux-aarch64-3.8/pikepdf
  copying src/pikepdf/_version.py -> build/lib.linux-aarch64-3.8/pikepdf
  copying src/pikepdf/jbig2.py -> build/lib.linux-aarch64-3.8/pikepdf
  copying src/pikepdf/objects.py -> build/lib.linux-aarch64-3.8/pikepdf
  copying src/pikepdf/codec.py -> build/lib.linux-aarch64-3.8/pikepdf
  copying src/pikepdf/_methods.py -> build/lib.linux-aarch64-3.8/pikepdf
  copying src/pikepdf/_xml.py -> build/lib.linux-aarch64-3.8/pikepdf
  copying src/pikepdf/__init__.py -> build/lib.linux-aarch64-3.8/pikepdf
  creating build/lib.linux-aarch64-3.8/pikepdf/models
  copying src/pikepdf/models/encryption.py -> build/lib.linux-aarch64-3.8/pikepdf/models
  copying src/pikepdf/models/outlines.py -> build/lib.linux-aarch64-3.8/pikepdf/models
  copying src/pikepdf/models/metadata.py -> build/lib.linux-aarch64-3.8/pikepdf/models
  copying src/pikepdf/models/image.py -> build/lib.linux-aarch64-3.8/pikepdf/models
  copying src/pikepdf/models/matrix.py -> build/lib.linux-aarch64-3.8/pikepdf/models
  copying src/pikepdf/models/__init__.py -> build/lib.linux-aarch64-3.8/pikepdf/models
  running egg_info
  writing src/pikepdf.egg-info/PKG-INFO
  writing dependency_links to src/pikepdf.egg-info/dependency_links.txt
  writing requirements to src/pikepdf.egg-info/requires.txt
  writing top-level names to src/pikepdf.egg-info/top_level.txt
  reading manifest file 'src/pikepdf.egg-info/SOURCES.txt'
  adding license file 'LICENSE.txt'
  adding license file 'licenses/license.wheel.txt'
  writing manifest file 'src/pikepdf.egg-info/SOURCES.txt'
  copying src/pikepdf/_qpdf.pyi -> build/lib.linux-aarch64-3.8/pikepdf
  copying src/pikepdf/py.typed -> build/lib.linux-aarch64-3.8/pikepdf
  running build_ext
  building 'pikepdf._qpdf' extension
  creating build/temp.linux-aarch64-3.8
  creating build/temp.linux-aarch64-3.8/src
  creating build/temp.linux-aarch64-3.8/src/qpdf
  aarch64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/tmp/pip-build-env-1ierem4r/overlay/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c src/qpdf/object.cpp -o build/temp.linux-aarch64-3.8/src/qpdf/object.o -fvisibility=hidden -g0 -std=c++14
  aarch64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/tmp/pip-build-env-1ierem4r/overlay/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c src/qpdf/annotation.cpp -o build/temp.linux-aarch64-3.8/src/qpdf/annotation.o -fvisibility=hidden -g0 -std=c++14
  aarch64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/tmp/pip-build-env-1ierem4r/overlay/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c src/qpdf/object_convert.cpp -o build/temp.linux-aarch64-3.8/src/qpdf/object_convert.o -fvisibility=hidden -g0 -std=c++14
  aarch64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/tmp/pip-build-env-1ierem4r/overlay/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c src/qpdf/object_repr.cpp -o build/temp.linux-aarch64-3.8/src/qpdf/object_repr.o -fvisibility=hidden -g0 -std=c++14
  aarch64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/tmp/pip-build-env-1ierem4r/overlay/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c src/qpdf/page.cpp -o build/temp.linux-aarch64-3.8/src/qpdf/page.o -fvisibility=hidden -g0 -std=c++14
  aarch64-linux-gnu-gcc: fatal error: Killed signal terminated program cc1plus
  compilation terminated.
  aarch64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/tmp/pip-build-env-1ierem4r/overlay/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c src/qpdf/parsers.cpp -o build/temp.linux-aarch64-3.8/src/qpdf/parsers.o -fvisibility=hidden -g0 -std=c++14
  error: command 'aarch64-linux-gnu-gcc' failed with exit status 1
  ----------------------------------------
  ERROR: Failed building wheel for pikepdf
Failed to build pikepdf
ERROR: Could not build wheels for pikepdf, which is required to install pyproject.toml-based projects
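
The `Killed signal terminated program cc1plus` line in the log above means the compiler process was killed by a signal. That often points to the build running out of memory, though the log alone does not confirm it. The sketch below spots this signature in a saved pip log; `killed_program` is a hypothetical helper name.

```python
import re
from typing import Optional

# Message GCC prints when one of its sub-programs is killed by a signal.
KILLED = re.compile(r"fatal error: Killed signal terminated program (\S+)")


def killed_program(build_log: str) -> Optional[str]:
    """Return the name of the killed compiler sub-program, if any."""
    match = KILLED.search(build_log)
    return match.group(1) if match else None


sample = (
    "aarch64-linux-gnu-gcc: fatal error: Killed signal terminated program cc1plus\n"
    "compilation terminated."
)
print(killed_program(sample))
```

If memory does turn out to be the cause, giving the Docker VM more memory before retrying the build may help.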

pblgomez commented 2 years ago

Same error here still

RefineryX commented 2 years ago

This is still an issue. I tried installing last night on a Raspberry Pi 4 (4 GB).

rmiddle commented 2 years ago

Also seeing the same issue. It looks like a task fails due to a memory issue on first boot. I just relaunched fresh, and this time it looks like it didn't crash on first boot. (I removed all storage before restarting.)

07:21:01 [Q] INFO Process-1:8 ready for work at 428
[2021-12-25 07:22:27 +0000] [364] [CRITICAL] WORKER TIMEOUT (pid:394)
[2021-12-25 07:22:27 +0000] [364] [WARNING] Worker with pid 394 was terminated due to signal 6
07:23:53 [Q] INFO Enqueued 1

Attempting to import a .pdf file results in errors:

07:24:01 [Q] INFO Process-1:7 stopped doing work
07:24:01 [Q] ERROR Failed [2020_TaxReturn.pdf] - pikepdf's extension library failed to import : Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 13, in <module>
    from . import _qpdf
ImportError: /usr/local/lib/python3.8/dist-packages/pikepdf/_qpdf.cpython-38-aarch64-linux-gnu.so: undefined symbol: _ZN20QPDFPageObjectHelper16placeFormXObjectE16QPDFObjectHandleRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS0_9RectangleEbbb
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/django_q/cluster.py", line 432, in worker
    res = f(*task["args"], **task["kwargs"])
  File "/app/paperless/src/documents/tasks.py", line 74, in consume_file
    document = Consumer().try_consume_file(
  File "/app/paperless/src/documents/consumer.py", line 248, in try_consume_file
    document_parser.parse(self.path, mime_type, self.filename)
  File "/app/paperless/src/paperless_tesseract/parsers.py", line 230, in parse
    import ocrmypdf
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/__init__.py", line 10, in <module>
    from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/helpers.py", line 22, in <module>
    import pikepdf
  File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 16, in <module>
    raise ImportError(_msg) from _e
ImportError: pikepdf's extension library failed to import
07:24:01 [Q] INFO recycled worker Process-1:7
07:24:01 [Q] INFO Process-1:9 ready for work at 457
07:25:07 [Q] INFO Process-1:8 processing [bulldog-blossom-chicken-may]
07:25:07 [Q] INFO Enqueued 1
[2021-12-25 07:25:07,541] [INFO] [paperless.consumer] Consuming 2020_TaxReturn.pdf
07:25:15 [Q] INFO Process-1:8 stopped doing work
07:25:15 [Q] ERROR Failed [bulldog-blossom-chicken-may] - pikepdf's extension library failed to import : Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 13, in <module>
    from . import _qpdf
ImportError: /usr/local/lib/python3.8/dist-packages/pikepdf/_qpdf.cpython-38-aarch64-linux-gnu.so: undefined symbol: _ZN20QPDFPageObjectHelper16placeFormXObjectE16QPDFObjectHandleRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS0_9RectangleEbbb
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/django_q/cluster.py", line 432, in worker
    res = f(*task["args"], **task["kwargs"])
  File "/app/paperless/src/documents/tasks.py", line 74, in consume_file
    document = Consumer().try_consume_file(
  File "/app/paperless/src/documents/consumer.py", line 248, in try_consume_file
    document_parser.parse(self.path, mime_type, self.filename)
  File "/app/paperless/src/paperless_tesseract/parsers.py", line 230, in parse
    import ocrmypdf
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/__init__.py", line 10, in <module>
    from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/helpers.py", line 22, in <module>
    import pikepdf
  File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 16, in <module>
    raise ImportError(_msg) from _e
ImportError: pikepdf's extension library failed to import

akashkj commented 2 years ago

getting issue on arm64 as of today

2021-12-26T17:50:20.023669087Z 17:50:20 [Q] INFO Process-1:12 stopped doing work
2021-12-26T17:50:20.036110554Z Traceback (most recent call last):
2021-12-26T17:50:20.036112954Z   File "/usr/local/lib/python3.8/dist-packages/django_q/cluster.py", line 432, in worker
2021-12-26T17:50:20.036115514Z     res = f(*task["args"], **task["kwargs"])
2021-12-26T17:50:20.036117994Z   File "/app/paperless/src/documents/tasks.py", line 74, in consume_file
2021-12-26T17:50:20.036120474Z     document = Consumer().try_consume_file(
2021-12-26T17:50:20.036122834Z   File "/app/paperless/src/documents/consumer.py", line 248, in try_consume_file
2021-12-26T17:50:20.036125394Z     document_parser.parse(self.path, mime_type, self.filename)
2021-12-26T17:50:20.036127834Z   File "/app/paperless/src/paperless_tesseract/parsers.py", line 230, in parse
2021-12-26T17:50:20.036130314Z     import ocrmypdf
2021-12-26T17:50:20.036132634Z   File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/__init__.py", line 10, in <module>
2021-12-26T17:50:20.036135234Z     from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
2021-12-26T17:50:20.036137714Z   File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/helpers.py", line 22, in <module>
2021-12-26T17:50:20.036140474Z     import pikepdf
2021-12-26T17:50:20.036142674Z   File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 16, in <module>
2021-12-26T17:50:20.036145274Z     raise ImportError(_msg) from _e
2021-12-26T17:50:20.036147514Z ImportError: pikepdf's extension library failed to import
2021-12-26T17:50:20.036149874Z

HydrelioxGitHub commented 2 years ago

Same issue here :

18:05:05 [Q] INFO recycled worker Process-1:1
18:05:05 [Q] INFO Process-1:5 ready for work at 453
[2021-12-26 18:06:56,329] [INFO] [paperless.management.consumer] Adding /data/consume/201905.pdf to the task queue.
18:06:56 [Q] INFO Enqueued 1
18:06:56 [Q] INFO Process-1:2 processing [201905.pdf]
[2021-12-26 18:06:56,630] [INFO] [paperless.consumer] Consuming 201905.pdf
18:06:57 [Q] INFO Process-1:2 stopped doing work
18:06:57 [Q] ERROR Failed [201905.pdf] - pikepdf's extension library failed to import : Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 13, in <module>
    from . import _qpdf
ImportError: /usr/local/lib/python3.8/dist-packages/pikepdf/_qpdf.cpython-38-arm-linux-gnueabihf.so: undefined symbol: _ZN20QPDFPageObjectHelper16placeFormXObjectE16QPDFObjectHandleRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS0_9RectangleEbbb
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/django_q/cluster.py", line 432, in worker
    res = f(*task["args"], **task["kwargs"])
  File "/app/paperless/src/documents/tasks.py", line 74, in consume_file
    document = Consumer().try_consume_file(
  File "/app/paperless/src/documents/consumer.py", line 248, in try_consume_file
    document_parser.parse(self.path, mime_type, self.filename)
  File "/app/paperless/src/paperless_tesseract/parsers.py", line 230, in parse
    import ocrmypdf
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/__init__.py", line 10, in <module>
    from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/helpers.py", line 22, in <module>
    import pikepdf
  File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 16, in <module>
    raise ImportError(_msg) from _e
ImportError: pikepdf's extension library failed to import

jemelo commented 2 years ago

I had this same issue, and after some testing and with the help of previous comments here, the following commands worked for me on rpi 4GB:

sudo apt-get install libxml2-dev libxslt-dev python-dev
sudo apt-get install libjpeg-dev zlib1g-dev
pip install wheel
apt install python3-dev build-essential -y && pip install pikepdf==2.16.1 --force-reinstall

darkmattercoder commented 2 years ago

As @vemek stated, it is not building on Apple Silicon (M1).

kurosch commented 2 years ago

@vemek @darkmattercoder

Could you try the following:

apt-get install libxml2-dev libxslt-dev python3-dev libjpeg-dev zlib1g-dev build-essential cython3 -y
python3 -m pip install wheel
python3 -m pip install lxml --force-reinstall
python3 -m pip install Pillow --force-reinstall
python3 -m pip install pikepdf --force-reinstall

I was getting similar issues on my raspberry pi 3. The above steps fixed it for me.

jemelo commented 2 years ago

I just tried to import another PDF scanned with the swift scanner iOS app and got an error I can't solve on my RPi 4:

[2022-01-21 20:42:26,286] [ERROR] [paperless.consumer] Error while consuming document Consulta otorrino António.pdf: ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py)
Traceback (most recent call last):
  File "/app/paperless/src/paperless_tesseract/parsers.py", line 241, in parse
    ocrmypdf.ocr(**args)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/api.py", line 340, in ocr
    return run_pipeline(options=options, plugin_manager=plugin_manager, api=True)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py", line 359, in run_pipeline
    pdfinfo = get_pdfinfo(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_pipeline.py", line 157, in get_pdfinfo
    return PdfInfo(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 860, in __init__
    self._pages = _pdf_pageinfo_concurrent(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 644, in _pdf_pageinfo_concurrent
    executor(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_concurrent.py", line 82, in __call__
    self._execute(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/builtin_plugins/concurrency.py", line 132, in _execute
    for result in results:
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 601, in _pdf_pageinfo_sync
    page = PageInfo(pdf, pageno, infile, check_pages, detailed_analysis)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 675, in __init__
    self._gather_pageinfo(pdf, pageno, infile, check_pages, detailed_analysis)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 721, in _gather_pageinfo
    for ci in _process_content_streams(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 528, in _process_content_streams
    yield from _find_regular_images(container, contentsinfo)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 446, in _find_regular_images
    yield ImageInfo(name=draw.name, pdfimage=pdfimage, shorthand=draw.shorthand)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 319, in __init__
    pim_icc = pim.icc
  File "/usr/local/lib/python3.8/dist-packages/pikepdf/models/image.py", line 480, in icc
    self._icc = ImageCms.ImageCmsProfile(iccbytesio)
  File "/usr/local/lib/python3.8/dist-packages/PIL/ImageCms.py", line 172, in __init__
    self._set(core.profile_frombytes(profile.read()))
  File "/usr/local/lib/python3.8/dist-packages/PIL/_util.py", line 19, in __getattr__
    raise self.ex
  File "/usr/local/lib/python3.8/dist-packages/PIL/ImageCms.py", line 23, in <module>
    from PIL import _imagingcms
ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/paperless/src/documents/consumer.py", line 248, in try_consume_file
    document_parser.parse(self.path, mime_type, self.filename)
  File "/app/paperless/src/paperless_tesseract/parsers.py", line 290, in parse
    raise ParseError(f"{e.__class__.__name__}: {str(e)}")
documents.parsers.ParseError: ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py)

quaintdev commented 2 years ago

I am facing this issue.

@kurosch do the commands need to be run inside the paperless container? Please confirm.

Mchl commented 2 years ago

I had this same issue, and after some testing and with the help of previous comments here, the following commands worked for me on rpi 4GB

sudo apt-get install libxml2-dev libxslt-dev python-dev
sudo apt-get install libjpeg-dev zlib1g-dev
pip install wheel
apt install python3-dev build-essential -y && pip install pikepdf==2.16.1 --force-reinstall

Confirming this worked for me on RPi 4B with Raspberry OS 32b

ps23Rick commented 2 years ago

I'm on an RPI4 as well running on Buster.. and the last command directly above barfs as it can't find a version 2.16.1 of pikepdf.. See below:

pi@JMADS01:~/paperless-ng $ uname -a
Linux JMADS01 5.10.63-v7l+ #1496 SMP Wed Dec 1 15:58:56 GMT 2021 armv7l GNU/Linux

root@JMADS01:~# apt install python3-dev build-essential -y && pip install pikepdf==2.16.1 --force-reinstall
Reading package lists... Done
Building dependency tree
Reading state information... Done
build-essential is already the newest version (12.6).
python3-dev is already the newest version (3.7.3-1).
0 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Collecting pikepdf==2.16.1
  Could not find a version that satisfies the requirement pikepdf==2.16.1 (from versions: 0.1rc1, 0.1rc3, 0.1rc4, 0.1rc5, 0.1.0.post1, 0.1.1, 0.1.2, 0.1.3)
No matching distribution found for pikepdf==2.16.1

Maybe I need to adjust my apt repos to something else?

Here's the error I'm seeing -- similar but slightly different .. but still pikepdf:

pikepdf's extension library failed to import : Traceback (most recent call last):
  File "/opt/paperless/.local/lib/python3.7/site-packages/pikepdf/__init__.py", line 13, in <module>
    from . import _qpdf
ImportError: /opt/paperless/.local/lib/python3.7/site-packages/pikepdf/_qpdf.cpython-37m-arm-linux-gnueabihf.so: undefined symbol: _ZTIN16QPDFObjectHandle15ParserCallbacksE

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/paperless/.local/lib/python3.7/site-packages/django_q/cluster.py", line 432, in worker
    res = f(*task["args"], **task["kwargs"])
  File "/opt/paperless/src/documents/tasks.py", line 81, in consume_file
    task_id=task_id
  File "/opt/paperless/src/documents/consumer.py", line 248, in try_consume_file
    document_parser.parse(self.path, mime_type, self.filename)
  File "/opt/paperless/src/paperless_tesseract/parsers.py", line 230, in parse
    import ocrmypdf
  File "/opt/paperless/.local/lib/python3.7/site-packages/ocrmypdf/__init__.py", line 10, in <module>
    from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
  File "/opt/paperless/.local/lib/python3.7/site-packages/ocrmypdf/helpers.py", line 22, in <module>
    import pikepdf
  File "/opt/paperless/.local/lib/python3.7/site-packages/pikepdf/__init__.py", line 16, in <module>
    raise ImportError(_msg) from _e
ImportError: pikepdf's extension library failed to import

Has anyone built pikepdf from source? I just checked and I haven't but curious as to whether that might be helpful instead of pulling from a repo..

Darlekesh commented 2 years ago

Yes all commands need to be run in the container. @quaintdev

@ps23Rick I can see that you are running the commands outside the container. Try running it inside the container.

quaintdev commented 2 years ago

@Darlekesh Thanks.

If I bring these containers down, I believe I will have to run these commands again because the changes we made do not persist? This should be fixed in the image, I think.

Mchl commented 2 years ago

@quaintdev certainly so. Seeing how this issue has been open since August '21, I moved to the official paperless-ng Docker image (and docker-compose.yml). It runs well enough, though I'm having trouble with OCR in languages other than English.

Master-Yoba commented 2 years ago

Can confirm that this issue still exists when trying to run the image on RPI4B. Can also confirm that @jemelo solution works, you need to run the commands INSIDE the container:

I had this same issue, and after some testing and with the help of previous comments here, the following commands worked for me on rpi 4GB

sudo apt-get install libxml2-dev libxslt-dev python-dev
sudo apt-get install libjpeg-dev zlib1g-dev
pip install wheel
apt install python3-dev build-essential -y && pip install pikepdf==2.16.1 --force-reinstall

You then need to save the changes made to the container into a new image. For example:

docker ps -l
CONTAINER ID   IMAGE                              
4bde937dfcc7   lscr.io/linuxserver/paperless-ng

sudo docker commit 4bde937dfcc7 paperless-ng_fixed

And then use this new fixed image in your docker-compose.yml instead of the original one until this issue is resolved.
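
The two steps described in this comment (run the fix inside the container, then commit the container to a new image) can be sketched as a small script that only prints the commands it would run. This is an illustration under assumptions: `fix_and_commit` is a hypothetical helper, the fix command is the one posted earlier in this thread, the container ID and image name are the example values from this comment, and `bash` is assumed to be available inside the container.

```python
import shlex
from typing import List

# Fix command as posted earlier in this thread; run it inside the container.
FIX = (
    "apt update && apt install python3-dev build-essential -y "
    "&& pip install pikepdf==2.16.1 --force-reinstall"
)


def fix_and_commit(container: str, new_image: str) -> List[List[str]]:
    """Build the docker invocations: patch the container, then snapshot it."""
    return [
        ["docker", "exec", container, "bash", "-c", FIX],
        ["docker", "commit", container, new_image],
    ]


# Print the commands for review instead of executing them.
for argv in fix_and_commit("4bde937dfcc7", "paperless-ng_fixed"):
    print(shlex.join(argv))
```

Printing first makes it easy to check the container ID before anything is changed; the lines can then be pasted into a shell.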

Darlekesh commented 2 years ago

I just tried to clone the repo and build the image on an aarch64 machine following the "Build Locally" guide:

git clone https://github.com/linuxserver/docker-paperless-ng.git
cd docker-paperless-ng
docker build \
  --no-cache \
  --pull \
  -f Dockerfile.aarch64 \
  -t lscr.io/linuxserver/paperless-ng:latest .

Don't forget the -f Dockerfile.aarch64

And it works flawlessly.

When I tried to build it on an x86_64 machine with multiarch/qemu-user-static, it didn't work.

Roxedus commented 2 years ago

I managed to reproduce this. Can you test lsiodev/paperless-ng:1.5.0-pikepdf to confirm?

ps23Rick commented 2 years ago

I just tried to clone the repo and build the image on aarch64 machine following "Build Locally" guide

git clone https://github.com/linuxserver/docker-paperless-ng.git
cd docker-paperless-ng
docker build \
  --no-cache \
  --pull \
  -f Dockerfile.aarch64 \
  -t lscr.io/linuxserver/paperless-ng:latest .

Don't forget the -f Dockerfile.aarch64

And it works flawlessly.

I just tried this suggestion.. I deleted all of my existing docker images on my rpi4b after switching to the 64-bit kernel and all that junk..

When I run the above, it appears to go OK thru all 14 steps and leaves me with two docker images:

pi@JMADS01:~/paperless-ng/docker-paperless-ng $ docker images
REPOSITORY                             TAG             IMAGE ID       CREATED         SIZE
<none>                                 <none>          d788193e8fe9   2 minutes ago   128MB
ghcr.io/linuxserver/baseimage-ubuntu   arm64v8-focal   ff377f4ead39   19 hours ago    128MB

I'm sure I'm missing something -- I don't regularly use docker and I'll admit it's been a while since I've used it. Maybe I'm just a bit dense today.. I might just ditch docker and go the bare-metal route. sorry for being OT;

ps23Rick commented 2 years ago

Ok.. Just a quick followup.. Not sure if anyone else tried this.. I had read recently that the 64-bit kernel and OS was now out of beta for Raspberry Pi's.. So I backed up the bits on my Pi that needed backing up and installed Ubuntu 21 Server (fully 64-bit) and all is well for me now.. I was running a 64-bit kernel with 32-bit userland mostly for the past few weeks but it didn't help.. YMMV!

HydrelioxGitHub commented 2 years ago

I think this issue should stay open until a fix is implemented in the Docker image.

tobbenb commented 2 years ago

Roxedus asked for people to test a fix, but no-one answered, so the change didn't get merged.

AnomalieXB-6783746 commented 2 years ago

Hey, I'm facing an issue that I suspect to be at least related to this one if not the same.

My Setup:

The corresponding Log shows:

[2022-04-03 12:05:39,603] [INFO] [paperless.consumer] Consuming some.pdf
paperless     | 12:05:40 [Q] INFO Process-1:7 stopped doing work
paperless     | 12:05:40 [Q] ERROR Failed [some.pdf] - libxslt.so.1: cannot open shared object file: No such file or directory : Traceback (most recent call last):
paperless     |   File "/usr/local/lib/python3.8/dist-packages/django_q/cluster.py", line 432, in worker
paperless     |     res = f(*task["args"], **task["kwargs"])
paperless     |   File "/app/paperless/src/documents/tasks.py", line 70, in consume_file
paperless     |     document = Consumer().try_consume_file(
paperless     |   File "/app/paperless/src/documents/consumer.py", line 245, in try_consume_file
paperless     |     document_parser.parse(self.path, mime_type, self.filename)
paperless     |   File "/app/paperless/src/paperless_tesseract/parsers.py", line 237, in parse
paperless     |     import ocrmypdf
paperless     |   File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/__init__.py", line 10, in <module>
paperless     |     from ocrmypdf import helpers, hocrtransform, pdfa, pdfinfo
paperless     |   File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/helpers.py", line 23, in <module>
paperless     |     import pikepdf
paperless     |   File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 55, in <module>
paperless     |     from .models import (
paperless     |   File "/usr/local/lib/python3.8/dist-packages/pikepdf/models/__init__.py", line 20, in <module>
paperless     |     from .metadata import PdfMetadata
paperless     |   File "/usr/local/lib/python3.8/dist-packages/pikepdf/models/metadata.py", line 29, in <module>
paperless     |     from lxml import etree
paperless     | ImportError: libxslt.so.1: cannot open shared object file: No such file or directory
paperless     | 
paperless     | 12:05:40 [Q] INFO recycled worker Process-1:7
paperless     | 12:05:40 [Q] INFO Process-1:9 ready for work at 732

I suspect this is related because the pikepdf import is part of the failing path, as the log shows. I could not find out whether a fix for the bug described in this issue has already shipped in a newer version, which might in turn have caused this different but similar error.
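To narrow down which link in that import chain actually breaks, something like the following could be run with the container's python3. This is only a sketch: the module names are taken from the traceback above, and `probe_imports` is a hypothetical helper name, not part of paperless-ng or pikepdf.

```python
import importlib


def probe_imports(names):
    """Try to import each module and return {name: error message or None}."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # e.g. ImportError caused by a missing .so
            results[name] = f"{exc.__class__.__name__}: {exc}"
    return results


if __name__ == "__main__":
    # Module names from the traceback above, innermost first.
    for name, error in probe_imports(["lxml.etree", "pikepdf", "ocrmypdf"]).items():
        print(f"{name}: {error or 'ok'}")
```

If `lxml.etree` already fails on its own, the problem sits below pikepdf (a missing system library) rather than in pikepdf's extension module.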

I tried resolving the problem the way @kurosch suggested but without any success.

@vemek @darkmattercoder

Could you try the following:

apt-get install libxml2-dev libxslt-dev python3-dev libjpeg-dev zlib1g-dev build-essential cython3 -y
python3 -m pip install wheel
python3 -m pip install lxml --force-reinstall
python3 -m pip install Pillow --force-reinstall
python3 -m pip install pikepdf --force-reinstall

I was getting similar issues on my raspberry pi 3. The above steps fixed it for me.

I also tried building the image locally, but that failed too, with other errors (not sure if they are relevant to this issue).

I have been trying for over a week now without finding a solution. Does anybody have an idea what else I could try to get the whole thing working?

Roxedus commented 2 years ago

Why does this issue keep getting comments while no one tests the PR?

AnomalieXB-6783746 commented 2 years ago

Sorry that I failed to address you, @Roxedus. At least in my scenario, your approach does not seem to solve the problem; I keep getting the following error:

paperless     | 13:20:27 [Q] INFO Process-1:7 stopped doing work
paperless     | 13:20:27 [Q] ERROR Failed [some.pdf] - libxslt.so.1: cannot open shared object file: No such file or directory : Traceback (most recent call last):
paperless     |   File "/usr/local/lib/python3.8/dist-packages/django_q/cluster.py", line 432, in worker
paperless     |     res = f(*task["args"], **task["kwargs"])
paperless     |   File "/app/paperless/src/documents/tasks.py", line 74, in consume_file
paperless     |     document = Consumer().try_consume_file(
paperless     |   File "/app/paperless/src/documents/consumer.py", line 248, in try_consume_file
paperless     |     document_parser.parse(self.path, mime_type, self.filename)
paperless     |   File "/app/paperless/src/paperless_tesseract/parsers.py", line 230, in parse
paperless     |     import ocrmypdf
paperless     |   File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/__init__.py", line 10, in <module>
paperless     |     from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
paperless     |   File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/helpers.py", line 22, in <module>
paperless     |     import pikepdf
paperless     |   File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 50, in <module>
paperless     |     from .models import (
paperless     |   File "/usr/local/lib/python3.8/dist-packages/pikepdf/models/__init__.py", line 14, in <module>
paperless     |     from .metadata import PdfMetadata
paperless     |   File "/usr/local/lib/python3.8/dist-packages/pikepdf/models/metadata.py", line 28, in <module>
paperless     |     from lxml import etree
paperless     | ImportError: libxslt.so.1: cannot open shared object file: No such file or directory
paperless     | 
paperless     | 13:20:28 [Q] INFO recycled worker Process-1:7
paperless     | 13:20:28 [Q] INFO Process-1:9 ready for work at 756

That said, I am not 100% sure that my error is the same one the others are seeing, so it may still solve their problem.

tobbenb commented 2 years ago

Can you try to exec into the container and do an apk add libxslt and try again?

AnomalieXB-6783746 commented 2 years ago

@tobbenb apk seems not to exist in the container and I somewhat failed to install it. Instead I used apt and tried to apt-get install libxslt-dev (libxslt alone seemed not to exist). Yet the process failed with:

Processing triggers for libc-bin (2.31-0ubuntu9.2) ...
/usr/sbin/ldconfig: 16: exec: /sbin/ldconfig.real: not found
/usr/sbin/ldconfig: 16: exec: /sbin/ldconfig.real: not found
dpkg: error processing package libc-bin (--configure):
 installed libc-bin package post-installation script subprocess returned error exit status 127
Errors were encountered while processing:
 libc-bin
E: Sub-process /usr/bin/dpkg returned an error code (1)

tobbenb commented 2 years ago

Sorry, I don't know why I thought it was using Alpine as the base OS. First try apt update, then apt install libxslt1.1.

AnomalieXB-6783746 commented 2 years ago

Quick and happy update! Thanks to @tobbenb it now seems to work seamlessly. Although I would note that after the apt update the installation of libxslt1.1 was not successful until an apt upgrade. In case this helps anyone in the future.
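For anyone wanting to verify the library before and after the install, here is a rough check, assuming the container's Python can use ctypes (which goes through the same dynamic loader that lxml relies on). `can_load` is a made-up helper name for illustration.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic linker can load the given shared library."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Raised when the library cannot be found or opened.
        return False


if __name__ == "__main__":
    # libxslt.so.1 is the library named in the ImportError above.
    print("libxslt.so.1 loadable:", can_load("libxslt.so.1"))
```

If this prints False after installing libxslt1.1, the package did not end up where the loader looks, which would match the ldconfig failure shown earlier.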

I'm going to try to figure out whether it's just the reinstallation of libxslt1.1 that fixes the problem or whether it is the combination with the image provided by @Roxedus.

I managed to reproduce this. Can you test lsiodev/paperless-ng:1.5.0-pikepdf to confirm?

When I get any new information on this topic, I'll post it here. In the meantime, thanks for the support, guys :)

AnomalieXB-6783746 commented 2 years ago

So after a few more tests, it seems to work only for some files and not for others. I looked for a property shared by the working files or by the failing files, but found no correlation.

The error now occurring on some files is:

[2022-04-03 16:00:34,313] [INFO] [paperless.consumer] Consuming some.pdf
[2022-04-03 16:00:37,179] [ERROR] [paperless.consumer] Error while consuming document some.pdf: ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py)
Traceback (most recent call last):
  File "/app/paperless/src/paperless_tesseract/parsers.py", line 241, in parse
    ocrmypdf.ocr(**args)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/api.py", line 340, in ocr
    return run_pipeline(options=options, plugin_manager=plugin_manager, api=True)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py", line 359, in run_pipeline
    pdfinfo = get_pdfinfo(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_pipeline.py", line 157, in get_pdfinfo
    return PdfInfo(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 860, in __init__
    self._pages = _pdf_pageinfo_concurrent(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 644, in _pdf_pageinfo_concurrent
    executor(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_concurrent.py", line 82, in __call__
    self._execute(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/builtin_plugins/concurrency.py", line 132, in _execute
    for result in results:
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 601, in _pdf_pageinfo_sync
    page = PageInfo(pdf, pageno, infile, check_pages, detailed_analysis)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 675, in __init__
    self._gather_pageinfo(pdf, pageno, infile, check_pages, detailed_analysis)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 721, in _gather_pageinfo
    for ci in _process_content_streams(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 528, in _process_content_streams
    yield from _find_regular_images(container, contentsinfo)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 446, in _find_regular_images
    yield ImageInfo(name=draw.name, pdfimage=pdfimage, shorthand=draw.shorthand)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 319, in __init__
    pim_icc = pim.icc
  File "/usr/local/lib/python3.8/dist-packages/pikepdf/models/image.py", line 394, in icc
    self._icc = ImageCms.ImageCmsProfile(iccbytesio)
  File "/usr/local/lib/python3.8/dist-packages/PIL/ImageCms.py", line 172, in __init__
    self._set(core.profile_frombytes(profile.read()))
  File "/usr/local/lib/python3.8/dist-packages/PIL/_util.py", line 19, in __getattr__
    raise self.ex
  File "/usr/local/lib/python3.8/dist-packages/PIL/ImageCms.py", line 23, in <module>
    from PIL import _imagingcms
ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/paperless/src/documents/consumer.py", line 248, in try_consume_file
    document_parser.parse(self.path, mime_type, self.filename)
  File "/app/paperless/src/paperless_tesseract/parsers.py", line 290, in parse
    raise ParseError(f"{e.__class__.__name__}: {str(e)}")
documents.parsers.ParseError: ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py)
16:00:37 [Q] INFO Process-1:28 stopped doing work
16:00:37 [Q] ERROR Failed [some.pdf] - some.pdf: Error while consuming document some.pdf: ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py) : Traceback (most recent call last):
  File "/app/paperless/src/paperless_tesseract/parsers.py", line 241, in parse
    ocrmypdf.ocr(**args)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/api.py", line 340, in ocr
    return run_pipeline(options=options, plugin_manager=plugin_manager, api=True)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py", line 359, in run_pipeline
    pdfinfo = get_pdfinfo(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_pipeline.py", line 157, in get_pdfinfo
    return PdfInfo(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 860, in __init__
    self._pages = _pdf_pageinfo_concurrent(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 644, in _pdf_pageinfo_concurrent
    executor(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_concurrent.py", line 82, in __call__
    self._execute(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/builtin_plugins/concurrency.py", line 132, in _execute
    for result in results:
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 601, in _pdf_pageinfo_sync
    page = PageInfo(pdf, pageno, infile, check_pages, detailed_analysis)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 675, in __init__
    self._gather_pageinfo(pdf, pageno, infile, check_pages, detailed_analysis)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 721, in _gather_pageinfo
    for ci in _process_content_streams(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 528, in _process_content_streams
    yield from _find_regular_images(container, contentsinfo)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 446, in _find_regular_images
    yield ImageInfo(name=draw.name, pdfimage=pdfimage, shorthand=draw.shorthand)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 319, in __init__
    pim_icc = pim.icc
  File "/usr/local/lib/python3.8/dist-packages/pikepdf/models/image.py", line 394, in icc
    self._icc = ImageCms.ImageCmsProfile(iccbytesio)
  File "/usr/local/lib/python3.8/dist-packages/PIL/ImageCms.py", line 172, in __init__
    self._set(core.profile_frombytes(profile.read()))
  File "/usr/local/lib/python3.8/dist-packages/PIL/_util.py", line 19, in __getattr__
    raise self.ex
  File "/usr/local/lib/python3.8/dist-packages/PIL/ImageCms.py", line 23, in <module>
    from PIL import _imagingcms
ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/asgiref/sync.py", line 288, in main_wrap
    raise exc_info[1]
  File "/app/paperless/src/documents/consumer.py", line 248, in try_consume_file
    document_parser.parse(self.path, mime_type, self.filename)
  File "/app/paperless/src/paperless_tesseract/parsers.py", line 290, in parse
    raise ParseError(f"{e.__class__.__name__}: {str(e)}")
documents.parsers.ParseError: ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/django_q/cluster.py", line 432, in worker
    res = f(*task["args"], **task["kwargs"])
  File "/app/paperless/src/documents/tasks.py", line 74, in consume_file
    document = Consumer().try_consume_file(
  File "/app/paperless/src/documents/consumer.py", line 266, in try_consume_file
    self._fail(
  File "/app/paperless/src/documents/consumer.py", line 70, in _fail
    raise ConsumerError(f"{self.filename}: {log_message or message}")
documents.consumer.ConsumerError: some.pdf: Error while consuming document some.pdf: ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py)

16:00:37 [Q] INFO recycled worker Process-1:28
16:00:37 [Q] INFO Process-1:30 ready for work at 3007

Besides this probably being a different issue, I find it interesting that further exceptions occur during exception handling. I guess that should not happen.
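A note on the "During handling of the above exception, another exception occurred" lines: as far as I can tell this is standard Python behavior rather than a second bug. Raising a new exception inside an except block, as the `raise ParseError(...)` line in the traceback does, chains the original one implicitly. A minimal sketch, where `ParseError` is only a stand-in for the real class in paperless-ng:

```python
class ParseError(Exception):
    """Stand-in for documents.parsers.ParseError (hypothetical minimal version)."""


def parse():
    try:
        raise ImportError("cannot import name '_imagingcms' from 'PIL'")
    except Exception as e:
        # Raising inside an except block implicitly chains the original
        # exception; Python stores it on the new one as __context__.
        raise ParseError(f"{e.__class__.__name__}: {e}")


def demo():
    try:
        parse()
    except ParseError as err:
        return err


err = demo()
print(type(err.__context__).__name__)  # ImportError
```

So the root cause is still the single ImportError at the bottom of the first traceback; the later tracebacks are the same failure being wrapped on its way up.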

kurosch commented 2 years ago

@Roxedus I tested it and it still doesn't work. The error is identical to the one @AnomalieXB-6783746 reported.

AnomalieXB-6783746 commented 2 years ago

After some more research, it seems like this could be a bug in paperless-ng itself (Issue).

nothing2obvi commented 2 years ago

Still having this issue. I opened a dupe issue, which was closed.

darkmattercoder commented 2 years ago

Consider github.com/linuxserver/docker-paperless-ngx as an alternative. I think the maintenance effort for paperless-ng is unnecessary now that paperless-ngx has been established.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

badermaiss commented 2 years ago

Same issue here with the latest image on docker running with portainer. Here's the command docker pull linuxserver/paperless-ng to confirm.

Using default tag: latest
latest: Pulling from linuxserver/paperless-ng
Digest: sha256:c5d2a1006be929edb9098532e0fee68f9588ff265e651fb7d1ab78c9ac350426
Status: Image is up to date for linuxserver/paperless-ng:latest
docker.io/linuxserver/paperless-ng:latest

@jemelo's set of commands worked for me after accessing the container with docker exec -it paperless su

sudo apt-get install libxml2-dev libxslt-dev python-dev
sudo apt-get install libjpeg-dev zlib1g-dev
pip install wheel
apt install python3-dev build-essential -y && pip install pikepdf==2.16.1 --force-reinstall

Running on a Raspberry Pi 4B 4GB RAM with Raspberry Pi OS (64-bit).
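To confirm that the pinned version is what actually ended up installed, a quick check along these lines may help. It is only a sketch; `installed_version` is a hypothetical helper, and the distribution names are the ones used with pip in this thread.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for dist in ("pikepdf", "lxml", "Pillow"):
        print(dist, installed_version(dist))
```

Run it with the same python3 that paperless uses inside the container, otherwise it may report a different site-packages.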

smignon612 commented 2 years ago

works for me, thx