jonaswinkler / paperless-ng

A supercharged version of paperless: scan, index and archive all your physical documents
https://paperless-ng.readthedocs.io/en/latest/
GNU General Public License v3.0

Error parsing documents #61

Closed Daedren closed 3 years ago

Daedren commented 3 years ago

I'm having trouble getting the consumer to add files. I'm using Docker with Paperless-ng 0.9.3.

It seems to happen with all types of documents, from PDFs that already contain readable text to photos that require OCR.

paperlessng-web       | 17:15:03 [Q] INFO Enqueued 1
paperlessng-web       | 17:15:03 [Q] INFO Process-1:1 processing [Scan 29 Oct 2020 at 18.50.pdf]
paperlessng-web       | Consuming Scan 29 Oct 2020 at 18.50.pdf
paperlessng-web       | Parser: RasterisedDocumentParser based on mime type application/pdf
paperlessng-web       | Generating thumbnail for Scan 29 Oct 2020 at 18.50.pdf...
paperlessng-web       | Execute: convert -density 300 -scale 500x5000> -alpha remove -strip -trim /usr/src/paperless/src/../consume/Scan 29 Oct 2020 at 18.50.pdf[0] /tmp/paperless/paperless-x57ze_1i/convert.png
paperlessng-web       | Execute: optipng -silent -o5 /tmp/paperless/paperless-x57ze_1i/convert.png -out /tmp/paperless/paperless-x57ze_1i/optipng.png
paperlessng-web       | Parsing Scan 29 Oct 2020 at 18.50.pdf...
paperlessng-web       | Converting document /usr/src/paperless/src/../consume/Scan 29 Oct 2020 at 18.50.pdf into greyscale images
paperlessng-web       | Execute: convert -density 300 -type grayscale -depth 8 /usr/src/paperless/src/../consume/Scan 29 Oct 2020 at 18.50.pdf /tmp/paperless/paperless-x57ze_1i/convert-%04d.pnm
paperlessng-web       | 127.0.0.1 - - [28/Nov/2020:17:15:12 +0000] "GET / HTTP/1.1" 302 0 "-" "curl/7.64.0"
paperlessng-web       | Running unpaper on 1 pages...
paperlessng-web       | Execute: unpaper --overwrite --quiet /tmp/paperless/paperless-x57ze_1i/convert-0000.pnm /tmp/paperless/paperless-x57ze_1i/convert-0000.unpaper.pnm
paperlessng-web       | Attempting language detection on page 1 of 1...
paperlessng-web       | Performing OCR on 1 page(s) with language por
paperlessng-web       | Detected language: pt (default language)
paperlessng-web       | OCR completed.
paperlessng-web       | Unable to detect date for document
paperlessng-web       | Saving record to database
paperlessng-web       | Assigning correspondent OCIDENTAL to 20201029185017: Scan 29 Oct 2020 at 18.50
paperlessng-web       | Indexing 20201029185017: OCIDENTAL - Scan 29 Oct 2020 at 18.50...
paperlessng-web       | Deleting directory /tmp/paperless/paperless-x57ze_1i
paperlessng-web       | 17:15:26 [Q] ERROR Failed [Scan 29 Oct 2020 at 18.50.pdf] - expected str, bytes or os.PathLike object, not NoneType : Traceback (most recent call last):
paperlessng-web       |   File "/usr/src/paperless/src/documents/consumer.py", line 160, in try_consume_file
paperlessng-web       |     classifier=classifier
paperlessng-web       |   File "/usr/local/lib/python3.7/site-packages/django/dispatch/dispatcher.py", line 179, in send
paperlessng-web       |     for receiver in self._live_receivers(sender)
paperlessng-web       |   File "/usr/local/lib/python3.7/site-packages/django/dispatch/dispatcher.py", line 179, in <listcomp>
paperlessng-web       |     for receiver in self._live_receivers(sender)
paperlessng-web       |   File "/usr/src/paperless/src/documents/signals/handlers.py", line 165, in run_post_consume_script
paperlessng-web       |     str(",".join(document.tags.all().values_list("slug", flat=True)))
paperlessng-web       |   File "/usr/local/lib/python3.7/subprocess.py", line 800, in __init__
paperlessng-web       |     restore_signals, start_new_session)
paperlessng-web       |   File "/usr/local/lib/python3.7/subprocess.py", line 1482, in _execute_child
paperlessng-web       |     restore_signals, start_new_session, preexec_fn)
paperlessng-web       | TypeError: expected str, bytes or os.PathLike object, not NoneType
paperlessng-web       |
paperlessng-web       | During handling of the above exception, another exception occurred:
paperlessng-web       |
paperlessng-web       | Traceback (most recent call last):
paperlessng-web       |   File "/usr/local/lib/python3.7/site-packages/django_q/cluster.py", line 436, in worker
paperlessng-web       |     res = f(*task["args"], **task["kwargs"])
paperlessng-web       |   File "/usr/src/paperless/src/documents/tasks.py", line 69, in consume_file
paperlessng-web       |     override_tag_ids=override_tag_ids)
paperlessng-web       |   File "/usr/src/paperless/src/documents/consumer.py", line 174, in try_consume_file
paperlessng-web       |     raise ConsumerError(e)
paperlessng-web       | documents.consumer.ConsumerError: expected str, bytes or os.PathLike object, not NoneType

Logs from the UI

11/28/20, 5:00 PM INFO Saving updated classifier model to /usr/src/paperless/src/../data/classification_model.pickle...

11/28/20, 5:00 PM WARNING Cannot classify documents: [Errno 2] No such file or directory: '/usr/src/paperless/src/../data/classification_model.pickle'

11/28/20, 5:15 PM INFO Unable to detect date for document

11/28/20, 5:15 PM INFO Consuming Scan 29 Oct 2020 at 18.50.pdf

jonaswinkler commented 3 years ago

It works when no post-consume script is configured. I haven't figured out the URLs to the new document yet; some API has changed.
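
For anyone hitting the same traceback: the `TypeError` is raised inside `subprocess`, which suggests that one of the values handed to the post-consume script was `None`. Below is a minimal sketch of how such a call could be guarded. It is not the actual paperless-ng code; `build_script_args` and `run_post_consume` are hypothetical names used only for illustration, and replacing `None` with an empty string is just one possible choice.

```python
import subprocess
import sys


def build_script_args(script_path, *values):
    """Build an argument list for subprocess, replacing None with ''.

    Hypothetical helper: subprocess only accepts str, bytes or
    os.PathLike arguments, so every value is normalised to str here.
    """
    return [script_path] + ["" if v is None else str(v) for v in values]


def run_post_consume(script_path, *values):
    """Run the script if one is configured and return its exit code.

    Returns None when no script is configured (assumed behaviour).
    """
    if not script_path:
        return None  # no script configured; nothing to do
    return subprocess.Popen(build_script_args(script_path, *values)).wait()


if __name__ == "__main__":
    # The None value (standing in for a not-yet-known URL) becomes ''.
    print(build_script_args("/path/to/script", 42, None, "tag-a,tag-b"))
    # Uses the current Python interpreter as a stand-in for a real script.
    print(run_post_consume(sys.executable, "-c", "pass"))
```

With a guard like this, a missing value reaches the script as an empty argument instead of aborting the whole consumption task.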