jgillula / paperless-ngx-postprocessor

A powerful and customizable postprocessing script for paperless-ngx
GNU Affero General Public License v3.0

JSONDecodeError When Retrieving Document Information #13

Open blackbunt opened 1 year ago

blackbunt commented 1 year ago

Encountered a JSONDecodeError while running the post_consume_script.sh script. The script seems to be having trouble decoding an expected JSON response from the Paperless API.

Steps to Reproduce:

  1. Execute the post_consume_script.sh script.

  2. Monitor the log outputs.

Expected Behavior: The script should run without errors and be able to correctly fetch document information.

Actual Behavior: The script throws a JSONDecodeError and exits.

Log Outputs:

[2023-08-30 10:47:44,441] [INFO] [paperless.consumer] Executing post-consume script /usr/src/paperless-ngx-postprocessor/post_consume_script.sh

[2023-08-30 10:47:44,866] [INFO] [paperless.consumer] /usr/src/paperless-ngx-postprocessor/post_consume_script.sh exited 0

[2023-08-30 10:47:44,867] [WARNING] [paperless.consumer] Script stderr:

[2023-08-30 10:47:44,867] [WARNING] [paperless.consumer] Traceback (most recent call last):

[2023-08-30 10:47:44,867] [WARNING] [paperless.consumer]   File "/usr/src/paperless-ngx-postprocessor/venv/lib/python3.9/site-packages/requests/models.py", line 971, in json

[2023-08-30 10:47:44,867] [WARNING] [paperless.consumer]     return complexjson.loads(self.text, **kwargs)

[2023-08-30 10:47:44,867] [WARNING] [paperless.consumer]   File "/usr/local/lib/python3.9/json/__init__.py", line 346, in loads

[2023-08-30 10:47:44,867] [WARNING] [paperless.consumer]     return _default_decoder.decode(s)

[2023-08-30 10:47:44,868] [WARNING] [paperless.consumer]   File "/usr/local/lib/python3.9/json/decoder.py", line 337, in decode

[2023-08-30 10:47:44,868] [WARNING] [paperless.consumer]     obj, end = self.raw_decode(s, idx=_w(s, 0).end())

[2023-08-30 10:47:44,868] [WARNING] [paperless.consumer]   File "/usr/local/lib/python3.9/json/decoder.py", line 355, in raw_decode

[2023-08-30 10:47:44,868] [WARNING] [paperless.consumer]     raise JSONDecodeError("Expecting value", s, err.value) from None

[2023-08-30 10:47:44,868] [WARNING] [paperless.consumer] json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

[2023-08-30 10:47:44,868] [WARNING] [paperless.consumer]

[2023-08-30 10:47:44,869] [WARNING] [paperless.consumer] During handling of the above exception, another exception occurred:

[2023-08-30 10:47:44,869] [WARNING] [paperless.consumer]

[2023-08-30 10:47:44,869] [WARNING] [paperless.consumer] Traceback (most recent call last):

[2023-08-30 10:47:44,869] [WARNING] [paperless.consumer]   File "/usr/src/paperless-ngx-postprocessor/paperlessngx_postprocessor.py", line 101, in <module>

[2023-08-30 10:47:44,869] [WARNING] [paperless.consumer]     documents.append(api.get_document_by_id(selector_config.get("document_id")))

[2023-08-30 10:47:44,869] [WARNING] [paperless.consumer]   File "/usr/src/paperless-ngx-postprocessor/paperlessngx_postprocessor/paperless_api.py", line 150, in get_document_by_id

[2023-08-30 10:47:44,869] [WARNING] [paperless.consumer]     return self._get_item_by_id("documents", document_id)

[2023-08-30 10:47:44,870] [WARNING] [paperless.consumer]   File "/usr/src/paperless-ngx-postprocessor/paperlessngx_postprocessor/paperless_api.py", line 49, in _get_item_by_id

[2023-08-30 10:47:44,870] [WARNING] [paperless.consumer]     return response.json()

[2023-08-30 10:47:44,870] [WARNING] [paperless.consumer]   File "/usr/src/paperless-ngx-postprocessor/venv/lib/python3.9/site-packages/requests/models.py", line 975, in json

[2023-08-30 10:47:44,870] [WARNING] [paperless.consumer]     raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)

[2023-08-30 10:47:44,870] [WARNING] [paperless.consumer] requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
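For readers unfamiliar with the final message in the traceback: "Expecting value: line 1 column 1 (char 0)" means the decoder failed on the very first character of the body. The sketch below uses only the standard-library `json` module (not the project's code), with made-up sample bodies, to show that an empty response and an HTML error page both produce this same message. `decode_error_message` is a hypothetical helper for illustration.

```python
import json


def decode_error_message(body):
    """Return the JSONDecodeError message for body, or None if it parses."""
    try:
        json.loads(body)
    except json.JSONDecodeError as exc:
        return str(exc)
    return None


# Made-up sample bodies: empty, an HTML error page, and valid JSON.
for sample in ["", "<html>Server Error</html>", '{"id": 1910}']:
    print(repr(sample), "->", decode_error_message(sample))
```

So the message alone does not say whether the API returned nothing at all or returned something that is not JSON; the raw response text is needed to tell the two apart.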
jgillula commented 1 year ago

Can you share what version of paperless-ngx and paperless-ngx-postprocessor you're using?

Could you also set PNGX_POSTPROCESSOR_VERBOSE=DEBUG in your docker-compose.env for paperless-ngx, re-run docker-compose up -d, try again, and then share the new log output?

Thanks!

blackbunt commented 1 year ago

Paperless Version 1.17.3

[2023-08-30 15:46:12,341] [INFO] [paperless.consumer] Consuming document.pdf

[2023-08-30 15:46:12,343] [DEBUG] [paperless.consumer] Detected mime type: application/pdf

[2023-08-30 15:46:12,347] [DEBUG] [paperless.consumer] Parser: RasterisedDocumentParser

[2023-08-30 15:46:12,350] [DEBUG] [paperless.consumer] Parsing document.pdf...

[2023-08-30 15:46:12,535] [DEBUG] [paperless.parsing.tesseract] Calling OCRmyPDF with args: {'input_file': PosixPath('/tmp/paperless/paperless-ngx4xirhu7n/document.pdf'), 'output_file': PosixPath('/tmp/paperless/paperless-dp1f7n41/archive.pdf'), 'use_threads': True, 'jobs': '2', 'language': 'deu', 'output_type': 'pdfa', 'progress_bar': False, 'redo_ocr': True, 'clean': True, 'rotate_pages': True, 'rotate_pages_threshold': 12.0, 'sidecar': PosixPath('/tmp/paperless/paperless-dp1f7n41/sidecar.txt')}

[2023-08-30 15:46:12,928] [WARNING] [paperless.parsing.tesseract] This file is encrypted, OCR is impossible. Using any text present in the original file.

[2023-08-30 15:46:12,929] [DEBUG] [paperless.consumer] Generating thumbnail for document.pdf...

[2023-08-30 15:46:12,932] [DEBUG] [paperless.parsing] Execute: convert -density 300 -scale 500x5000> -alpha remove -strip -auto-orient /tmp/paperless/paperless-ngx4xirhu7n/document.pdf[0] /tmp/paperless/paperless-dp1f7n41/convert.webp

[2023-08-30 15:46:14,749] [DEBUG] [paperless.consumer] Saving record to database

[2023-08-30 15:46:14,749] [DEBUG] [paperless.consumer] Creation date from parse_date: 2023-07-31 00:00:00+00:00

[2023-08-30 15:46:15,023] [INFO] [paperless.handlers] Assigning correspondent Bank to 2023-07-31 document

[2023-08-30 15:46:15,040] [INFO] [paperless.handlers] Assigning document type Kontoauszug to 2023-07-31 Bank document

[2023-08-30 15:46:15,057] [DEBUG] [paperless.matching] Tag Name matched on document 2023-07-31Bank document because the string Name matches the regular expression bernhard

[2023-08-30 15:46:15,059] [INFO] [paperless.handlers] Tagging "2023-07-31 Bank document" with "Name"

[2023-08-30 15:46:15,075] [INFO] [paperless.handlers] Assigning storage path Bank to 2023-07-31 Bank document

[2023-08-30 15:46:16,336] [DEBUG] [paperless.filehandling] Document has storage_path 1 (/Bank/{correspondent}/{document_type}/{created_year}/{asn}-{correspondent}-{document_type}-{title}) set

[2023-08-30 15:46:16,341] [DEBUG] [paperless.filehandling] Document has storage_path 1 (/Bank/{correspondent}/{document_type}/{created_year}/{asn}-{correspondent}-{document_type}-{title}) set

[2023-08-30 15:46:16,344] [DEBUG] [paperless.consumer] Deleting file /tmp/paperless/paperless-ngx4xirhu7n/document.pdf

[2023-08-30 15:46:16,352] [DEBUG] [paperless.parsing.tesseract] Deleting directory /tmp/paperless/paperless-dp1f7n41
[2023-08-30 15:46:16,353] [INFO] [paperless.consumer] Executing post-consume script /usr/src/paperless-ngx-postprocessor/post_consume_script.sh

[2023-08-30 15:46:16,793] [INFO] [paperless.consumer] /usr/src/paperless-ngx-postprocessor/post_consume_script.sh exited 0

[2023-08-30 15:46:16,794] [WARNING] [paperless.consumer] Script stderr:

[2023-08-30 15:46:16,794] [WARNING] [paperless.consumer] [2023-08-30 15:46:16,725] [DEBUG] [paperlessngx_postprocessor] Running /usr/src/paperless-ngx-postprocessor/paperlessngx_postprocessor.py with config {'auth_token': 'auth-token-removed-for-github', 'dry_run': False, 'skip_validation': False, 'backup': None, 'postprocessing_tag': None, 'invalid_tag': None, 'verbose': 'DEBUG', 'rulesets_dir': '/usr/src/paperless-ngx-postprocessor/rulesets.d', 'paperless_api_url': 'http://xxx.xxx.xxx.xxx:8000', 'paperless_src_dir': '/usr/src/paperless/src', 'mode': 'process', 'filename': None} and {'document_id': '1910', 'correspondent': None, 'document_type': None, 'tag': None, 'storage_path': None, 'created_year': None, 'created_month': None, 'created_day': None, 'created_range': None, 'added_month': None, 'added_day': None, 'added_range': None, 'asn': None, 'title': None, 'all': False}

[2023-08-30 15:46:16,794] [WARNING] [paperless.consumer] [2023-08-30 15:46:16,728] [DEBUG] [postprocessor] Loaded 3 rules

[2023-08-30 15:46:16,794] [WARNING] [paperless.consumer] Traceback (most recent call last):

[2023-08-30 15:46:16,795] [WARNING] [paperless.consumer]   File "/usr/src/paperless-ngx-postprocessor/venv/lib/python3.9/site-packages/requests/models.py", line 971, in json

[2023-08-30 15:46:16,795] [WARNING] [paperless.consumer]     return complexjson.loads(self.text, **kwargs)

[2023-08-30 15:46:16,795] [WARNING] [paperless.consumer]   File "/usr/local/lib/python3.9/json/__init__.py", line 346, in loads

[2023-08-30 15:46:16,795] [WARNING] [paperless.consumer]     return _default_decoder.decode(s)

[2023-08-30 15:46:16,795] [WARNING] [paperless.consumer]   File "/usr/local/lib/python3.9/json/decoder.py", line 337, in decode

[2023-08-30 15:46:16,795] [WARNING] [paperless.consumer]     obj, end = self.raw_decode(s, idx=_w(s, 0).end())

[2023-08-30 15:46:16,796] [WARNING] [paperless.consumer]   File "/usr/local/lib/python3.9/json/decoder.py", line 355, in raw_decode

[2023-08-30 15:46:16,796] [WARNING] [paperless.consumer]     raise JSONDecodeError("Expecting value", s, err.value) from None

[2023-08-30 15:46:16,796] [WARNING] [paperless.consumer] json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

[2023-08-30 15:46:16,796] [WARNING] [paperless.consumer]

[2023-08-30 15:46:16,796] [WARNING] [paperless.consumer] During handling of the above exception, another exception occurred:

[2023-08-30 15:46:16,796] [WARNING] [paperless.consumer]

[2023-08-30 15:46:16,796] [WARNING] [paperless.consumer] Traceback (most recent call last):

[2023-08-30 15:46:16,797] [WARNING] [paperless.consumer]   File "/usr/src/paperless-ngx-postprocessor/paperlessngx_postprocessor.py", line 101, in <module>

[2023-08-30 15:46:16,797] [WARNING] [paperless.consumer]     documents.append(api.get_document_by_id(selector_config.get("document_id")))

[2023-08-30 15:46:16,797] [WARNING] [paperless.consumer]   File "/usr/src/paperless-ngx-postprocessor/paperlessngx_postprocessor/paperless_api.py", line 150, in get_document_by_id

[2023-08-30 15:46:16,797] [WARNING] [paperless.consumer]     return self._get_item_by_id("documents", document_id)

[2023-08-30 15:46:16,797] [WARNING] [paperless.consumer]   File "/usr/src/paperless-ngx-postprocessor/paperlessngx_postprocessor/paperless_api.py", line 49, in _get_item_by_id

[2023-08-30 15:46:16,797] [WARNING] [paperless.consumer]     return response.json()

[2023-08-30 15:46:16,798] [WARNING] [paperless.consumer]   File "/usr/src/paperless-ngx-postprocessor/venv/lib/python3.9/site-packages/requests/models.py", line 975, in json

[2023-08-30 15:46:16,798] [WARNING] [paperless.consumer]     raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)

[2023-08-30 15:46:16,798] [WARNING] [paperless.consumer] requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I removed any personal details. The weird thing is that everything is None, although everything is properly recognized by Paperless.

jgillula commented 1 year ago

It's not surprising that this line:

[2023-08-30 15:46:16,794] [WARNING] [paperless.consumer] [2023-08-30 15:46:16,725] [DEBUG] [paperlessngx_postprocessor] Running /usr/src/paperless-ngx-postprocessor/paperlessngx_postprocessor.py with config {'auth_token': 'auth-token-removed-for-github', 'dry_run': False, 'skip_validation': False, 'backup': None, 'postprocessing_tag': None, 'invalid_tag': None, 'verbose': 'DEBUG', 'rulesets_dir': '/usr/src/paperless-ngx-postprocessor/rulesets.d', 'paperless_api_url': 'http://xxx.xxx.xxx.xxx:8000', 'paperless_src_dir': '/usr/src/paperless/src', 'mode': 'process', 'filename': None} and {'document_id': '1910', 'correspondent': None, 'document_type': None, 'tag': None, 'storage_path': None, 'created_year': None, 'created_month': None, 'created_day': None, 'created_range': None, 'added_month': None, 'added_day': None, 'added_range': None, 'asn': None, 'title': None, 'all': False}

has a lot of None in it; most of that is command-line arguments, which wouldn't normally be used when running paperless-ngx-postprocessor as a post-consume script.
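To illustrate why unused selector options show up as None, here is a minimal sketch using Python's standard `argparse`. This is an assumption for illustration only: the thread does not say how paperless-ngx-postprocessor parses its arguments, and the option names below are a hypothetical subset chosen to mirror keys in the logged dict.

```python
import argparse


def build_selector_parser():
    # Hypothetical subset of selector options, named after keys in the logged dict.
    parser = argparse.ArgumentParser()
    parser.add_argument("--document-id", dest="document_id")
    parser.add_argument("--correspondent")
    parser.add_argument("--document-type", dest="document_type")
    parser.add_argument("--tag")
    parser.add_argument("--all", action="store_true")
    return parser


# Only the document id is supplied, as would be typical when a single
# freshly consumed document is being processed.
selector_config = vars(build_selector_parser().parse_args(["--document-id", "1910"]))
print(selector_config)
```

Every option that was not passed keeps its default (None, or False for the flag), so a dict full of None only means those selectors were not used, not that Paperless failed to recognize the document's metadata.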

Could you check out the debug branch and try again? I changed it to print the full response text, so we can see what Paperless is returning and figure out why it's not being parsed as JSON.

All we should need is the line that has the text "[paperless_api:49] Response:" in it. If things are working, that will include the document contents, so you'll probably want to redact that, but I have a hunch we're going to see some sort of error text returned...
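As a rough sketch of the same idea (surfacing what the server actually sent instead of a bare decode error), the helper below wraps JSON parsing and reports the HTTP status and a preview of the body on failure. It is not the code on the debug branch; `parse_api_body` and `preview_chars` are hypothetical names, and the sample bodies are made up.

```python
import json


def parse_api_body(status_code, text, preview_chars=200):
    """Parse a response body as JSON, or raise an error that shows what came back."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # Include the status and the start of the body so the log shows
        # whether the server sent an error page, a redirect, or nothing.
        preview = text[:preview_chars]
        raise ValueError(
            f"Expected JSON but got HTTP {status_code} with body {preview!r}"
        ) from exc


# Made-up examples: a JSON body parses, an HTML error page does not.
print(parse_api_body(200, '{"id": 1910}'))
try:
    parse_api_body(403, "<html>Forbidden</html>")
except ValueError as err:
    print(err)
```

With the `requests` library, this could be called as `parse_api_body(response.status_code, response.text)`. A body preview may contain document contents, so it should be redacted before being posted publicly.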