jgillula / paperless-ngx-postprocessor

A powerful and customizable postprocessing script for paperless-ngx
GNU Affero General Public License v3.0

Documentation for bare metal-installation #21

Open peterpan192 opened 5 months ago

peterpan192 commented 5 months ago

Hey, I'm struggling with the setup on my bare metal installation on Raspberry Pi OS aarch64. There is no docker-compose.env, but I am pretty sure the equivalent on my system is "paperless.conf". However, I have not figured out which file corresponds to docker-compose.yml, or how to get the one-time setup script going. Any help is appreciated.

jgillula commented 5 months ago

I think for bare metal installation you can ignore the docker-compose.yml step. All it does is tell docker to allow paperless-ngx to be able to access the folder where the postprocessor script lives in the docker host, and you shouldn't need to do that in a bare metal installation. Just make sure that in paperless.conf, the variable PAPERLESS_POST_CONSUME_SCRIPT points to the [post_consume_script.sh](https://github.com/jgillula/paperless-ngx-postprocessor/blob/main/post_consume_script.sh) file in the postprocessor git repo, and that the paperless user can read that directory (and execute the post-consume script).
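As a rough way to check the permissions part, here is a minimal sketch. The can_run_script helper is hypothetical (it is not part of paperless-ngx or the postprocessor); it only reports whether the current user can read and execute a given file.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with paperless-ngx or the postprocessor:
# succeeds only if the current user can both read and execute the given file.
can_run_script() {
    local script="$1"
    [ -r "$script" ] && [ -x "$script" ]
}
```

To make the check meaningful, run it from a shell started as the paperless user and pass it the same path that PAPERLESS_POST_CONSUME_SCRIPT points to in paperless.conf.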

For the one time setup script (setup_venv.sh), you can just run it as the paperless user in the directory where you checked out the postprocessor, i.e. something like:

cd /whichever/directory/you-checked-the/paperless-ngx-postprocessor/repo/out/
sudo -Hu paperless ./setup_venv.sh

Let me know if that works.

peterpan192 commented 5 months ago

I think it kind of worked (at least I could run the shell script), but when performing a dry run with

sudo -Hu paperless /bin/bash -c 'source venv/bin/activate && ./paperlessngx_postprocessor.py --dry-run'

I get this error:

[2024-06-09 17:51:31,412] [INFO] [paperlessngx_postprocessor] Doing a dry run. No changes will be made.
Traceback (most recent call last):
  File "/opt/paperless/.local/lib/python3.11/site-packages/asgiref/local.py", line 89, in _lock_storage
    asyncio.get_running_loop()
RuntimeError: no running event loop

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/paperless/.local/lib/python3.11/site-packages/django/utils/connection.py", line 58, in __getitem__
    return getattr(self._connections, alias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/paperless/.local/lib/python3.11/site-packages/asgiref/local.py", line 118, in __getattr__
    return getattr(storage, key)
           ^^^^^^^^^^^^^^^^^^^^^
AttributeError: '_thread._local' object has no attribute 'default'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/paperless/paperless-ngx-postprocessor/paperlessngx_postprocessor/get_auth_token.py", line 14, in get_auth_token
    cursor = connection.cursor()
             ^^^^^^^^^^^^^^^^^
  File "/opt/paperless/.local/lib/python3.11/site-packages/django/utils/connection.py", line 15, in __getattr__
    return getattr(self._connections[self._alias], item)
  File "/opt/paperless/.local/lib/python3.11/site-packages/django/utils/connection.py", line 60, in __getitem__
    if alias not in self.settings:
                    ^^^^^^^^^^^^^
  File "/opt/paperless/.local/lib/python3.11/site-packages/django/utils/functional.py", line 47, in __get__
    res = instance.__dict__[self.name] = self.func(instance)
                                         ^^^^^^^^^^^^^^^^^^^
  File "/opt/paperless/.local/lib/python3.11/site-packages/django/utils/connection.py", line 45, in settings
    self._settings = self.configure_settings(self._settings)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/paperless/.local/lib/python3.11/site-packages/django/db/utils.py", line 148, in configure_settings
    databases = super().configure_settings(databases)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/paperless/.local/lib/python3.11/site-packages/django/utils/connection.py", line 50, in configure_settings
    settings = getattr(django_settings, self.settings_name)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/paperless/.local/lib/python3.11/site-packages/django/conf/__init__.py", line 89, in __getattr__
    self._setup(name)
  File "/opt/paperless/.local/lib/python3.11/site-packages/django/conf/__init__.py", line 76, in _setup
    self._wrapped = Settings(settings_module)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/paperless/.local/lib/python3.11/site-packages/django/conf/__init__.py", line 190, in __init__
    mod = importlib.import_module(self.SETTINGS_MODULE)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1206, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1128, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1206, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1142, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'paperless'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/paperless/paperless-ngx-postprocessor/./paperlessngx_postprocessor.py", line 65, in <module>
    api = PaperlessAPI(config["paperless_api_url"],
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/paperless/paperless-ngx-postprocessor/paperlessngx_postprocessor/paperless_api.py", line 22, in __init__
    auth_token = get_auth_token(paperless_src_dir)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/paperless/paperless-ngx-postprocessor/paperlessngx_postprocessor/get_auth_token.py", line 27, in get_auth_token
    raise RuntimeError(f"Couldn't find paperless-ngx's source code in {paperless_src_dir}")
RuntimeError: Couldn't find paperless-ngx's source code in /usr/src/paperless/src

jgillula commented 5 months ago

Very helpful logs, thank you!

That last error message is the key: to interact with paperless-ngx over its REST API, the postprocessor needs an auth token. By default the postprocessor tries to get that auth token by pretending to be paperless itself and extracting it from paperless-ngx's auth database. But to do that, it needs to know where the source code for paperless-ngx is.

There may be two ways to solve this:

  1. Provide the PNGX_POSTPROCESSOR_AUTH_TOKEN=<token> value. Normally this would go in docker-compose.env, but for a bare metal install try putting it in paperless.conf. You can get the auth token from paperless-ngx's Django admin (e.g. http://localhost:8000/admin/authtoken/tokenproxy/).
  2. Or you can provide the PNGX_POSTPROCESSOR_PAPERLESS_SRC_DIR=<directory> value (also in paperless.conf). I think the value you want is probably /opt/paperless/src, but it will depend on where you put the source code for paperless-ngx.

You can also provide the --auth-token option when doing a dry-run.
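To make the two options concrete, here is a sketch of what the paperless.conf entries might look like. Both values are placeholders: your-token-here stands in for the token copied from the Django admin, and /opt/paperless/src is only the guess mentioned above, so adjust it to wherever the paperless-ngx source actually lives. Whether a bare metal install passes these variables through to the post-consume script is an assumption worth verifying.

```shell
# paperless.conf (fragment). Set ONE of the two options below.
# Both values are placeholders: substitute your own token or source path.

# Option 1: hand the postprocessor an API token directly.
PNGX_POSTPROCESSOR_AUTH_TOKEN=your-token-here

# Option 2: point the postprocessor at the paperless-ngx source tree
# so it can look up the token itself. The path below is only a guess.
#PNGX_POSTPROCESSOR_PAPERLESS_SRC_DIR=/opt/paperless/src
```

Option 1 avoids the source-directory lookup entirely, which is the step that failed in the traceback above.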

peterpan192 commented 5 months ago

Alright, I got one step closer. post_consume_script.sh now gets started when I upload a new document to my paperless instance. For the script to run, I had to add the paperless user to the sudoers file so that it is not prompted for a password. Only then does the Python script run properly from the shell script. I edited post_consume_script.sh like this:

#!/usr/bin/env bash

# Define the run directory based on the script's location
RUN_DIR=$( dirname -- "$( readlink -f -- "$0"; )" )

# Activate the virtual environment and execute the Python script
sudo -Hu paperless /bin/bash -c "source $RUN_DIR/venv/bin/activate && python $RUN_DIR/paperlessngx_postprocessor.py --auth-token xyz --rulesets-dir /opt/paperless/paperless-ngx-postprocessor/rulesets.d --process all"

However, this way the postprocessing script processes every file in my library each time I upload a new document. How can I have only the newly uploaded document processed?

jgillula commented 5 months ago

Interesting. So does that mean paperless-ngx wasn't already running the post-consume script as the paperless user? It's weird to me that you had to sudo as paperless in the post-consume script. But if that's what works, that's what works. 🙂

For processing only the new document: when paperless-ngx calls post_consume_script.sh, one of the environment variables paperless-ngx sets should be DOCUMENT_ID, which refers to the ID of the new document. That's how post_consume_script.sh knows which document to process.

I would try changing that last line to:

sudo -HEu paperless /bin/bash -c "source $RUN_DIR/venv/bin/activate && python $RUN_DIR/paperlessngx_postprocessor.py --auth-token xyz --rulesets-dir /opt/paperless/paperless-ngx-postprocessor/rulesets.d process --document-id $DOCUMENT_ID"

(Note the added -E passed to sudo: this tells sudo to preserve the environment variables, which you probably need for $DOCUMENT_ID to get passed through.)
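If you want the script to fail loudly rather than fall back to processing everything when DOCUMENT_ID is missing, a small guard along these lines might help. This is only a sketch: build_process_args is a hypothetical helper name, and the only thing taken from the postprocessor itself is the process --document-id form shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the postprocessor:
# prints the arguments for processing a single document, or fails
# with a message on stderr if DOCUMENT_ID is unset or empty.
build_process_args() {
    if [ -n "${DOCUMENT_ID:-}" ]; then
        printf 'process --document-id %s\n' "$DOCUMENT_ID"
    else
        echo "DOCUMENT_ID is not set; refusing to guess which document to process" >&2
        return 1
    fi
}
```

In post_consume_script.sh you could call the helper before the sudo line and exit early if it returns non-zero, so a misconfigured environment never triggers a run over the whole library.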

peterpan192 commented 5 months ago

I'm not sure. The script was triggered by paperless-ngx so it must have been executed as user paperless. However, nothing happened. Does not make sense to me either but it's working. ;-) Your suggestion with process --document-id $DOCUMENT_ID works flawlessly. Thank you so much!