paperless-ngx / paperless-ngx

A community-supported supercharged version of paperless: scan, index and archive all your physical documents
https://docs.paperless-ngx.com
GNU General Public License v3.0

[BUG] document_renamer not working on NFS Storage #549

Closed Rabbit234 closed 2 years ago

Rabbit234 commented 2 years ago

**Describe the bug**
Hey,

A couple of weeks ago I upgraded my Docker image from paperless-ng to paperless-ngx (version 1.6.0). Now I want to change my PAPERLESS_FILENAME_FORMAT, and I read in the documentation that there is a script called document_renamer that renames the already archived files. My paperless volume is stored on an NFS share.
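
For context, PAPERLESS_FILENAME_FORMAT is a template whose placeholders are filled in from document metadata. A rough Python sketch of that expansion, assuming the documented placeholders `{created_year}`, `{correspondent}` and `{title}`; the helper `render_filename` and the sample values are hypothetical and are not paperless-ngx code:

```python
from pathlib import PurePosixPath


def render_filename(fmt: str, suffix: str = ".pdf", **fields: str) -> PurePosixPath:
    """Expand a filename template with str.format and append the file suffix."""
    return PurePosixPath(fmt.format(**fields) + suffix)


# Hypothetical metadata for one document.
example = render_filename(
    "{created_year}/{correspondent}/{title}",
    created_year="2022",
    correspondent="Linux",
    title="document",
)
print(example)
```

Changing the template only affects where new files are stored, which is why existing files need a separate renaming pass.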

  1. I changed my docker-compose.env file and restarted all the containers related to paperless.
  2. I tried to run docker exec paperless_webserver_1 python3 ./manage.py document_renamer

Unfortunately this gives me the following error:

SystemCheckError: System check identified some issues:

ERRORS:
?: PAPERLESS_MEDIA_ROOT is not writeable
    HINT: Set the permissions of 
drwxr-xr-x /usr/src/paperless/src/../media
 to be writeable by the user running the Paperless services
  3. I am sure that the folder is writeable for paperless, and I found some older issues about the permission check failing when paperless runs in combination with NFS storage.
  4. Next I entered the container and switched to the paperless user:
docker exec -it paperless_webserver_1 /bin/bash
root@cd95705beeea:/usr/src/paperless/src# su - paperless
$ bash
paperless@cd95705beeea:~$ cd src/
  5. But running the command again gives me the following exception:
    
    paperless@cd95705beeea:~/src$ python3 manage.py document_renamer
    Traceback (most recent call last):
    File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
    File "/usr/local/lib/python3.9/site-packages/django/db/backends/sqlite3/base.py", line 423, in execute
    return Database.Cursor.execute(self, query, params)
    sqlite3.OperationalError: no such column: documents_document.document_type_id

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/usr/src/paperless/src/manage.py", line 11, in <module>
        execute_from_command_line(sys.argv)
      File "/usr/local/lib/python3.9/site-packages/django/core/management/__init__.py", line 419, in execute_from_command_line
        utility.execute()
      File "/usr/local/lib/python3.9/site-packages/django/core/management/__init__.py", line 413, in execute
        self.fetch_command(subcommand).run_from_argv(self.argv)
      File "/usr/local/lib/python3.9/site-packages/django/core/management/base.py", line 354, in run_from_argv
        self.execute(*args, **cmd_options)
      File "/usr/local/lib/python3.9/site-packages/django/core/management/base.py", line 398, in execute
        output = self.handle(*args, **options)
      File "/usr/src/paperless/src/documents/management/commands/document_renamer.py", line 30, in handle
        for document in tqdm.tqdm(
      File "/usr/local/lib/python3.9/site-packages/tqdm/std.py", line 989, in __init__
        total = len(iterable)
      File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 262, in __len__
        self._fetch_all()
      File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 1324, in _fetch_all
        self._result_cache = list(self._iterable_class(self))
      File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 51, in __iter__
        results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
      File "/usr/local/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1175, in execute_sql
        cursor.execute(sql, params)
      File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 66, in execute
        return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
      File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
        return executor(sql, params, many, context)
      File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
        return self.cursor.execute(sql, params)
      File "/usr/local/lib/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
        raise dj_exc_value.with_traceback(traceback) from exc_value
      File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
        return self.cursor.execute(sql, params)
      File "/usr/local/lib/python3.9/site-packages/django/db/backends/sqlite3/base.py", line 423, in execute
        return Database.Cursor.execute(self, query, params)
    django.db.utils.OperationalError: no such column: documents_document.document_type_id
    Exception ignored in: <function tqdm.__del__ at 0x7ff4e4f59af0>
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/tqdm/std.py", line 1147, in __del__
      File "/usr/local/lib/python3.9/site-packages/tqdm/std.py", line 1266, in close
    AttributeError: 'tqdm' object has no attribute 'disable'



**To Reproduce**
Steps to reproduce the behavior:
1. Use NFS Storage for docker volume
2. Change PAPERLESS_FILENAME_FORMAT in env file.
3. Run docker exec paperless_webserver_1 python3 ./manage.py document_renamer
4. See error
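
The writability error from step 3 can be investigated from inside the container. The following is a minimal sketch, not the check paperless-ngx itself performs: it compares `os.access` with an actual attempt to create a file, for whichever user runs it. The function name `check_writable` and the fallback path `../media` are assumptions for illustration. Note that `docker exec` runs as the container's default user (often root), which may differ from the `paperless` user, and on NFS exports that map root to an unprivileged user, root may genuinely lack write access.

```python
import os
import tempfile


def check_writable(directory: str) -> dict:
    """Compare a permission check with an actual write attempt."""
    result = {
        "os_access": os.access(directory, os.W_OK | os.X_OK),
        "real_write": False,
    }
    try:
        # Creating (and automatically deleting) a temporary file exercises
        # the real write path, including any server-side user mapping.
        with tempfile.NamedTemporaryFile(dir=directory):
            result["real_write"] = True
    except OSError:
        pass
    return result


if __name__ == "__main__":
    # PAPERLESS_MEDIA_ROOT falls back to a hypothetical relative path here.
    media_root = os.environ.get("PAPERLESS_MEDIA_ROOT", "../media")
    print("effective uid:", os.geteuid(), check_writable(media_root))
```

Running it once as root and once as the `paperless` user should show whether the two users see the directory differently.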

**Expected behavior**
Documents are renamed according to the format set in the docker-compose.env file.

**Relevant information**
 - Debian 11
 - Docker version 20.10.12, build e91ed57
 - paperless-ngx 1.6.0

stumpylog commented 2 years ago

That looks more like the database is either corrupted or inaccessible. Are you able to add documents without error (via the WebUI, for example)?
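
One way to tell a stale schema from a corrupted database is to ask SQLite which columns a table actually has. A minimal sketch using only the standard library follows; the helper `table_columns` is hypothetical, and the in-memory table is a stand-in for the real database so the example stays self-contained:

```python
import sqlite3


def table_columns(conn: sqlite3.Connection, table: str) -> list[str]:
    """Return the column names SQLite reports for a table (empty if missing)."""
    # PRAGMA statements cannot take bound parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    return [row[1] for row in conn.execute(f"PRAGMA table_info({quoted})")]


# Demonstration on an in-memory database with a deliberately outdated schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents_document (id INTEGER PRIMARY KEY, title TEXT)")
columns = table_columns(conn, "documents_document")
print("document_type_id" in columns)
```

To inspect a real installation, `:memory:` would be replaced by the path to the SQLite file, ideally a backup copy. A missing `document_type_id` column would point to migrations that were never applied rather than to corruption.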

Rabbit234 commented 2 years ago

Uploading documents via the WebUI works fine. I can even change settings on already uploaded documents. The logfile looks good to me:

[2022-03-30 00:33:24,481] [INFO] [paperless.consumer] Consuming document.pdf
[2022-03-30 00:33:24,483] [DEBUG] [paperless.consumer] Detected mime type: application/pdf
[2022-03-30 00:33:24,491] [DEBUG] [paperless.consumer] Parser: RasterisedDocumentParser
[2022-03-30 00:33:24,500] [DEBUG] [paperless.consumer] Parsing document.pdf...
[2022-03-30 00:33:25,942] [DEBUG] [paperless.parsing.tesseract] Extracted text from PDF file /tmp/paperless/paperless-upload-ux_l0hz8
[2022-03-30 00:33:26,072] [DEBUG] [paperless.parsing.tesseract] Calling OCRmyPDF with args: {'input_file': '/tmp/paperless/paperless-upload-ux_l0hz8', 'output_file': '/tmp/paperless/paperless-5wi9jkrr/archive.pdf', 'use_threads': True, 'jobs': 2, 'language': 'deu', 'output_type': 'pdfa', 'progress_bar': False, 'skip_text': True, 'clean': True, 'deskew': True, 'rotate_pages': True, 'rotate_pages_threshold': 12.0, 'sidecar': '/tmp/paperless/paperless-5wi9jkrr/sidecar.txt'}
[2022-03-30 00:33:29,102] [DEBUG] [paperless.parsing.tesseract] Incomplete sidecar file: discarding.
[2022-03-30 00:33:30,362] [DEBUG] [paperless.parsing.tesseract] Extracted text from PDF file /tmp/paperless/paperless-5wi9jkrr/archive.pdf
[2022-03-30 00:33:30,363] [DEBUG] [paperless.consumer] Generating thumbnail for document.pdf...
[2022-03-30 00:33:30,370] [DEBUG] [paperless.parsing] Execute: convert -density 300 -scale 500x5000> -alpha remove -strip -auto-orient /tmp/paperless/paperless-5wi9jkrr/archive.pdf[0] /tmp/paperless/paperless-5wi9jkrr/convert.png
[2022-03-30 00:33:32,220] [DEBUG] [paperless.parsing.tesseract] Execute: optipng -silent -o5 /tmp/paperless/paperless-5wi9jkrr/convert.png -out /tmp/paperless/paperless-5wi9jkrr/thumb_optipng.png
[2022-03-30 00:33:37,382] [DEBUG] [paperless.classifier] Document classification model does not exist (yet), not performing automatic matching.
[2022-03-30 00:33:37,393] [DEBUG] [paperless.consumer] Saving record to database
[2022-03-30 00:33:37,441] [DEBUG] [paperless.matching] Correspondent Linux matched on document 2022-03-07 document because it contains this word: Linux
[2022-03-30 00:33:37,443] [DEBUG] [paperless.matching] Correspondent xxx matched on document 2022-03-07 document because it contains this word: Media
[2022-03-30 00:33:37,444] [DEBUG] [paperless.matching] Correspondent Microsoft matched on document 2022-03-07 document because it contains this word: Microsoft
[2022-03-30 00:33:37,453] [DEBUG] [paperless.matching] Correspondent xxx matched on document 2022-03-07 document because it contains this word: of
[2022-03-30 00:33:37,453] [DEBUG] [paperless.handlers] Detected 4 potential correspondents, so we've opted for Linux
[2022-03-30 00:33:37,454] [INFO] [paperless.handlers] Assigning correspondent Linux to 2022-03-07 document
[2022-03-30 00:33:37,591] [DEBUG] [paperless.consumer] Deleting file /tmp/paperless/paperless-upload-ux_l0hz8
[2022-03-30 00:33:37,599] [DEBUG] [paperless.parsing.tesseract] Deleting directory /tmp/paperless/paperless-5wi9jkrr
[2022-03-30 00:33:37,599] [INFO] [paperless.consumer] Document 2022-03-07 document consumption finished

stumpylog commented 2 years ago

I suppose it could be missing migrations, but I'm not a django expert.

After backing up, you could try to run:

    python manage.py makemigrations
    python manage.py migrate

It should already have migrated, but maybe somehow the switch got confused and didn't.
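
Before changing anything, Django's standard `showmigrations` command can list which migrations are pending: applied ones are marked `[X]` and unapplied ones `[ ]`. The sketch below parses that list output; the function name `unapplied_migrations` and the sample text are hypothetical, with migration names borrowed from this thread:

```python
from typing import Optional


def unapplied_migrations(output: str) -> list[tuple[Optional[str], str]]:
    """Parse `manage.py showmigrations` list output into pending (app, name) pairs."""
    pending = []
    app = None
    for line in output.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("("):
            # Skip blank lines and notes such as "(no migrations)".
            continue
        if stripped.startswith("[ ]"):
            pending.append((app, stripped[3:].strip()))
        elif not stripped.startswith("["):
            # Lines without a checkbox are app labels.
            app = stripped
    return pending


# Hypothetical output for illustration only.
sample = """\
documents
 [X] 1016_auto_20210317_1351
 [ ] 1017_alter_savedviewfilterrule_rule_type
paperless_mail
 [ ] 0001_initial
"""
print(unapplied_migrations(sample))
```

An empty result would suggest the schema is already up to date and the problem lies elsewhere.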

Rabbit234 commented 2 years ago

@stumpylog Thanks a lot. I didn't think this could be an issue with missing migrations. After running both commands, document_renamer now works perfectly fine.

paperless@cd95705beeea:~/src$ python manage.py makemigrations
Migrations for 'documents':
  documents/migrations/1017_alter_savedviewfilterrule_rule_type.py
    - Alter field rule_type on savedviewfilterrule
paperless@cd95705beeea:~/src$ python manage.py migrate
Operations to perform:
  Apply all migrations: admin, auth, authtoken, contenttypes, django_q, documents, paperless_mail, sessions
Running migrations:
  Applying admin.0003_logentry_add_action_flag_choices... OK
  Applying auth.0010_alter_group_name_max_length... OK
  Applying auth.0011_update_proxy_permissions... OK
  Applying auth.0012_alter_user_first_name_max_length... OK
  Applying authtoken.0001_initial... OK
  Applying authtoken.0002_auto_20160226_1747... OK
  Applying authtoken.0003_tokenproxy... OK
  Applying django_q.0001_initial... OK
  Applying django_q.0002_auto_20150630_1624... OK
  Applying django_q.0003_auto_20150708_1326... OK
  Applying django_q.0004_auto_20150710_1043... OK
  Applying django_q.0005_auto_20150718_1506... OK
  Applying django_q.0006_auto_20150805_1817... OK
  Applying django_q.0007_ormq... OK
  Applying django_q.0008_auto_20160224_1026... OK
  Applying django_q.0009_auto_20171009_0915... OK
  Applying django_q.0010_auto_20200610_0856... OK
  Applying django_q.0011_auto_20200628_1055... OK
  Applying django_q.0012_auto_20200702_1608... OK
  Applying django_q.0013_task_attempt_count... OK
  Applying django_q.0014_schedule_cluster... OK
  Applying documents.1000_update_paperless_all... OK
  Applying documents.1001_auto_20201109_1636... OK
  Applying documents.1002_auto_20201111_1105... OK
  Applying documents.1003_mime_types... OK
  Applying documents.1004_sanity_check_schedule... OK
  Applying documents.1005_checksums... OK
  Applying documents.1006_auto_20201208_2209... OK
  Applying documents.1007_savedview_savedviewfilterrule... OK
  Applying documents.1008_auto_20201216_1736... OK
  Applying documents.1009_auto_20201216_2005... OK
  Applying documents.1010_auto_20210101_2159... OK
  Applying documents.1011_auto_20210101_2340... OK
  Applying documents.1012_fix_archive_files... OK
  Applying documents.1013_migrate_tag_colour... OK
  Applying documents.1014_auto_20210228_1614... OK
  Applying documents.1015_remove_null_characters... OK
  Applying documents.1016_auto_20210317_1351... OK
  Applying documents.1017_alter_savedviewfilterrule_rule_type... OK
  Applying paperless_mail.0001_initial... OK
  Applying paperless_mail.0002_auto_20201117_1334... OK
  Applying paperless_mail.0003_auto_20201118_1940... OK
  Applying paperless_mail.0004_mailrule_order... OK
  Applying paperless_mail.0005_help_texts... OK
  Applying paperless_mail.0006_auto_20210101_2340... OK
  Applying paperless_mail.0007_auto_20210106_0138... OK
  Applying paperless_mail.0008_auto_20210516_0940... OK
paperless@cd95705beeea:~/src$ python3 manage.py document_renamer
0it [00:00, ?it/s]

Again thanks for keeping this awesome project alive. I love paperless :+1:

github-actions[bot] commented 1 year ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion or issue for related concerns.