jonaswinkler / paperless-ng

A supercharged version of paperless: scan, index and archive all your physical documents
https://paperless-ng.readthedocs.io/en/latest/
GNU General Public License v3.0

[BUG] Traceback in sanity_checker #511

Closed kinglike1337 closed 3 years ago

kinglike1337 commented 3 years ago

I updated my paperless-ng test instance from version 1.0.0 to the new version 1.1.0. I noticed a Python traceback in the Docker logs after the first startup:

ERROR Failed [oregon-undress-uniform-magnesium] - list.remove(x): x not in list : Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/django_q/cluster.py", line 436, in worker
    res = f(*task["args"], **task["kwargs"])
  File "/usr/src/paperless/src/documents/tasks.py", line 95, in sanity_check
    messages = sanity_checker.check_sanity()
  File "/usr/src/paperless/src/documents/sanity_checker.py", line 95, in check_sanity
    present_files.remove(os.path.normpath(doc.archive_path))
ValueError: list.remove(x): x not in list

I restarted the application but the error was not raised again.
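
For context, the call that fails in the traceback is `list.remove`, which raises `ValueError` when the value is not in the list. Below is a minimal sketch of that failure mode, together with a tolerant variant that records a message instead of aborting the whole task. The helper `remove_or_report` and the sample paths are hypothetical and are not taken from the paperless-ng source.

```python
def remove_or_report(present_files, path, messages):
    """Remove path from present_files, or record a message if it is absent."""
    if path in present_files:
        present_files.remove(path)
    else:
        messages.append(f"Expected file missing from listing: {path}")


present_files = ["archive/a.pdf", "originals/a.docx"]
messages = []

# Plain list.remove raises ValueError when the value is not in the list.
try:
    present_files.remove("archive/b.pdf")
    raised = False
except ValueError:
    raised = True

remove_or_report(present_files, "archive/a.pdf", messages)  # present: removed
remove_or_report(present_files, "archive/a.pdf", messages)  # absent: reported
```

With the tolerant variant, a missing entry ends up in `messages` rather than ending the check with an exception.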

jonaswinkler commented 3 years ago

Thank you for reporting that.

kinglike1337 commented 3 years ago

Yes, I am using docker-compose and a standard docker volume for "data":

...
volumes:
  - data:/usr/src/paperless/data
  - media:/usr/src/paperless/media
  - ./export:/usr/src/paperless/export
  - ./consume:/usr/src/paperless/consume
...
volumes:
  data:
  media:

jonaswinkler commented 3 years ago

Thank you! I'll investigate.

kinglike1337 commented 3 years ago

I compared the database table column "documents_document.filename" with the contents of the archive path in the media directory.

There were two entries in the database that could not be found in the archive path of the media directory:

Maybe these files triggered the sanity check?

jonaswinkler commented 3 years ago

That's very good info, and I've identified the issue.

kinglike1337 commented 3 years ago

Yes, you are correct: the corresponding *.pdf files do exist.

I re-scheduled the "Perform sanity check" and the same error showed up again.

jonaswinkler commented 3 years ago

The error you're seeing is the side effect of an oversight during filename generation.

I'll get that addressed ASAP, and as a result, the sanity check error will also go away.
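
A sketch of how a filename-generation oversight could lead to the `list.remove` traceback. It assumes (this is not confirmed in the thread) that the archive name is derived by replacing the original's extension with `.pdf`. The function `archive_name_for` is hypothetical; the file names are the ones from the media directory listing later in this thread.

```python
import os


def archive_name_for(original):
    """Hypothetical rule: same path as the original, with a .pdf extension."""
    stem, _ext = os.path.splitext(original)
    return stem + ".pdf"


originals = ["none/2021/test1.docx", "none/2021/test1.odt"]
archive_names = [archive_name_for(name) for name in originals]

# Both originals map to the same archive file, so only one PDF exists on disk.
present_files = sorted(set(archive_names))

errors = []
for name in archive_names:
    try:
        present_files.remove(name)
    except ValueError as exc:
        # The second document looks for a file that was already accounted for.
        errors.append(str(exc))
```

Under this assumption the first document removes the shared PDF from the list and the second one fails, which matches the shape of the reported error.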

kinglike1337 commented 3 years ago

Thank you very much for your efforts and your fast analysis.

You are right, the originals are intact in my case and I just overlooked the missing pdf file. Good catch.

Contents of media directory:

ls -1 archive/none/2021/test1* originals/none/2021/test1*
archive/none/2021/test1.pdf
originals/none/2021/test1.docx
originals/none/2021/test1.odt

jonaswinkler commented 3 years ago

Fixed in 1.1.1.

kinglike1337 commented 3 years ago

Thank you very much. The fix worked:

[2021-02-13 09:01:14,652] [DEBUG] [paperless.migrations] Removing /usr/src/paperless/src/../media/documents/archive/none/2021/test1.pdf
[2021-02-13 09:01:14,665] [INFO] [paperless.migrations] Regenerating archive document for document ID:16
[2021-02-13 09:01:14,676] [INFO] [paperless.parsing.tika] Sending /usr/src/paperless/src/../media/documents/originals/none/2021/test1.odt to Tika server
[2021-02-13 09:01:15,036] [INFO] [paperless.parsing.tika] Converting /usr/src/paperless/src/../media/documents/originals/none/2021/test1.odt to PDF as /tmp/paperless/paperless-r98pe2bm/convert.pdf
[2021-02-13 09:01:20,567] [DEBUG] [paperless.parsing.tika] Deleting directory /tmp/paperless/paperless-r98pe2bm
[2021-02-13 09:01:20,569] [INFO] [paperless.migrations] Regenerating archive document for document ID:15
[2021-02-13 09:01:20,570] [INFO] [paperless.parsing.tika] Sending /usr/src/paperless/src/../media/documents/originals/none/2021/test1.docx to Tika server
[2021-02-13 09:01:20,969] [INFO] [paperless.parsing.tika] Converting /usr/src/paperless/src/../media/documents/originals/none/2021/test1.docx to PDF as /tmp/paperless/paperless-ctqo4lns/convert.pdf
[2021-02-13 09:01:22,686] [DEBUG] [paperless.parsing.tika] Deleting directory /tmp/paperless/paperless-ctqo4lns