the-paperless-project / paperless

Scan, index, and archive all of your paper documents
GNU General Public License v3.0

Disabling encryption failing after one file #714

Open jannislehmann opened 3 years ago

jannislehmann commented 3 years ago

Hey,

I was planning on moving to / trying paperless-ng. However, I first had to decrypt my files, so I opened a bash shell inside the paperless_web container and ran the following command: ./manage.py change_storage_type gpg unencrypted

This resulted in the following log:

b'Decrypting 20180801000000: 0219CFB2B927132B00841159778E1401'
Traceback (most recent call last):
  File "./manage.py", line 11, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/lib/python3.8/site-packages/django/core/management/__init__.py", line 371, in execute_from_command_line
    utility.execute()
  File "/usr/lib/python3.8/site-packages/django/core/management/__init__.py", line 365, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/lib/python3.8/site-packages/django/core/management/base.py", line 288, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/lib/python3.8/site-packages/django/core/management/base.py", line 335, in execute
    output = self.handle(*args, **options)
  File "/usr/src/paperless/src/documents/management/commands/change_storage_type.py", line 66, in handle
    self.__gpg_to_unencrypted(passphrase)
  File "/usr/src/paperless/src/documents/management/commands/change_storage_type.py", line 96, in __gpg_to_unencrypted
    os.unlink(path)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/src/paperless/src/../media/documents/originals/0000328.pdf.gpg'

All files in the mentioned directory are encrypted, and file 328 does get decrypted when I run the command. But the script then somehow tries to access the encrypted file again.

jonaswinkler commented 3 years ago

Hello.

This breaks because of a change to the way paperless handles filenames (namely the configuration option PAPERLESS_FILENAME_FORMAT). The decryption logic was never adjusted to respect that change, and there were no test cases in place. You will first have to check whether your document archive is still in good shape. It might not be. Please check if:

If any of the above is not true, this either needs to be fixed (which is possible), or you need to recover from a backup.

Paperless-ng still reads encrypted files just fine and the decryption command is verified to work.

Edit: However, there is no warranty. Create a backup before you go this route.
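One plausible reading of the traceback earlier in the thread is that the encrypted file is renamed on disk before the command tries to delete it under its old name. The sketch below only reproduces that pattern in a temporary directory; `hypothetical_save` is an invented stand-in, not the actual paperless code.

```python
import os
import tempfile


def hypothetical_save(old_path: str) -> str:
    """Invented stand-in for a save step that renames the file on disk."""
    new_path = old_path[: -len(".gpg")]
    os.rename(old_path, new_path)
    return new_path


def reproduce() -> str:
    """Rename a file, then unlink the old name; return the error type seen."""
    with tempfile.TemporaryDirectory() as tmp:
        encrypted = os.path.join(tmp, "0000328.pdf.gpg")
        with open(encrypted, "wb") as f:
            f.write(b"placeholder")

        # Assumption: saving moves the file to its new name ...
        hypothetical_save(encrypted)

        # ... so deleting the old name afterwards cannot succeed.
        try:
            os.unlink(encrypted)
        except FileNotFoundError as exc:
            return type(exc).__name__
    return "no error"


if __name__ == "__main__":
    print(reproduce())
```

If this reading is right, the first document is processed, the command stops at the failed delete, and every later document is left untouched.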

jannislehmann commented 3 years ago

Hey, thanks for the fast answer.

I am not using paperless-ng yet, only the main version (paperless). However, I just tried out paperless-ng and used this command: https://paperless-ng.readthedocs.io/en/latest/administration.html#disabling-encryption Sadly, this fails because of a missing Python dependency: ModuleNotFoundError: No module named 'termcolor'

Edit: The files all still end with *.pdf.gpg. Thus, the file is still encrypted.

jonaswinkler commented 3 years ago

Try copying the file 0000328.pdf.gpg out of the media directory, renaming the copy so it ends in .pdf, and checking whether you can open it. It might have been decrypted in place.
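The same check can be done without a PDF viewer by looking at the first bytes of the copied file, since PDF files normally begin with `%PDF-`. This is a rough sketch only: the function name and the demo file names are made up, and a file that fails the check is not necessarily GPG-encrypted.

```python
import os
import tempfile

PDF_MAGIC = b"%PDF-"


def looks_like_pdf(path: str) -> bool:
    """Return True if the file begins with the usual PDF header bytes."""
    with open(path, "rb") as f:
        return f.read(len(PDF_MAGIC)) == PDF_MAGIC


def demo() -> tuple:
    """Check one PDF-like file and one file with arbitrary binary content."""
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "copy-of-0000328.pdf")
        other = os.path.join(tmp, "something-else.pdf")
        with open(plain, "wb") as f:
            f.write(b"%PDF-1.4\n")
        with open(other, "wb") as f:
            f.write(b"\x85\x02\x0c placeholder bytes")
        return looks_like_pdf(plain), looks_like_pdf(other)


if __name__ == "__main__":
    print(demo())
```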

Regarding the termcolor issue: noted. Now I know where that package was used. I'll have that fixed soon.

jannislehmann commented 3 years ago

Alright, great :)

Okay, I did some further testing: I used a backup that is entirely encrypted. I created a paperless-ng instance and can download the files. They are missing the file extension. After appending .pdf to any file, I can open and view it. However, the thumbnails are missing.

I will retry decrypting the files with the original paperless. Before I ran the decrypt command, I had 400 files, all ending with .pdf.gpg. After I ran it, I have 399 files ending with .pdf.gpg, and one file got renamed to .pdf. This is the file 0000328.pdf. It can be opened and the data is still accessible.
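A count like the one above can be taken with a few lines of Python. This sketch assumes, as in this thread, that encrypted files end in `.gpg` and everything else is unencrypted; the function name and the demo file names are illustrative, and for a real check the function would be pointed at the originals directory from the traceback.

```python
import os
import tempfile
from collections import Counter


def count_by_state(directory: str) -> Counter:
    """Count regular files by whether their name ends in .gpg."""
    counts = Counter()
    for entry in os.scandir(directory):
        if not entry.is_file():
            continue
        key = "encrypted" if entry.name.endswith(".gpg") else "unencrypted"
        counts[key] += 1
    return counts


def demo() -> Counter:
    """Build a tiny directory that mimics the state after the failed run."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("0000327.pdf.gpg", "0000328.pdf", "0000329.pdf.gpg"):
            with open(os.path.join(tmp, name), "wb") as f:
                f.write(b"placeholder")
        return count_by_state(tmp)


if __name__ == "__main__":
    print(dict(demo()))
```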

Edit 19:41: I just tried a document export which worked out fine for all files.

jonaswinkler commented 3 years ago

Okay, hold on. You're actually the first one going this route and I totally forgot to take care of something during the migration process regarding encrypted files.

Uhm, I'd like to address this and will keep you posted.

Importing an export from paperless into the current version of paperless-ng won't work.

jannislehmann commented 3 years ago

Nah, this is not what I am trying to achieve.

I want to shut down my old instance and decrypt the old files first. Afterwards, I want to start paperless-ng.

I don't want to use an export, since that would mean losing all the tags etc.

jonaswinkler commented 3 years ago

I think the best way would be to fix up the decryption code in this repository, but I don't know how active the people responsible for merging changes are.

jannislehmann commented 3 years ago

Sorry for asking again:

Do you know what options I would have with paperless-ng? You do offer a command to decrypt, which seems to be broken because you tried to clean up dependencies. If that were fixed, could I decrypt the files using paperless-ng? If so, I should be able to switch to paperless-ng already and wait for the fix?

The issue in this repo seems to be: https://github.com/the-paperless-project/paperless/blob/master/src/documents/management/commands/change_storage_type.py#L95

jonaswinkler commented 3 years ago

Alright, summary of the issue:

This might take a bit of time.

jannislehmann commented 3 years ago

Alright, thank you a lot for your effort, work and fast answers!

I removed the two lines in question: https://github.com/the-paperless-project/paperless/blob/master/src/documents/management/commands/change_storage_type.py#L95 Afterwards, all files decrypted fine. Now I am able to start paperless-ng and view all files. Lastly, I will have to migrate from SQLite to PostgreSQL.
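For anyone who would rather not delete code: assuming the failing call is the `os.unlink(path)` shown in the traceback, a more tolerant cleanup could skip files that are already gone. This is a sketch with a made-up helper name, not a patch from the repository, and it relies on `missing_ok`, which needs Python 3.8 or newer.

```python
import tempfile
from pathlib import Path


def remove_if_present(path: Path) -> bool:
    """Delete the file if it exists; return whether it was there beforehand."""
    existed = path.exists()
    # missing_ok=True suppresses FileNotFoundError (Python 3.8+).
    path.unlink(missing_ok=True)
    return existed


def demo() -> tuple:
    """Remove the same file twice; the second call must not raise."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "0000328.pdf.gpg"
        target.write_bytes(b"placeholder")
        first = remove_if_present(target)
        second = remove_if_present(target)
        return first, second


if __name__ == "__main__":
    print(demo())
```

Unlike removing the lines outright, this still deletes an encrypted file that was left behind next to its decrypted copy.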