[BUG] Saving a document fails with no error.

zjean commented 3 years ago

Describe the bug I added a new document via the web interface, opened it, and tried to add the metadata. When saving the file, nothing happens. I checked the developer tools of my browser (Chrome) and saw a network error: HTTP 400 {"content":["Null-tekens zijn niet toegestaan."]} (Dutch for "Null characters are not allowed.")

To Reproduce Steps to reproduce the behavior:

Go to edit a document
Edit its properties
Save
See error in developer console

Expected behavior I expect to save the document

Screenshots

Webserver logs

The paperless logs don't say anything.

Relevant information

Host OS: Debian
Browser Chrome
Version 1.2.1
Installation method: docker-compose
Any configuration changes you made in docker-compose.yml, docker-compose.env or paperless.conf.

Edit: I see that in the content field the value '\u0000' is present in several words. This character appears in place of the letter combination 'ti', like this (pasted from the JSON in the network tab, since the edit field shows a square): belas\u0000ngdienst. Removing it in the content textarea in the UI doesn't help.

jonaswinkler commented 3 years ago

My best guess based on that is that when this document was added, the null character \0 (used for marking the end of a string) was somehow extracted from the PDF content and saved in the content field. The API now complains about that character being in the content field when saving.

What happens if you cut the content field, paste that in another editor, and insert it into the content field again, then save?
What happens if you execute the management command document_archiver -f -d 236?
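
For illustration only, here is a minimal Python sketch of what "null characters in the content" means in practice and how they could be located and removed from extracted text. The function names are hypothetical and this is not paperless code, nor necessarily how #794 addresses the problem; it only assumes the content is available as a plain string.

```python
NUL = "\x00"  # the same character that JSON shows as \u0000


def find_nul_positions(text):
    """Return the indices at which a null character occurs in text."""
    return [i for i, ch in enumerate(text) if ch == NUL]


def strip_nul(text):
    """Return text with every null character removed."""
    return text.replace(NUL, "")


# Sample taken from the report: the NUL sits where the letters "ti" should be.
sample = "belas\u0000ngdienst"
print(find_nul_positions(sample))  # [5]
print(strip_nul(sample))           # belasngdienst
```

Note that stripping the character only makes the text acceptable for saving; it does not restore the letters that were lost during extraction.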

jonaswinkler commented 3 years ago

> Edit: I see that in the content field the value '\u0000' is present in several words. This character appears in place of the letter combination 'ti', like this (pasted from the JSON in the network tab, since the edit field shows a square): belas\u0000ngdienst. Removing it in the content textarea in the UI doesn't help.

Hm. Is that document confidential? I'd really like to figure out if this is caused by tesseract, OCRmyPDF, or something in paperless.

zjean commented 3 years ago

Thanks! I cut the content and pasted it back from Notepad++. That allowed me to save the document. The document is quite confidential, sorry. If I encounter this again with a less sensitive document, I will let you know!

jonaswinkler commented 3 years ago

The root of this issue is addressed as part of #794, so I'll go ahead and close this.

jonaswinkler / paperless-ng

[BUG] Saving a document fails with no error. #691