jonaswinkler / paperless-ng

A supercharged version of paperless: scan, index and archive all your physical documents
https://paperless-ng.readthedocs.io/en/latest/
GNU General Public License v3.0

[BUG] Saving a document fails with no error. #691

Closed zjean closed 3 years ago

zjean commented 3 years ago

Describe the bug

I added a new document via the web interface, opened it, and tried to add the metadata. When saving the file, nothing happens. I checked the developer tools of my browser (Chrome) and saw a network error: HTTP 400 {"content":["Null-tekens zijn niet toegestaan."]} (Dutch for "Null characters are not allowed.")

To Reproduce

Steps to reproduce the behavior:

  1. Go to edit a document
  2. Edit its properties
  3. Save
  4. See error in developer console

Expected behavior

I expect the document to be saved.

Screenshots

[image]

Webserver logs

The paperless logs don't say anything.

Relevant information

Edit: I see that in the content field the value '\u0000' is present in several words. This character appears in place of the letter combination 'ti', like this (pasted from the JSON in the network tab, since the edit field shows a square): belas\u0000ngdienst. Removing it in the content textarea in the UI doesn't help.
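To see where the offending characters sit, the JSON from the network tab can be inspected directly. The following is a minimal sketch, not part of paperless: the `payload` string is a hypothetical stand-in shaped like the value quoted above, and `find_null_chars` is an illustrative helper name.

```python
import json

# Hypothetical response body, shaped like the value quoted in the report.
payload = '{"content": "belas\\u0000ngdienst"}'


def find_null_chars(text: str, context: int = 5) -> list[tuple[int, str]]:
    """Return (index, surrounding snippet) for every NUL character in text."""
    hits = []
    for i, ch in enumerate(text):
        if ch == "\x00":
            # Keep a few characters on each side so the damaged word is recognizable.
            snippet = text[max(0, i - context): i + context + 1]
            hits.append((i, snippet))
    return hits


content = json.loads(payload)["content"]
hits = find_null_chars(content)
for index, snippet in hits:
    # repr() makes the otherwise invisible character show up as \x00.
    print(index, repr(snippet))
```

Printing with `repr()` avoids the "square" rendering problem mentioned above, since the NUL character is shown as an escape sequence rather than as a glyph.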

jonaswinkler commented 3 years ago

My best guess based on that is that when this document was added, the null character \0 (used for marking the end of a string) was somehow extracted from the PDF content and saved in the content field. The API now complains about that character being in the content field when saving.
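If that guess is right, one way to avoid the rejected save would be to drop NUL characters from the extracted text before it is stored. The sketch below only illustrates that idea; `strip_null_chars` is a hypothetical helper, and nothing here is taken from the actual paperless code or from the fix referenced later in this thread.

```python
def strip_null_chars(text: str) -> str:
    """Remove NUL (U+0000) characters from extracted document text."""
    return text.replace("\x00", "")


# Example using the damaged word from the report.
extracted = "belas\x00ngdienst"
cleaned = strip_null_chars(extracted)
print(repr(cleaned))
```

Note that stripping the character only makes the text storable. It does not restore the letters that were lost during extraction, so the word from the report would still be missing its 'ti'.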

jonaswinkler commented 3 years ago

> Edit: I see that in the content field the value '\u0000' is present in several words. This character appears in place of the letter combination 'ti', like this (pasted from the JSON in the network tab, since the edit field shows a square): belas\u0000ngdienst. Removing it in the content textarea in the UI doesn't help.

Hm. Is that document confidential? I'd really like to figure out if this is caused by tesseract, OCRmyPDF, or something in paperless.

zjean commented 3 years ago

Thanks! I cut the content and pasted it back from Notepad++. That allowed me to save the document. The document is quite confidential, sorry. If I encounter this again with a less sensitive document, I will let you know!

jonaswinkler commented 3 years ago

The root of this issue is addressed as part of #794, so I'll go ahead and close this.