jonaswinkler / paperless-ng

A supercharged version of paperless: scan, index and archive all your physical documents
https://paperless-ng.readthedocs.io/en/latest/
GNU General Public License v3.0
5.37k stars 357 forks

[BUG] Email consumer - "Skipping attachment xxx with content disposition" #1088

Open jhf2442 opened 3 years ago

jhf2442 commented 3 years ago

Describe the bug The email consumer skips a regular PDF file attached to an email. It works fine with other emails and PDF attachments.

Expected behavior PDF file imported into paperless-ng :-)

Webserver logs

[2021-06-03 12:19:27,962] [DEBUG] [paperless_mail] Rule f1.Fetch from f1: Processing mail Einladung zur 6-sten ordentlichen Mitgliederversammlung from xxx@xxx with 1 attachment(s)
[2021-06-03 12:19:27,964] [DEBUG] [paperless_mail] Rule f1.Fetch from f1: Skipping attachment 2021-06-24_Einladung_MV.pdf with content disposition
[2021-06-03 12:19:27,964] [DEBUG] [paperless_mail] Rule f1.Fetch from f1: Processed 1 matching mail(s)
[2021-06-03 12:19:27,964] [DEBUG] [paperless_mail] Rule f1.Fetch from f1: Running mail actions on 0 mails

From the email source:

----boundary_165_8ab2a97c-68ce-45d8-a050-6ffbaf2afa21
Content-Type: application/octet-stream; name=2021-06-24_Einladung_MV.pdf
Content-Transfer-Encoding: base64

Looking at an email that works fine, I see two additional header fields, Content-ID and Content-Disposition. Could their absence be the issue?

Content-Type: application/octet-stream; name=SELF.pdf
Content-Transfer-Encoding: base64
Content-ID: 87d8e9ce-3136-4a36-894c-173420261f45
Content-Disposition: attachment; filename=SELF.pdf
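
The difference between the two MIME parts above can be reproduced with Python's standard `email` module. This is only a minimal sketch: the sample message and the `dispositions` helper are made up for illustration, and paperless-ng reads mail through a third-party IMAP library, so the exact value it reports for a missing header may differ (the debug log above ends right after "content disposition", which suggests an empty value).

```python
from email import message_from_string

# Hypothetical two-part message: the first part mimics the failing mail
# (no Content-Disposition header), the second mimics the working one.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="b1"

--b1
Content-Type: application/octet-stream; name=no_disposition.pdf
Content-Transfer-Encoding: base64

JVBERi0=
--b1
Content-Type: application/octet-stream; name=with_disposition.pdf
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename=with_disposition.pdf

JVBERi0=
--b1--
"""


def dispositions(raw):
    """Return (name, content disposition) for every non-multipart part."""
    msg = message_from_string(raw)
    return [
        (part.get_param("name"), part.get_content_disposition())
        for part in msg.walk()
        if not part.is_multipart()
    ]


for name, disposition in dispositions(RAW):
    print(name, repr(disposition))
```

The part without the header has no disposition at all, so any check that requires the value to equal "attachment" will reject it.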

CallMeTerdFerguson commented 3 years ago

I believe this is unrelated to the email consumer itself and has to do with the PDF in question. When I try to upload my Kohl's statement, downloaded as a PDF from Edge, I receive "File type application/octet-stream not supported". Recently I have also seen a couple of other PDFs return that error when uploads from the same source did not fail before. I've tried these through the file system watcher and the homepage upload and get the same result. Whether that is a change on the source end or something in Paperless, I haven't had time to dig into.

jhf2442 commented 3 years ago

That is a different issue, then. If I upload the PDF as a regular file, it is imported into paperless without any problem.

jhf2442 commented 3 years ago

Actually, I found the code in src/paperless_mail/mail.py, line 271ff:

            if not att.content_disposition == "attachment" and rule.attachment_type == MailRule.ATTACHMENT_TYPE_ATTACHMENTS_ONLY:  # NOQA: E501
                self.log(
                    'debug',
                    f"Rule {rule}: "
                    f"Skipping attachment {att.filename} "
                    f"with content disposition {att.content_disposition}")
                continue

In my case there is no Content-Disposition field, so att.content_disposition is not "attachment" and the attachment is skipped. The question is what this test is for. Maybe it could be extended to accept attachments where the field is not set?
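
One way such a check could be relaxed, as a sketch only and not a patch against paperless-ng: treat a missing or empty disposition like "attachment" and skip only parts that explicitly declare something else, such as "inline". The function `should_skip` and its parameters are hypothetical names introduced for illustration.

```python
from typing import Optional


def should_skip(content_disposition: Optional[str], attachments_only: bool) -> bool:
    """Decide whether a mail part should be skipped (hypothetical helper).

    content_disposition: value of the Content-Disposition header without
        parameters, or None / "" when the header is absent.
    attachments_only: True when the rule should ignore inline parts.
    """
    if not attachments_only:
        # The rule processes every part, so nothing is skipped here.
        return False
    # A missing header (None or "") is treated like "attachment".
    # Only an explicit non-attachment disposition causes a skip.
    return bool(content_disposition) and content_disposition.lower() != "attachment"


print(should_skip(None, True))      # header absent
print(should_skip("inline", True))  # explicitly inline
```

The trade-off is that mailers which embed images without any Content-Disposition header would then have those images processed too, so a filename filter may still be useful.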

It could be that the mailer used to send the mail is buggy. Unfortunately, there is no X-Mailer entry in the headers either. I suspect some kind of bulk mailer from a local hosting provider.

trev142 commented 3 years ago

From memory, I had a similar issue with emails from a particular company. Not knowing exactly what the issue was, I decided it had something to do with the way they attached the files to the email. I ended up changing my mail rule so that "Attachment type" was set to "Process all files, including 'inline' attachments." and set an attachment filter of *.pdf. Good luck!
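
The effect of a *.pdf attachment filter can be illustrated with shell-style pattern matching from Python's standard library. This is a rough sketch under the assumption that the filter behaves like a case-insensitive glob on the file name; `passes_filter` is a hypothetical helper, not a paperless-ng function.

```python
from fnmatch import fnmatch


def passes_filter(filename: str, pattern: str = "*.pdf") -> bool:
    """Return True if the file name matches the glob pattern, ignoring case."""
    return fnmatch(filename.lower(), pattern.lower())


for name in ["2021-06-24_Einladung_MV.pdf", "logo.png", "SCAN.PDF"]:
    print(name, passes_filter(name))
```

With all parts processed, a filter like this keeps inline images and signatures out while still letting through PDFs that lack a Content-Disposition header.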