salyh / elasticsearch-imap

IMAP and POP3 email importer for Elasticsearch (no river anymore)
Apache License 2.0

Missing Text Content #20

Open stevepop opened 8 years ago

stevepop commented 8 years ago

I am using this to pull emails from an IMAP server. While it seems to be indexing all emails, a proportion of those emails have their contents missing, i.e. textContent and htmlContent are empty in Elasticsearch. Unfortunately, this happens seemingly at random, so I have no idea what the problem could be.

I also did not see any errors in the logs that could explain why these contents are not being indexed.

See the example extract from Sense below:

```
"mailboxType": "IMAP",
"popId": null,
"receivedDate": 1449630321000,
"sentDate": 1449630310000,
"size": 8455,
"subject": "Re: Newsletter: 9th December 2015",
"textContent": "",
"htmlContent": null
```

salyh commented 8 years ago

This can happen if the content type of the mail is invalid. If you can send me such a failing email (or post it here), I will have a look.
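
One way to check a suspect message for this before sending it on is to list the content type of every MIME part. The sketch below is only an illustration using Python's standard `email` module; it is not the importer's code (the importer itself is written in Java), and the helper name `summarize_parts` and the sample message are hypothetical. It assumes the raw message source (headers plus body) is available as a string.

```python
from email import message_from_string


def summarize_parts(raw):
    """List (raw header, effective type, charset, has body) per leaf MIME part.

    Hypothetical diagnostic helper, not part of elasticsearch-imap.
    """
    msg = message_from_string(raw)
    rows = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no text of their own
        payload = part.get_payload(decode=True)  # bytes, or None if absent
        rows.append((
            part.get("Content-Type"),     # header exactly as sent (None if missing)
            part.get_content_type(),      # parser's view; invalid values become text/plain
            part.get_content_charset(),   # None when no charset parameter is given
            bool(payload and payload.strip()),
        ))
    return rows


# Made-up message with a malformed Content-Type header.
sample = "Subject: t\nContent-Type: garbage\n\nhello\n"
for row in summarize_parts(sample):
    print(row)
```

A part whose raw header differs from the effective type, or that reports a body while the indexed document has none, is a likely candidate for the failure described above.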

stevepop commented 8 years ago

Hi @salyh, thanks for your response. I would prefer to send the failing emails to you directly. Can you tell me where to send them? Also, let me know exactly what you want me to send (i.e. the mail including headers, etc.).

Further investigation shows that most of the emails with missing message contents were sent from Microsoft Outlook and Outlook Web App. See the extract from one example below:

Subject: Test Mail 1 14/12/2015 _ 0958

Thread-Topic: Test Mail 1 14/12/2015 _ 0958

Thread-Index: AdE2VkynlG/aqZyHTDKBjR4vUcA3ww==

Date: Mon, 14 Dec 2015 04:01:19 -0600

Message-ID: <9c9c4cdfbdb64edc97e47393316960bf@MBX10C-ORD1.mex06.mlsrvr.com>

Accept-Language: en-GB, en-US

Content-Language: en-US

X-MS-Has-Attach:

X-MS-TNEF-Correlator: <9c9c4cdfbdb64edc97e47393316960bf@MBX10C-ORD1.mex06.mlsrvr.com>

MIME-Version: 1.0

X-MS-Exchange-Transport-FromEntityHeader: Hosted

X-MS-Exchange-Organization-Network-Message-Id: f26cf0bd-af6e-4535-2399-08d3046d8451

X-MS-Exchange-Organization-AVStamp-Mailbox: SMEXw]nP;1220900;0;This mail has

 been scanned by Trend Micro ScanMail for Microsoft Exchange;

X-MS-Exchange-Organization-SCL: 0

X-MS-Exchange-Organization-AuthSource: MBX11D-ORD1.mex06.mlsrvr.com

X-MS-Exchange-Organization-AuthAs: Anonymous

Thanks

salyh commented 8 years ago

For my email address, see https://github.com/salyh (left side). If you want to encrypt your mails with PGP, please find my key here: https://pgp.mit.edu/pks/lookup?op=get&search=0x7903F81190910A83

stevepop commented 8 years ago

Thanks Hendrik, email sent!