znuny / Znuny

Znuny/Znuny LTS is a fork of the ((OTRS)) Community Edition, one of the most flexible web-based ticketing systems used for Customer Service, Help Desk, IT Service Management.
https://www.znuny.org
GNU General Public License v3.0

Bug multipart/alternative messages are not indexed properly #578

Open dpalic opened 1 week ago

dpalic commented 1 week ago

Environment

Expected behavior

Given a message like this, the search index should cover the full article, including the text/html part:

Date: Sun, 19 May 2024 05:25:52 GMT
Message-Id: <202405190525@abcdef>
Content-Type: multipart/alternative;
 boundary="----------=_1716096352-2010236-69"
Content-Transfer-Encoding: 7bit
MIME-Version: 1.0
Subject: =?utf-8?Q?Quarant=C3=A4ne_=C3=9Cbersicht?=
From: "Hosted Security Portal" <bounce@example.com>
Content-Disposition: inline
To: info@example.com
X-Modified-HTML: 4

This is a multi-part message in MIME format...

------------=_1716096352-2010236-69
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: 7bit
MIME-Version: 1.0
X-Mailer: MIME-tools 5.509 (Entity 5.509)

 Um sich den Email Security Cloud Nachrichten Report für .... anzuschauen klicken Sie bitte hier: https://admin.someexample.com/r/abcdef

------------=_1716096352-2010236-69
Content-Type: text/html; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: base64
MIME-Version: 1.0
X-Mailer: MIME-tools 5.509 (Entity 5.509)

`some base64 content containing HTML`

------------=_1716096352-2010236-69--

Actual behavior

When the search index is created, the Content-Type: text/html; part seems to be skipped entirely.

Unfortunately, in real life some e-mails use the Content-Type: text/plain; part only to point to a web page. Its text is completely different from the text in the Content-Type: text/html; part. The HTML part contains the full details and is what agents see in the Zoom view, whereas the text/plain part contains only a short description and a link to a web page whose content matches the text/html part.

In that case, the text the agents see (the HTML part) differs from the text stored in the index in the database. Agents therefore cannot find tickets: they demonstrated this by copying specific content from the ticket Zoom view and pasting it into the ticket search, which returns no matching result. This is confusing for the agents, and from a technical perspective it is odd that the full article is not in the index.

How to reproduce

Create a message like the one shared above. I could share some real mails privately; they are GDPR-relevant and would therefore require some agreement.
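For an anonymized test case, a message of the same shape can be generated rather than copied from production. The following is a minimal sketch using only Python's standard `email` package; it is not part of Znuny. The function name `build_test_message` and the marker `HTML-ONLY-TOKEN` are made up for illustration, and the addresses and link are the placeholders from the sample above.

```python
from email import policy
from email.message import EmailMessage


def build_test_message() -> EmailMessage:
    """Build an anonymized multipart/alternative mail whose parts differ."""
    msg = EmailMessage(policy=policy.SMTP)
    msg["From"] = '"Hosted Security Portal" <bounce@example.com>'
    msg["To"] = "info@example.com"
    msg["Subject"] = "Quarantäne Übersicht"

    # text/plain part: only a teaser plus a link, as described in this report.
    msg.set_content(
        "To view the report, click here: https://admin.someexample.com/r/abcdef\n"
    )

    # text/html part: the full details, base64-encoded like the original mail.
    # HTML-ONLY-TOKEN is a hypothetical marker that exists only in this part.
    msg.add_alternative(
        "<html><body><p>Quarantined mail: HTML-ONLY-TOKEN</p></body></html>",
        subtype="html",
        cte="base64",
    )
    return msg


if __name__ == "__main__":
    # Print the raw message so it can be saved to a file for testing.
    print(build_test_message().as_string())
```

The printed message can be saved and fed into whatever mail intake the test system uses. Searching afterwards for the marker that occurs only in the HTML part should show whether that part reached the index.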

Additional information

We also checked ./Kernel/System/EmailParser.pm, and as far as we can see it should already behave properly. However, for Content-Type: multipart/alternative; it does not seem to decode the base64-encoded Content-Type: text/html; part and then append the cleaned HTML content to the already parsed Content-Type: text/plain; content.
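To make the behavior we would expect concrete, here is a rough sketch in Python (standard library only). It is not how ./Kernel/System/EmailParser.pm works and not a proposed patch; it only illustrates decoding every text part of a multipart/alternative message and concatenating the plain text with the tag-stripped HTML text. All names (`searchable_text`, `html_to_text`, `sample_message`) and the sample content are hypothetical.

```python
import base64
from email import message_from_bytes, policy
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document (simplified: no special
    handling for <script> or <style> content)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def searchable_text(raw: bytes) -> str:
    """Return the text of all text/plain and text/html leaf parts."""
    msg = message_from_bytes(raw, policy=policy.default)
    texts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no text of their own
        ctype = part.get_content_type()
        if ctype == "text/plain":
            # get_content() undoes the transfer encoding and the charset
            texts.append(part.get_content().strip())
        elif ctype == "text/html":
            texts.append(html_to_text(part.get_content()))
    return "\n".join(t for t in texts if t)


def sample_message() -> bytes:
    """Hypothetical anonymized message shaped like the one in this report."""
    html = "<html><body><p>Quarantined: <b>invoice-4711</b></p></body></html>"
    return (
        b'Content-Type: multipart/alternative; boundary="b1"\r\n'
        b"MIME-Version: 1.0\r\n"
        b"\r\n"
        b"--b1\r\n"
        b'Content-Type: text/plain; charset="utf-8"\r\n'
        b"Content-Transfer-Encoding: 7bit\r\n"
        b"\r\n"
        b"See the report: https://admin.someexample.com/r/abcdef\r\n"
        b"--b1\r\n"
        b'Content-Type: text/html; charset="utf-8"\r\n'
        b"Content-Transfer-Encoding: base64\r\n"
        b"\r\n"
        + base64.b64encode(html.encode("utf-8"))
        + b"\r\n--b1--\r\n"
    )


if __name__ == "__main__":
    print(searchable_text(sample_message()))
```

With text assembled this way, a search for a phrase that appears only in the HTML part would match, which is what agents expect after copying text from the Zoom view. Whether the HTML text should be appended to, or replace, the plain text in the index is a design decision for the maintainers.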

rkaldung commented 5 days ago

@dpalic I can't verify this with the snippet above. Please provide an anonymized email with the described behavior.