mailcow / mailcow-dockerized

mailcow: dockerized - 🐮 + 🐋 = 💕
https://mailcow.email
GNU General Public License v3.0

Incorrect encoding in non-latin quarantined mails, also after release #3930

Open ValdikSS opened 3 years ago

ValdikSS commented 3 years ago


Summary

Mailcow commit a832becbd530603710a823be526a9ec4d9f1f89d. If an email in Windows-1251 encoding (others may be affected as well) gets quarantined, its text is not shown correctly in the quarantine web interface, and the email remains unreadable after release.

Logs

brokenmails.zip: these are two copies of the same email. One is in the correct encoding and was exported from the junk folder; the other is what the quarantine release delivered to the inbox.

Reproduction

  1. Get quarantined email in Russian, with Windows-1251 encoding
  2. Try to release the email
  3. Receive unreadable email in inbox

Screenshot_20210111_013505-fs8

Unfortunately I can no longer show you a screenshot of the quarantine web interface, because I trained similar emails as ham and they no longer go to quarantine.

System information

| Question | Answer |
| --- | --- |
| My operating system | Linux Ubuntu 20.04 |
| Is Apparmor, SELinux or similar active? | Yes, AppArmor. No issues with it in audit logs. |
| Virtualization technology (KVM, VMware, Xen, etc - LXC and OpenVZ are not supported) | Bare metal |
| Server/VM specifications (Memory, CPU Cores) | 4 cores, 16 GB RAM |
| Docker Version (docker version) | 20.10.1 |
| Docker-Compose Version (docker-compose version) | 1.27.4, build 40524192 |
| Reverse proxy (custom solution) | Custom configuration, did not touch Mailcow configs, irrelevant |

andryyy commented 3 years ago

You don't have a db dump anymore, right?

Or any other mail with that problem currently in your quarantine?

ValdikSS commented 3 years ago

@andryyy I've used password recovery and got the message in quarantine, it's broken. How should I proceed?

Screenshot_20210111_194759-fs8

andryyy commented 3 years ago

image

Dunno, the mails seem to have encoding problems in general. :/

ValdikSS commented 3 years ago

Try to receive a new-post notification. It seems that registration/password-reminder letters don't have a space between some header name and its value, but post notifications do.

andryyy commented 3 years ago

Please give me more time for this.

I think the mail encoding is a bit messed up, but I'm not sure yet...

The subject seems to be read as UTF-8 (perhaps?). Not sure.

ValdikSS commented 3 years ago

Here's the original email, a notification of a new forum message. The one in the first post took the route ruboard → valdiks@bk.ru (mail.ru) → iam@valdikss.org.ru (mailcow). This one is from the valdiks@bk.ru mailbox. As you can see, the subject is encoded and is in Windows-1251, but the Content-type header has no space between its name and value: Content-type:text/plain;charset=Windows-1251. Maybe that's the issue.

Message16103458220307880921.zip

The password-reminder message, on the contrary, has a proper Content-Type: text/plain; charset=Windows-1251 (with a space), but no encoding in the Subject: Subject: Забыли пароль?.

Message16103835610793656295(1).zip
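
As a quick cross-check of the missing-space theory, the sketch below feeds both header spellings to Python's standard `email` package. This is not the parser mailcow uses (the thread does not name it), and the two sample messages are hypothetical minimal stand-ins for the attached originals. RFC 5322 does not require whitespace after the colon, so a conforming parser would be expected to report the same charset for both spellings.

```python
from email import message_from_bytes

BODY = "Привет".encode("cp1251")  # hypothetical 8-bit body in Windows-1251

# Hypothetical minimal messages; only the Content-Type spelling differs.
NO_SPACE = (
    b"Subject: test\r\n"
    b"Content-type:text/plain;charset=Windows-1251\r\n"
    b"Content-transfer-encoding: 8bit\r\n"
    b"\r\n" + BODY
)
WITH_SPACE = (
    b"Subject: test\r\n"
    b"Content-Type: text/plain; charset=Windows-1251\r\n"
    b"Content-transfer-encoding: 8bit\r\n"
    b"\r\n" + BODY
)


def declared_charset_and_text(raw: bytes) -> tuple[str, str]:
    """Return the declared charset and the body decoded with it."""
    msg = message_from_bytes(raw)
    charset = msg.get_content_charset()     # lower-cased charset parameter
    payload = msg.get_payload(decode=True)  # raw body bytes, not yet decoded
    return charset, payload.decode(charset)


for sample in (NO_SPACE, WITH_SPACE):
    print(declared_charset_and_text(sample))
```

If both calls print the same pair, the missing space alone is unlikely to explain the breakage, at least for a parser that follows the RFC on this point.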

andryyy commented 3 years ago

That's a good catch. :) I will check that.

ValdikSS commented 3 years ago

Here's another broken message, this time from Google Groups: message.zip. This message contains strange ÐžÑ symbols in the header, near the To field. This is what Google sends for some reason (it persists in older messages as well).

X-BeenThere: anticensority+manager@googlegroups.com
Received: by 2002:a1c:2e50:: with SMTP id u77ls1076220wmu.2.canary-gmail; Tue,
 08 Dec 2020 05:02:21 -0800 (PST)
X-Received: by 2002:a05:600c:268b:: with SMTP id 11mr3827005wmt.78.1607432541168;
        Tue, 08 Dec 2020 05:02:21 -0800 (PST)
MIME-Version: 1.0
To: =?UTF-8?B?0JzQvtC00LXRgNCw0YLQvtGA0Ysg0YHQv9Cw0LzQsA==?= <anticensority+managers@googlegroups.com>
От: noreply-spamdigest@google.com
Subject: =?UTF-8?B?W2FjXSDQntGC0YfQtdGCINC80L7QtNC10YDQsNGC0L7RgNCwINC+INGB0L/QsNC80LUg?=
    =?UTF-8?B?0LIg0LPRgNGD0L/Qv9C1IGFudGljZW5zb3JpdHlAZ29vZ2xlZ3JvdXBzLmNvbQ==?=
Message-ID: <0000000000009de11d05b5f38dda@google.com>
Date: Tue, 08 Dec 2020 13:02:21 +0000
Content-Type: text/plain; charset="UTF-8"
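
For comparison, the To display name above is a regular RFC 2047 encoded-word and decodes cleanly, while the `От:` line puts raw non-ASCII text into the field name itself, which RFC 5322 restricts to printable ASCII. A minimal sketch using Python's standard library (the helper name `decode_encoded_word` is made up for this illustration):

```python
from email.header import decode_header


def decode_encoded_word(word: str) -> str:
    """Decode a header value that consists of RFC 2047 encoded-words."""
    parts = []
    for data, charset in decode_header(word):
        # decode_header yields bytes for encoded parts and str for plain text.
        if isinstance(data, bytes):
            parts.append(data.decode(charset or "ascii"))
        else:
            parts.append(data)
    return "".join(parts)


# Display name copied from the To header above.
TO_DISPLAY_NAME = "=?UTF-8?B?0JzQvtC00LXRgNCw0YLQvtGA0Ysg0YHQv9Cw0LzQsA==?="
print(decode_encoded_word(TO_DISPLAY_NAME))  # Модераторы спама

# The problematic line: the field name is not ASCII.
FROM_LINE = "От: noreply-spamdigest@google.com"
field_name = FROM_LINE.split(":", 1)[0]
print(field_name.isascii())  # False
```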

Screenshot_20210112_012530-fs8

andryyy commented 3 years ago

I only see those content type failures with Russian mail, and not even with all of it. Someone needs to check whether they are correctly encoded/formatted, and whether we really want to work around that if they are not.

The previous "originals" also messed up my local mail client.

dragoangel commented 3 years ago

I work with Cyrillic every day, and postfix handles it all correctly. This issue is purely on the sender side, and I don't think there must be, or even can be, any fix for a sender who sends mail with an incorrect MIME type/encoding.

Muwahhidun commented 3 years ago

And when you create contacts in Russian, are they displayed correctly? I get question marks instead of Russian letters. I am looking for an answer to this problem myself. In the demo on the SOGo website and on mailcow everything is OK, but in my installation I get ????? signs like these.


andryyy commented 3 years ago

Do you use an external SQL?

Muwahhidun commented 3 years ago

Do you use an external SQL?

no, I have an official docker compose. 19 containers.

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

ValdikSS commented 2 years ago

The issue is still not fixed, please reopen. I can provide fresh .eml files.

ValdikSS commented 2 years ago

@andryyy, I also can provide database dumps. Not removing the quarantine data yet.

andryyy commented 2 years ago

That would be great. Can you mail to @.*** ?

I will need some time though as I’m currently in hospital.


ValdikSS commented 2 years ago

The email is not shown. Please mail me at iam@valdikss.org.ru, I'll mail you back.

andryyy commented 2 years ago

08pille-rennrodel@icloud.com

andryyy commented 2 years ago

If these errors only happen with the same wrongly encoded mails from your previously sent items, I will not work on it. The sender will need to fix their issues then, as was stated before.

I don't think we are responsible for fixing that. :/

Drago works with Russian mail all the time. It is fine for him. Your example mail was totally broken.

ValdikSS commented 2 years ago

No, this time it's a Google Groups email. And others, I need to check.

ValdikSS commented 2 years ago

Please guide me on what I should do. Right now the email looks like this: Screenshot 2022-01-02 at 01-28-02 mailcow UI

ValdikSS commented 2 years ago

Haha, this email has Russian "От:" in the email header instead of "From:".

dragoangel commented 2 years ago

Haha, this email has Russian "От:" in the email header instead of "From:".

(facepalm) omg 😱

ValdikSS commented 2 years ago

So right now there are two issues with Mailcow:

  1. The quarantine system breaks the headers on the first non-7-bit-ASCII symbol and not on \r\n\r\n.
  2. The message is re-encoded when entering quarantine and when released; that's why broken mails are still broken when released from quarantine.

For 1), mailcow should split the headers from the body by searching for \r\n\r\n, and for 2), mailcow should not assume an encoding and should treat emails as a sequence of bytes, at least for releasing.
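
The two suggestions can be sketched in a few lines of Python. This is only an illustration of the idea, not mailcow's code: the function name `split_headers_body` and the sample message are hypothetical, and the sketch also accepts bare `\n\n` as a separator, which is an added assumption.

```python
def split_headers_body(raw: bytes) -> tuple[bytes, bytes]:
    """Split a raw message at the first empty line without decoding anything."""
    hits = [(raw.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n")]
    hits = [(pos, sep) for pos, sep in hits if pos != -1]
    if not hits:
        return raw, b""       # no separator found: treat everything as headers
    pos, sep = min(hits)      # the earliest separator wins
    return raw[:pos], raw[pos + len(sep):]


# Hypothetical message with a raw non-ASCII field name and a cp1251 body.
RAW = (
    "To: someone@example.org\r\n".encode("ascii")
    + "От: noreply@example.org\r\n".encode("utf-8")
    + b"Content-Type: text/plain; charset=windows-1251\r\n"
    + b"\r\n"
    + "Здравствуйте".encode("cp1251")
)

headers, body = split_headers_body(RAW)

# Releasing: reassemble the stored bytes without any re-encoding.
released = headers + b"\r\n\r\n" + body

# Displaying: decode a copy for the web UI only; the stored bytes stay untouched.
preview = body.decode("cp1251", errors="replace")
print(released == RAW, preview)
```

Keeping the stored message as bytes and decoding only a copy for display means that a message which arrived broken is released exactly as it arrived, rather than broken a second time.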

andryyy commented 2 years ago

We use a very popular mail parser. I think your mails are a bit off.

Best regards, André Peters


ValdikSS commented 2 years ago

We use a very popular mail parser. I think your mails are a bit off.

Sure they are, but this shows that even giants such as Google can make a programming error and put a translated string into the headers.
It would be handy to have a more lenient parser for better compatibility with broken emails.

dragoangel commented 2 years ago

I'm not sure a mail header name can be written in non-Latin characters at all. I really don't see how it should be parsed, by rspamd either. If you make an SQL dump of this email, can you send it to me? Via Telegram, for example. I can't fix it, but I'd like to take a look. I have never come across such emails.

ValdikSS commented 2 years ago

Here's the original .eml: [ac] Отчет модератора о спаме в группе anticensority@googlegroups.com - anticensority+noreply@googlegroups.com - 2020-12-08 1602.zip

dragoangel commented 2 years ago

Quite an old email. They even signed "От" in DKIM... Rspamd can't parse it either.

dragoangel commented 2 years ago

@ValdikSS I really think this is more of an exception than the norm. How many such emails do you have?

ValdikSS commented 2 years ago

I receive it when somebody posts to my Google Group and the message is flagged as spam. It happens once in a month or so.

dragoangel commented 2 years ago

Did you try asking Google Groups why they have such strange behavior? I can't find any standard or draft where such behavior is allowed.

ValdikSS commented 2 years ago

I've filed a "feedback to Google" report with the email example. No reply so far.

dragoangel commented 2 years ago

I've filed a "feedback to Google" report with the email example. No reply so far.

Unfortunately, based on my experience with Google, I assume you will never get a reply.

ValdikSS commented 2 years ago

Releasing a quarantined email from nnmclub.to leads to this. The encoding is cp1251.

image

ValdikSS commented 2 years ago

Новое личное сообщение.zip

dragoangel commented 2 years ago

This case is 100% reproducible, yes. I see the same behavior. The Cyrillic subject of the email was parsed correctly and stored in SQL.

But the message body is stored in SQL as: &#2013265927;&#2013265924;&#2013265920;&#2013265920;&#2013265922;&#2013265921;&#2013265922;&#2013265922;&#2013265923;&#2013265929;&#2013265922;&#2013265925;, nickname! whereas in the original email it was: Çäðàâñòâóéòå, nickname! which, decoded from windows-1251, should be displayed as: Здравствуйте, nickname!

The message is not multipart:

Content-type: text/plain; charset=windows-1251
Content-transfer-encoding: 8bit
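
The three renderings quoted above can be reproduced from the same twelve bytes. The Python sketch below is only an observation about the numbers, not a statement about which code path in mailcow produced them: the low bits of each stored entity happen to equal the payload bits a UTF-8 decoder would take from the corresponding cp1251 byte if it read that byte as a lead byte, which is consistent with the 8-bit body being read as UTF-8 somewhere along the way. The helper `utf8_lead_payload` is made up for this illustration.

```python
ORIGINAL = "Здравствуйте"
raw = ORIGINAL.encode("cp1251")      # the 8-bit bytes as sent on the wire

# Reading the bytes with the wrong single-byte charset gives the mojibake.
as_latin1 = raw.decode("latin-1")
print(as_latin1)                     # Çäðàâñòâóéòå

# Reading them with the declared charset restores the text.
print(raw.decode("windows-1251"))    # Здравствуйте

# Numeric entities copied from the SQL value quoted above.
STORED = [2013265927, 2013265924, 2013265920, 2013265920, 2013265922,
          2013265921, 2013265922, 2013265922, 2013265923, 2013265929,
          2013265922, 2013265925]


def utf8_lead_payload(byte: int) -> int:
    """Payload bits of a byte when it is read as a UTF-8 lead byte."""
    if 0xC0 <= byte <= 0xDF:
        return byte & 0x1F           # lead byte of a 2-byte sequence
    if 0xE0 <= byte <= 0xEF:
        return byte & 0x0F           # lead byte of a 3-byte sequence
    if 0xF0 <= byte <= 0xF7:
        return byte & 0x07           # lead byte of a 4-byte sequence
    raise ValueError(f"0x{byte:02X} is not a UTF-8 lead byte")


low_bits = [value - 0x78000000 for value in STORED]
lead_payloads = [utf8_lead_payload(b) for b in raw]
print(low_bits == lead_payloads)     # True
```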
dragoangel commented 2 years ago

I've filed a "feedback to Google" report with the email example. No reply so far.

btw @ValdikSS, did you receive any feedback? 😁

ValdikSS commented 2 years ago

No, nothing from Google.

dragoangel commented 2 years ago

No, nothing from Google.

Well, that was expected. They create a bunch of services and forget that any service needs at least a bit of support 😖

drlight17 commented 2 years ago

I've just faced the same problem. Any fixes on this?

milkmaker commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

ValdikSS commented 2 years ago

Unstale

milkmaker commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

ValdikSS commented 2 years ago

unstale

milkmaker commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

ValdikSS commented 1 year ago

Please reopen.

milkmaker commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.