nextcloud / mail

💌 Mail app for Nextcloud
https://apps.nextcloud.com/apps/mail
GNU Affero General Public License v3.0
846 stars 259 forks source link

`mb_convert_encoding` does not support `windows-1250`. #1037

Open ChristophWurst opened 6 years ago

ChristophWurst commented 6 years ago

Expected behavior

The app should be able to read popular encodings.

Actual behavior

Some encodings (e.g. windows-1250 and Arabic) cause warnings/errors. Apparently that's not supported by the build-in function: https://secure.php.net/manual/de/mbstring.supported-encodings.php. However, some clients (Outlook, I suppose) use this encoding.

Mail app

Mail app version: 0.9.0

Server configuration

PHP version: 7.0

Nextcloud Version: 14 Beta 4

Ref https://sentry.io/share/issue/303496811cbc42899c531fbe51a3e37f/.

--- Want to back this issue? **[Post a bounty on it!](https://www.bountysource.com/issues/62229807-mb_convert_encoding-does-not-support-windows-1250?utm_campaign=plugin&utm_content=tracker%2F44154351&utm_medium=issues&utm_source=github)** We accept bounties via [Bountysource](https://www.bountysource.com/?utm_campaign=plugin&utm_content=tracker%2F44154351&utm_medium=issues&utm_source=github).
whirm commented 5 years ago

I'm not seeing .ics files attached to e-mails (works just fine with other webmails) and this error shows every time I load an e-mail:

mb_convert_encoding(): Illegal character encoding specified at /var/www/html/custom_apps/mail/lib/Model/IMAPMessage.php#553

I'm guessing it's the same issue?

I'm running 16.0.1 and Mail 0.15.1 on PHP 7.3.6

ChristophWurst commented 5 years ago

I'm guessing it's the same issue?

would you be able to attach debugger or add some logs to verify?

HLFH commented 3 years ago

For windows-1256 / cp1256 support, could be interesting to look into this code: https://github.com/khaled-alshamaa/ar-php/blob/master/src/Arabic.php

At the section Convert Arabic string into glyph joining in UTF-8 hexadecimals stream.

For other code pages, this code is used:

$data = mb_convert_encoding($data, 'UTF-8', $p->getCharset());

If the code page is in Arabic, some similar code could be used:

require_once 'ar-php/src/Arabic.php';
[...]
$Arabic = new I18N_Arabic('Glyphs');
$data = $Arabic->utf8Glyphs($data);

What do you think?

EDIT: I did not make it work with ar-php and it was easier to fix with iconv.

HLFH commented 3 years ago

iconv does not support the 27 following mbstring charsets:

UTF7-IMAP ISO-2022-JP-MS CP51932 SJIS-mac (alias: MacJapanese) SJIS-Mobile#DOCOMO (alias: SJIS-DOCOMO) SJIS-Mobile#KDDI (alias: SJIS-KDDI) SJIS-Mobile#SOFTBANK (alias: SJIS-SOFTBANK) UTF-8-Mobile#DOCOMO (alias: UTF-8-DOCOMO) UTF-8-Mobile#KDDI-A UTF-8-Mobile#KDDI-B (alias: UTF-8-KDDI) UTF-8-Mobile#SOFTBANK (alias: UTF-8-SOFTBANK) ISO-2022-JP-MOBILE#KDDI (alias: ISO-2022-JP-KDDI) JIS JIS-ms CP50220 CP50220raw CP50221 CP50222 byte2be byte2le byte4be byte4le BASE64 HTML-ENTITIES (alias: HTML) 7bit 8bit HZ

But iconv supports windows-1250 and windows-1256 charsets among many others.

I know that for mb_convert_encoding, as of PHP 8.0.0, a ValueError is thrown if the value of to_encoding or from_encoding is an invalid encoding. Prior to PHP 8.0.0, a E_WARNING was emitted instead.

HLFH commented 3 years ago

This issue is fully fixed by the PR https://github.com/nextcloud/mail/pull/5593.

adarnimrod commented 2 years ago

I think that #5834, #5845, #5998 are duplicate issues and that PR #5882 addresses the same. My issue has been that I stopped seeing new messages. Clearing the cache made it so I didn't see any messages. The error was PHP Fatal error: Uncaught ValueError: mb_convert_encoding(): Argument #3 ($from_encoding) contains invalid encoding \"iso-8859-8-i\" in /var/www/html/custom_apps/mail/vendor/pear-pear.horde.org/Horde_Util/Horde/String.php:160 (both in the web service and in cron.php invocations). I tried the patch in #5593 but that didn't help. Using the same approach (using iconv when mb_convert_encoding fails) in vendor/pear-pear.horde.org/Horde_Util/Horde/String.php fixed it for me. From what I see in the other issues, their error is also in vendor/pear-pear.horde.org/Horde_Util/Horde/String.php (I can't check the logs from Sentry as the link seems to be broken).

ChristophWurst commented 2 years ago

Not merged. Github didn't like my rebase and force push.

Path is

diff --git a/lib/Model/IMAPMessage.php b/lib/Model/IMAPMessage.php
index 95ac594c8..f043cb825 100644
--- a/lib/Model/IMAPMessage.php
+++ b/lib/Model/IMAPMessage.php
@@ -54,6 +54,7 @@ use OCP\AppFramework\Db\DoesNotExistException;
 use OCP\Files\File;
 use OCP\Files\SimpleFS\ISimpleFile;
 use ReturnTypeWillChange;
+use ValueError;
 use function fclose;
 use function in_array;
 use function mb_convert_encoding;
@@ -654,7 +655,11 @@ class IMAPMessage implements IMessage, JsonSerializable {
        // data is utf-8 by default.
        $charset = $p->getContentTypeParameter('charset');
        if ($charset !== null && strtoupper($charset) !== 'UTF-8') {
-           $data = mb_convert_encoding($data, 'UTF-8', $charset);
+           try {
+               $data = mb_convert_encoding($data, 'UTF-8', $charset);
+           } catch (ValueError $exception) {
+               $data = iconv($charset, 'UTF-8', $data);
+           }
        }
        return $data;
    }
small1 commented 1 year ago

This gives nasty error 500 messages from all email from a supplier i have. For some reason they use windows-1257 in emails. All i get is a loading body .

The try catch stuff did not fix it.

brzd commented 1 year ago

Would it be possible to show the mail source (as in 'view source' in mail app) instead of showing nothing, if the character encoding is not available? This way the mail would be at least to some extent readable instead of totally unavailable.

Astinus-Eberhard commented 1 year ago

Please can someone take care of this? For many people from central europe this is really annoying

I created a PR for this: https://github.com/nextcloud/mail/pull/8327