joomla / joomla-cms

Home of the Joomla! Content Management System
https://www.joomla.org
GNU General Public License v2.0
4.73k stars 3.64k forks source link

PHP 8.2 deprecated utf8_decode #38922

Closed nikosdion closed 1 year ago

nikosdion commented 1 year ago

Steps to reproduce the issue

Expected result

No deprecated notices :trollface:

Actual result

(Among other deprecated notices)

Deprecated: Function utf8_decode() is deprecated in .../boot4/libraries/vendor/joomla/uri/src/UriHelper.php on line 36

System information (as much as possible)

PHP 8.2 is the only relevant thing

Additional comments

utf8_encode and utf8_decode are deprecated in PHP 8.2.

You can replace the calls to the deprecated function with calls to the following method:

public static function utf8_decode($s)
    {
        if (version_compare(PHP_VERSION, '8.1.999', 'le'))
        {
            return utf8_decode($s);
        }

        if (function_exists('mb_convert_encoding'))
        {
            return mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8');
        }

        if (class_exists('UConverter'))
        {
            return UConverter::transcode($s, 'ISO-8859-1', 'UTF8');
        }

        if (function_exists('iconv'))
        {
            return iconv('UTF-8', 'ISO-8859-1', $s);
        }

        /**
         * Fallback to the pure PHP implementation from Symfony Polyfill for PHP 7.2
         *
         * @see https://github.com/symfony/polyfill-php72/blob/v1.26.0/Php72.php
         */
        $s = (string) $s;
        $len = \strlen($s);

        for ($i = 0, $j = 0; $i < $len; ++$i, ++$j) {
            switch ($s[$i] & "\xF0") {
                case "\xC0":
                case "\xD0":
                    $c = (\ord($s[$i] & "\x1F") << 6) | \ord($s[++$i] & "\x3F");
                    $s[$j] = $c < 256 ? \chr($c) : '?';
                    break;

                case "\xF0":
                    ++$i;
                // no break

                case "\xE0":
                    $s[$j] = '?';
                    $i += 2;
                    break;

                default:
                    $s[$j] = $s[$i];
            }
        }

        return substr($s, 0, $j);
    }

Tagging @nibra @Hackwar as this is a Joomla Framework issue, @HLeithner @laoneo because this touches architecture and may be happening in other places in the code I have not come across yet.

brianteeman commented 1 year ago

reported and ignored in July #38258

nikosdion commented 1 year ago

@brianteeman I think there is a subtle difference between the two reports. Yours is reporting a deprecation which does not become a major issue for another 2 years with no alternative proposed.

I have a working alternative as I have been closely tracking the PHP 8.2 RFCs and had to address this deprecation in my fork of Joomla's Uri class used in my standalone software.

I would like to believe that having a viable alternative would make it easier to address it. Let's not be defeatists.

HLeithner commented 1 year ago

Do we really assume ISO-8859-1 as fallback? wouldn't ISO-8859-15 fit better? I know that's what utf8_decode does but it also fails to decode some characters because browsers uses windows-1252 as ISO-8859-1 (which of course is wrong). At least that's the warning in the php documentation. Maybe a bad idea.

And do we really need (at first position)

        if (version_compare(PHP_VERSION, '8.1.999', 'le'))
        {
            return utf8_decode($s);
        }

I mean we always should have a mb_convert_encoding and if not fall back to utf8_decode or the own implementation?

Since I'm pretty sure this function doesn't go away (even if php devs wants this) I would move it to https://github.com/joomla-framework/string/blob/2.0-dev/src/StringHelper.php including a second parameter for the target encoding.

nikosdion commented 1 year ago

@HLeithner

Since I'm pretty sure this function doesn't go away (even if php devs wants this)

You are wrong. Please do read the PHP RFC I linked to (https://wiki.php.net/rfc/remove_utf8_decode_and_utf8_encode). It has been voted on in April 2022 after one year of discussion and accepted with 33 votes in favor and only 2 against. PHP RFCs are binding.

Key points from the RFC:

Regarding whether we need to use ISO-8859-1 or ISO-8859-15, please let's read the code from URIHelper. As you can see, utf8_decode is used as a “cheat”:

if (utf8_decode($url) === $url)
{
    return parse_url($url, $component);
}

Basically, what we do here is this. If the entirety of the URL is in ISO-8859-1 (therefore: it's just plain ASCII) we will use parse_url directly. Otherwise we will URL-encode the entire URL except some reserved characters, then put it through parse_url.

Using a different character encoding in this use case would be wrong because of the way parse_url works. For example, having a Euro sign (€) in the URL is valid in ISO-8859-15 but invalid in ISO-8859-1. It has to be encoded for parse_url to work correctly — it assumes an ISO-8859-1 encoding. So, if you used ISO-8859-15 because it's “newer and better” it would actually break the URI parser in Joomla.

That said, Joomla actually requires the mbstring PHP extension. Therefore we can simply replace utf8_decode with mb_convert_encoding.

I would also say that Joomla MUST enforce its minimum requirements in index.php, like it does with the minimum supported PHP version. This would include the mbstring requirement.

HLeithner commented 1 year ago

Sorry I didn't clearly explained my self I mean the usage of this/such function doesn't go away. I know that the utf8_* functions get removed.

Ok but this decode function should not be in the URI class, I think that's the only different left between your proposal and my answer.

nikosdion commented 1 year ago

@HLeithner Thank you for the clarification. Yes, I agree that a replacement method should NOT be in the URIHelper class.

However, I have to note again that Joomla does require mbstring. Since we have that hard requirement does it not make far more sense to use mbstring directly instead of having a helper method?

(The reason I proposed a helper method is that my standalone software can also run on servers with mbstring unavailable, at least far enough so that it shows a nicely formatted error message. This requires me to have access to my fork of the URI class which is why I have this complex replacement method to utf8_decode).

HLeithner commented 1 year ago

true maybe core should do best practice and do the right thing by replacing utf8_decode/encode with mb_convert_encoding.

The only question is does we really require mb_string? I checked https://downloads.joomla.org/technical-requirements which has some garbage requirements (mod_mysql for apache?!) but no requirement for mb string or any other php module/extension. Or do I miss something?

ManuelHu commented 1 year ago

I also do not think that Joomla currently is explicitely requiring mbstring, at least I did not find anything pointing in this direction. Both in

there are some checks about mbstring ini options, but they will just be silently dropped if mbstring is not loaded. Also the requirements page does not list it (and apart from that, the page is partly wrong, as already pointed out...)

On the other hand, comfinder already uses some `mb*` functions without fallback. Also, the Application class in the framework uses it.

Although I never encountered a recent web hosting package without mbstring installed, there are probably some (sadly, mbstring is not part of the PHP extension set that is required to be shipped with PHP...)

So to not break b/c even more, there is only the option of this overcomplicated polyfill function for Joomla 4? And maybe for Joomla 5 requiring a mbstring extension in addition to the already-decided recent PHP version might be reasonable?

nikosdion commented 1 year ago

Joomla DOES require mb_string. Smart Search, Joomla Update, the php-encryption library used in the core, the Crypt and Encrypt core packages (used among other things for Joomla Update, and MFA), the Console package (used for the Joomla CLI), the Joomla String package (used EVERYWHERE in Joomla), the ParagonIE Sodium support (used for logging in!), the WebAuthn library dependencies (used for WebAuthn login and WebAuthn in MFA), as well as several libraries pulled in through composer as dependencies for these and other features require mb_string.

Further to that, the developers of PHP itself are recommending to move away from the old functions such as utf8_decode / utf8_encode and use mb_string instead.

Any server environment which does not have mb_string enabled is run by idiots who last learned about how to configure a PHP server more than 20 years ago when the web was still using 8-bit, per-language encodings. This has not been the case since circa 2003, two years before Joomla 1.0 was released. If someone runs into a server environment which does not have mb_string enabled they should run away as fast as they can, for their hosting company does not understand how PHP works — not just “modern” PHP the last decade, but even PHP 5 from 16 years ago.

laoneo commented 1 year ago

@HLeithner and @nikosdion I appreciate your feedback on https://github.com/joomla/joomla-cms/pull/39583.

HLeithner commented 1 year ago

I checked the source and only found 2 usages in core (not framework) of utf8_decode and 3 of utf8_encode. We have several polyfills in the cms by 3rd party libs for the 2 functions. mb_convert_encoding is only used in smart search and everything other usages seems to have a fallback

Our Joomla\String package has a polyfill for utf8 functions...

My suggestion replace the CMS use cases with Joomla\StringHelper in j4 and make mb_string in j5 mandatory and replace the 5 codeplaces with mb_convert_encoding.

For the framework I would add the polyfill provided by nic in url helper directly and also require mb_string in the next major version.

ymmv

nikosdion commented 1 year ago

@HLeithner I agree with this:

My suggestion replace the CMS use cases with Joomla\StringHelper in j4 and make mb_string in j5 mandatory and replace the 5 codeplaces with mb_convert_encoding.

In my experience, the only servers without mb_string support are those where the PHP extension was accidentally not enabled after a PHP upgrade. Source: my software has required mb_string since 2007.

HLeithner commented 1 year ago

It's about the b/c promise and it's easy to keep this promise here. Introducing a new dependency (even if everyone should have it, I know at least one hoster (which doesn't host joomla of course) that doesn't have mbstring) is a hard break so only allowed in j5 from my point of view.

nikosdion commented 1 year ago

@HLeithner To be fair, mb_string was required from Joomla 1.5 to 3.10. It wasn't listed for Joomla 4 by mistake. But, sure, I don't disagree with this going into Joomla 5.0.

wilsonge commented 1 year ago

@HLeithner To be fair, mb_string was required from Joomla 1.5 to 3.10. It wasn't listed for Joomla 4 by mistake. But, sure, I don't disagree with this going into Joomla 5.0.

I'm not sure that's true - we've been conditionally checking for that extension at least in the installer since 1.6 minimum (sorry too lazy to go back to 1.5 :D) https://github.com/joomla/joomla-cms/blob/83939b8361a49742c17e9775730692b9110274c7/installation/models/setup.php#L155-L170 and certainly if we did require it - we weren't consistent in that judging by that code - which has carried through all the way to 3.x and 4.x

I will add that I'm pretty sure the entire point of us using the phputf8 library in the string package is to polyfill around that. Also in J4 we (admittedly not as an explicit dependency) have the symfony mbstring polyfill included - hence we could start using things in the console package and in the framework - which was an explicit thing we checked back when we did things in finder for j4 https://github.com/joomla/joomla-cms/pull/22042#issuecomment-419807225

Googling around for wordpress/joomla and mbstring - seems from support there's still a good number of recent questions of people who don't have it on their shared hosting - so to me making the symfony mbstring an explicit dependency (given we already have it shipped anyhow) and being done with it seems reasonable. Don't see the need to make things required at the expense of all the tickets for a relatively cheap library to include.

laoneo commented 1 year ago

We have been already loaded the mbstring polyfill, but made it now explicit in https://github.com/joomla/joomla-cms/pull/39583. As you guys can see in the commit 82883720a7cb55be6f16ef8faa856884b3cb4525, the lock file has not really changed.