joomla-extensions / jedchecker

Joomla extension to check components, modules or plugins for possible problems for submission to the JED -> Translations: https://joomla.crowdin.com/joomla-official-extensions

JED Checker to report linebreaks in language files #242

Open toivo opened 7 months ago

toivo commented 7 months ago

The latest Joomla updates, 4.4.1 and 5.0.1, break extensions whose language strings contain linebreaks. The JED Checker already reports language strings that start or end with a space character. It should also report linebreaks inside a language string, so that webmasters and extension developers can fix the linebreaks that prevent the language constants from being translated.

Ref. [#42416] - [4.4.1] Language constants of some extensions not translated

toivo commented 7 months ago

Clarification: the linebreaks in question are hard returns, i.e. what used to be called carriage return and line feed characters. HTML linebreak tags (<br>) are allowed.
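
To make the requested check concrete, here is a minimal sketch of how hard linebreaks inside a quoted value could be detected. The function name `findMultilineValues` is hypothetical and this is not the JED Checker's actual implementation. It assumes values are wrapped in double quotes and treats `\"` as an escaped quote, so it is a heuristic rather than a full INI parser.

```php
<?php
/**
 * Hypothetical helper (not part of JED Checker): returns the 1-based line
 * numbers on which a double-quoted value is opened but not closed, i.e. the
 * value contains a hard linebreak. HTML tags such as <br> are not affected.
 */
function findMultilineValues(string $contents): array
{
    $flagged = [];
    $open    = false; // true while we are inside an unclosed quoted value

    foreach (preg_split('/\r\n|\r|\n/', $contents) as $i => $line) {
        if (!$open) {
            // Outside a value, skip blank lines and ";" comments.
            $trimmed = ltrim($line);
            if ($trimmed === '' || $trimmed[0] === ';') {
                continue;
            }
        }

        // Count double quotes that are not preceded by a backslash.
        $quotes = preg_match_all('/(?<!\\\\)"/', $line);

        // An odd count means the quoting state flips on this line.
        if ($quotes % 2 === 1) {
            if (!$open) {
                $flagged[] = $i + 1; // the value starts here and spills over
            }
            $open = !$open;
        }
    }

    return $flagged;
}
```

Called on the contents of a language file, it returns the lines where a multi-line value begins, which is the information a report entry would need.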

dryabov commented 7 months ago

To be honest, I'm a bit shocked by the latest security patch. I know it is possible to use PHP's constants in string values, but website visitors are not able to manipulate language files, so I don't see the attack vector here.

The difference between the normal and raw parsers (INI_SCANNER_NORMAL and INI_SCANNER_RAW) is not just the support for multiline strings (although that feature was probably in use before, e.g. for initial values in textarea fields, and now developers have to rewrite code just to support the new syntax).

For example, with the normal parser you can specify values in either double or single quotes, but now single-quoted values will be parsed incorrectly (the quotes will be included in the translated string). Of course, we can warn developers to replace single quotes with double quotes (and to remember to escape double quotes inside the string).
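
A warning of that kind could be produced along these lines. This is only a sketch: `findSingleQuotedValues` is a hypothetical name, and the key pattern (letters, digits, underscore, dot, hyphen) is an assumption about what language keys look like, not a rule taken from Joomla.

```php
<?php
/**
 * Hypothetical helper (not part of JED Checker): returns the 1-based line
 * numbers of entries whose value starts with a single quote (KEY='value').
 */
function findSingleQuotedValues(string $contents): array
{
    $flagged = [];

    foreach (preg_split('/\r\n|\r|\n/', $contents) as $i => $line) {
        // Assumed key syntax: letters, digits, "_", "." and "-".
        // Comment lines starting with ";" never match this pattern.
        if (preg_match('/^\s*[A-Za-z0-9_.\-]+\s*=\s*\'/', $line)) {
            $flagged[] = $i + 1;
        }
    }

    return $flagged;
}
```

Each flagged line would then be a candidate for the "replace single quotes with double quotes" message.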

But most importantly: previously it was possible to use escape sequences, but now only \" is supported. For example, we discussed earlier that a backslash followed by a double quote should be encoded as \\\" according to PHP rules (and I even patched PHP core to handle it properly). That no longer works, because in Joomla 5.0.0 it should be \\\" and in 5.0.1 it should be \\". There is no common denominator!!!

Finally, this patch affects performance, because strings were previously loaded as

$strings = parse_ini_file($fileName);

and now

$strings = parse_ini_file($fileName, false, INI_SCANNER_RAW);
$strings = str_replace('\"', '"', $strings);

so Joomla now has to post-process every loaded string.

I'll try my best, but there's clearly more to it than just a warning about multiline strings.

toivo commented 7 months ago

Thank you for your comments. Now I understand better what is involved, but it would be brilliant to get it done.

The change in 4.4.1 and 5.0.1 was a total surprise. I have been developing a component, and the documentation page "Creating a language definition file" does not say that a language string is limited to one line or that no CR/LF characters are allowed. I just added the following text there: "Note also that from Joomla 4.4.1 and 5.0.1 each value can only be one line of text. Hard linebreaks invalidate the whole language definition file." Others who know more, like you, may want to edit the section about the PHP INI parser.

dryabov commented 6 months ago

PR #245 comes with all the necessary checks.