Closed: ArchBlood closed this issue 8 months ago
@luke- would a custom helper class for purifying HTML be acceptable? I think this would be the only workaround for this issue.
I stand corrected, it seems a custom parser is already implemented:
https://github.com/humhub/translation/blob/master/models/parser/TranslationPurifier.php
Here I've updated the code so that special characters in Asian languages are also translated on save.
<?php

namespace humhub\modules\translation\models\parser;

use yii\helpers\HtmlPurifier;

class TranslationPurifier extends HtmlPurifier
{
    /**
     * @inheritDoc
     */
    public static function configure($config)
    {
        // Set HTMLPurifier configuration
        $config->set('HTML.Doctype', 'HTML 4.01 Transitional');
        $config->set('Attr.EnableID', true);
        // Whitelist the allowed tags and attributes
        $config->set('HTML.Allowed', 'p,b,i,u,s,a[href],img[src],ul,ol,li,blockquote,code,pre,span,hr,br');
        // Keep non-ASCII characters unescaped so they survive purification
        $config->set('Core.EscapeNonASCIICharacters', false);
    }

    /**
     * Replace specific characters after translation for Asian languages.
     *
     * @param string $translatedText Translated text.
     * @param string $language Language code indicating the target language.
     * @return string Text with specific characters replaced for Asian languages.
     */
    public static function replaceAsianChars($translatedText, $language)
    {
        // Define character replacements for Asian languages
        $asianReplacements = [
            'ja' => [ // Japanese
                ',' => '、',
                '"' => '「」', // Note: every single " becomes the full pair 「」
                '%' => '%',
                '~' => '~',
            ],
            'zh' => [ // Chinese (Simplified or Traditional)
                // Add character replacements for Chinese if needed
            ],
            'ko' => [ // Korean
                // Add character replacements for Korean if needed
            ],
            'th' => [ // Thai
                // Add character replacements for Thai if needed
            ],
            'vi' => [ // Vietnamese
                // Add character replacements for Vietnamese if needed
            ],
            // Add more languages and their character replacements as needed
        ];

        // Check if the language is one of the supported Asian languages
        if (array_key_exists($language, $asianReplacements)) {
            // Replace characters for the specific language
            $replacements = $asianReplacements[$language];
            $translatedText = strtr($translatedText, $replacements);
        }

        return $translatedText;
    }
}
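To make the replacement step concrete, here is a minimal stand-alone sketch of how strtr() behaves with a map like the Japanese one above. The function name replaceChars and the sample strings are made up for illustration and are not part of the module.

```php
<?php

// Hypothetical helper for illustration only; it mirrors the strtr() call
// used in replaceAsianChars() but is not part of the module.
function replaceChars(string $text, array $map): string
{
    // With an array argument, strtr() replaces each key with its value
    // and does not re-scan text it has already replaced.
    return strtr($text, $map);
}

// Subset of the 'ja' map from the class above.
$ja = [
    ',' => '、',
    '"' => '「」',
];

echo replaceChars('a,b', $ja), PHP_EOL;  // a、b
echo replaceChars('"hi"', $ja), PHP_EOL; // 「」hi「」
```

The second line of output shows that each straight quote is replaced by the whole pair 「」, so telling opening quotes from closing ones would need extra logic beyond a plain character map.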
As shown in the screenshot, this allows for proper translations when it comes to Asian characters.
@ArchBlood Do I understand correctly that the purifier changes characters like ~ to ~?
If this is the case, we should modify the purifier instead of implementing a later replacement.
In addition to the purifier, we also have manual control of the translations (through Activities & Git).
Yes. The purifier changes any characters that aren't approved and then purifies the text, which either breaks the translation or prevents it from being translated at all and shows a warning. My modification replaces those characters so that the correct translation can be made. There are of course other methods that could be used, but this is the simplest one.
Can we somehow allow all characters in the Purifier?
I can look into this, but overall I've not seen a simpler answer to this issue.
Would be good, but I would rather remove the purifier or replace it with another solution than maintain lists of special characters in different languages :-)
Would you accept a PR for this? I have found a way: just adding $config->set('Core.EscapeNonASCIICharacters', false);
Yes, that would be fine!
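As a rough illustration of why escaping matters here, the sketch below compares a translated string with an escaped form of it. It assumes the escaping takes the form of decimal numeric entities, as the directive name suggests; the sample strings are invented and no purifier is actually run.

```php
<?php

// Assumption for illustration: non-ASCII characters are turned into
// decimal numeric entities, e.g. U+3001 (、) becomes &#12289;.
$original = 'a、b';
$escaped  = 'a&#12289;b';

// The escaped form is no longer the string the translator entered.
var_dump($escaped === $original); // bool(false)

// Decoding the entity restores the original character.
$decoded = html_entity_decode($escaped, ENT_QUOTES | ENT_HTML5, 'UTF-8');
var_dump($decoded === $original); // bool(true)
```

With escaping switched off, the stored text would stay identical to the input, so no per-language replacement list would be needed for this case.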
Currently, the module is wonderful when it comes to translating into other languages. The only issue I'm seeing so far is with symbol characters such as , which have their own equivalents in other languages. These are purified by the module, which prevents a true and complete translation.
Example:
, > 、
"" > 「」
% > %
~ > ~
These are just some of the symbol characters that are purified by the module, which makes translating very difficult.