humhub / translation

Internal translation tool
https://translate.humhub.org
Apache License 2.0

Allow Symbol Character Translations #49

Closed: ArchBlood closed this issue 8 months ago

ArchBlood commented 3 years ago

Currently, the module works wonderfully for translating into other languages. The only issue I'm seeing so far is with symbol characters: in other languages these have their own translated forms, but the module purifies them, which prevents a true and complete translation.

Example

These are just some of the symbol characters that are purified through the module which makes translating very difficult.

ArchBlood commented 8 months ago

@luke- would it be acceptable to add a custom helper class for purifying HTML? I think this would be the only workaround for this issue.

ArchBlood commented 8 months ago

I stand corrected; it seems a custom parser is already implemented:

https://github.com/humhub/translation/blob/master/models/parser/TranslationPurifier.php

Updated Example

Here I've updated the code so that special characters are also translated for Asian languages on save.

<?php

namespace humhub\modules\translation\models\parser;

use yii\helpers\HtmlPurifier;

class TranslationPurifier extends HtmlPurifier
{
    /**
     * @inheritDoc
     */
    public static function configure($config)
    {
        // Set HTMLPurifier configuration
        $config->set('HTML.Doctype', 'HTML 4.01 Transitional');
        $config->set('Attr.EnableID', true);

        // Restrict markup to an allow-list of tags and attributes, and leave non-ASCII characters unescaped
        $config->set('HTML.Allowed', 'p,b,i,u,s,a[href],img[src],ul,ol,li,blockquote,code,pre,span,hr,br');
        $config->set('Core.EscapeNonASCIICharacters', false);
    }

    /**
     * Replace specific characters after translation for Asian languages.
     * @param string $translatedText Translated text.
     * @param string $language Language code indicating the target language.
     * @return string Text with specific characters replaced for Asian languages.
     */
    public static function replaceAsianChars($translatedText, $language)
    {
        // Define character replacements for Asian languages
        $asianReplacements = [
            'ja' => [ // Japanese
                ',' => '、',
                '"' => '「」',
                '%' => '%',
                '~' => '~',
            ],
            'zh' => [ // Chinese (Simplified or Traditional)
                // Add character replacements for Chinese if needed
            ],
            'ko' => [ // Korean
                // Add character replacements for Korean if needed
            ],
            'th' => [ // Thai
                // Add character replacements for Thai if needed
            ],
            'vi' => [ // Vietnamese
                // Add character replacements for Vietnamese if needed
            ],
            // Add more languages and their character replacements as needed
        ];

        // Check if the language is one of the supported Asian languages
        if (array_key_exists($language, $asianReplacements)) {
            // Replace characters for the specific language
            $replacements = $asianReplacements[$language];
            $translatedText = strtr($translatedText, $replacements);
        }

        return $translatedText;
    }

}

ArchBlood commented 8 months ago

As shown in the screenshot (Screenshot_1), this allows for proper translations when it comes to Asian characters.
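The replacement step can be tried on its own, without Yii or the purifier. The following is only a minimal sketch: `replaceJapanesePunctuation` is a hypothetical standalone function, not part of the module. It also shows one thing to watch for with the `strtr` approach above: mapping `"` to `「」` turns every single quote mark into both brackets, so the sketch pairs opening and closing quotes with a regular expression first.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical standalone helper (not part of the translation module):
 * swaps ASCII punctuation for Japanese equivalents.
 */
function replaceJapanesePunctuation(string $text): string
{
    // Pair up double quotes first, so "..." becomes 「...」 instead of
    // each individual quote mark being replaced by both brackets.
    $text = preg_replace('/"([^"]*)"/u', '「$1」', $text) ?? $text;

    // Then apply the simple one-to-one replacements.
    return strtr($text, [',' => '、']);
}

echo replaceJapanesePunctuation('A, "B"'), PHP_EOL; // prints: A、 「B」

// For comparison, the plain strtr mapping doubles the brackets:
echo strtr('"B"', ['"' => '「」']), PHP_EOL; // prints: 「」B「」
```

Unbalanced quotes are left untouched by this sketch, which may or may not be the desired behavior for real translation strings.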

luke- commented 8 months ago

@ArchBlood Do I understand correctly that the purifier changes characters like to ~?

If this is the case, we should modify the purifier instead of implementing a later replacement.

In addition to the purifier, we also have manual control over the translations (through Activities & Git).

ArchBlood commented 8 months ago

> @ArchBlood Do I understand correctly that the purifier changes characters like to ~?
>
> If this is the case, we should modify the purifier instead of implementing a later replacement.
>
> In addition to the purifier, we also have manual control over the translations (through Activities & Git).

Yes, the purifier alters any characters that aren't approved, which either breaks the translation or means it doesn't translate at all and shows a warning. My modification replaces those characters so that the correct translation can be made. There are of course other methods that could be used, but this is a simpler way.

luke- commented 8 months ago

Can we somehow allow all characters in the Purifier?

ArchBlood commented 8 months ago

> Can we somehow allow all characters in the Purifier?

I can look into this, but overall I've not seen a simpler answer to this issue.

luke- commented 8 months ago

Would be good, but I would rather remove the purifier or replace it with another solution than maintain lists of special characters in different languages :-)

ArchBlood commented 8 months ago

> Would be good, but I would rather remove the purifier or replace it with another solution than maintain lists of special characters in different languages :-)

Would you accept a P/R for this? I have found a way that only requires adding $config->set('Core.EscapeNonASCIICharacters', false);

luke- commented 8 months ago

Yes, that would be fine!
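For readers unfamiliar with the directive: according to the HTML Purifier documentation, Core.EscapeNonASCIICharacters, when enabled, converts all non-ASCII characters into decimal numeric entities. The sketch below only illustrates what that kind of escaping looks like, using PHP's mbstring extension rather than the purifier itself. `escapeNonAscii` is a hypothetical name, and whether this escaping is what actually affected the translations in this issue is an assumption, not something the thread confirms.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical illustration only: converts every non-ASCII character to a
 * decimal numeric entity. Requires the mbstring extension.
 */
function escapeNonAscii(string $text): string
{
    // Conversion map: code points 0x80..0x10FFFF, no offset, full mask.
    return mb_encode_numericentity($text, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
}

$original = '10、20';
$escaped = escapeNonAscii($original);

echo $escaped, PHP_EOL; // prints: 10&#12289;20

// Decoding the entities restores the original text.
echo html_entity_decode($escaped, ENT_QUOTES | ENT_HTML5, 'UTF-8'), PHP_EOL; // prints: 10、20
```

With the directive set to false, as proposed above, the purifier is expected to leave such characters as they are instead of emitting entities.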