themoeway / yomitan

Japanese pop-up dictionary browser extension. Successor to Yomichan.
https://chromewebstore.google.com/detail/yomitan/likgccmbimhjbgkjambclfkhldnlhbnn
GNU General Public License v3.0

Yomitan internationalization and localization #589

Open Casheeew opened 6 months ago

Casheeew commented 6 months ago

With the recent changes (i.e., more languages), it is now a good idea to look into adding localization for more languages. I am happy to help with the l10n for my native language (Vietnamese) or possibly Japanese.

StefanVukovic99 commented 6 months ago

I had started work on this over at Yezichak, though my approach is fairly basic:

I used an "i18n" attribute in the HTML:

<div class="settings-item"><div class="settings-item-inner settings-item-inner-wrappable">
    <div class="settings-item-left">
        <div class="settings-item-label" i18n="settings.language.locale.label">
            Locale
        </div>
        <div class="settings-item-description" i18n="settings.language.locale.description">
            The language the interface will be displayed in
        </div>
    </div>
    <div class="settings-item-right">
        <select 
            id="locale-select"
            data-setting="general.locale"
        ></select>
    </div>
</div></div>

Each language can have its i18n.json:

"settings": {
    "profile":{
        "heading": "Profil",
        "default": {
            "label": "Glavni profil",
            "description": "Izaberite glavni profil koji se koristi za skeniranje."
        },
        "editing": {
            "label": "Uređivanje profila",
            "description": "Promijenite koji se profil uređuje na ovoj stranici."
        },
        "configure": "Podesite profile\u2026"
    },
    "language":{
        "heading": "Jezik",
        "language":{
            "label": "Jezik",
            "description": "Jezik koji čitate ovim profilom."
        },
        "locale":{
            "label": "Jezik interfejsa",
            "description": "Jezik na kojem je prikazana aplikacija"
        }
    }
    ...
}

localization.js:

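// Translate every element that carries an i18n or i18n-title attribute.
_translateAll() {
    const translatables = document.querySelectorAll('[i18n], [i18n-title]');
    for (const element of translatables) {
        this._translateElement(element);
    }
}

// Replace the element's text and/or title; if no translation is found,
// the existing (English) value is left untouched.
_translateElement(element) {
    const key = element.getAttribute('i18n');
    const title = element.getAttribute('i18n-title');
    if (key) {
        const translation = this.getDeep(this._translations, key);
        if (translation) { element.innerText = translation; }
    }
    if (title) {
        const translation = this.getDeep(this._translations, title);
        if (translation) { element.setAttribute('title', translation); }
    }
}

// Look up a dot-separated path such as "settings.language.locale.label".
// Returns defaultValue as soon as any segment of the path is missing.
getDeep(object, path, defaultValue = null) {
    let current = object;
    for (const part of path.split('.')) {
        if (current === null || typeof current !== 'object' || !(part in current)) {
            return defaultValue;
        }
        current = current[part];
    }
    return current;
}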

localization.js should likely also export a function to get translations for dynamic text.
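
A minimal sketch of what such a function for dynamic text could look like, assuming the nested i18n.json layout above. The names createTranslator and translate, the {name} placeholder syntax, and the popup.noResults key are all hypothetical and not part of Yomitan or Yezichak. It looks the key up in the active locale first, then in English, and finally returns the key itself so a missing string is easy to spot.

```javascript
// Hypothetical helper: walk a dot-separated path, returning null if any part is missing.
function getDeep(object, path) {
    let current = object;
    for (const part of path.split('.')) {
        const isObject = current !== null && typeof current === 'object';
        if (!isObject || !Object.prototype.hasOwnProperty.call(current, part)) {
            return null;
        }
        current = current[part];
    }
    return current;
}

// Hypothetical factory: returns a translate(key, substitutions) function.
function createTranslator(translations, fallbackTranslations) {
    return function translate(key, substitutions = {}) {
        let text = getDeep(translations, key);
        if (typeof text !== 'string') { text = getDeep(fallbackTranslations, key); }
        if (typeof text !== 'string') { return key; }
        // Fill {name} placeholders; unknown placeholders are left as-is.
        return text.replace(/\{(\w+)\}/g, (match, name) => (
            Object.prototype.hasOwnProperty.call(substitutions, name) ?
                String(substitutions[name]) :
                match
        ));
    };
}

// Example data (the English strings and the popup.noResults key are made up).
const english = {
    settings: {language: {heading: 'Language'}},
    popup: {noResults: 'No results found for {term}'},
};
const locale = {
    settings: {language: {heading: 'Jezik'}},
};

const translate = createTranslator(locale, english);
const heading = translate('settings.language.heading');
const noResults = translate('popup.noResults', {term: 'neko'});
const unknown = translate('popup.doesNotExist');
```

Here heading comes from the locale file, noResults falls back to the English string with the placeholder filled in, and unknown is just the key.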

StefanVukovic99 commented 6 months ago

If there isn't a better way (maybe some library), I can make a PR.

toasted-nutbread commented 6 months ago

I'd probably suggest using data- prefixed attributes since they are a standard, otherwise something like what @StefanVukovic99 mentioned looks fine for an initial pass. It will likely be more complicated than that, given the amount of dynamically created HTML.
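
A rough sketch of the data- variant, assuming markup such as `<div data-i18n="settings.language.locale.label">Locale</div>`. The attribute names data-i18n and data-i18n-title and both function names are assumptions for illustration. The standard `dataset` property exposes them as `dataset.i18n` and `dataset.i18nTitle`. The `translate` argument is assumed to be any function of the form `(key, fallback) => string`.

```javascript
// Translate one element based on its data-i18n / data-i18n-title attributes.
// The current text or title is passed along as the fallback.
function translateElement(element, translate) {
    const {i18n, i18nTitle} = element.dataset;
    if (i18n) { element.textContent = translate(i18n, element.textContent); }
    if (i18nTitle) { element.title = translate(i18nTitle, element.title); }
}

// Translate every tagged element under root. root can be the document,
// an element, or a DocumentFragment built from a template.
function translateTree(root, translate) {
    const elements = root.querySelectorAll('[data-i18n], [data-i18n-title]');
    for (const element of elements) {
        translateElement(element, translate);
    }
}
```

For dynamically created HTML, one option would be to call translateTree on each newly built fragment before it is inserted into the page, so the same code path handles both static and generated markup.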

djahandarie commented 6 months ago

Since we never really discussed this let me add my input here with my cost-benefit analysis.

I think full i18n support is actually a much larger undertaking for the project (compared to supporting more languages for lookups/dictionaries), because the i18n burden is essentially on the developers, maintainers, and everyone who makes a PR that changes any user-facing string anywhere, while lookup/dictionary support is more just on the dictionary makers (as long as we have the overall structure to support it by doing some initial generalizations).

That's the cost side. On the benefit side, there are billions of people out there trying to learn languages other than Japanese, which is why it makes sense to support more lookup languages. But I feel like there are fewer people (out of those on the internet installing Chrome extensions, anyway) who understand so little English that they can't even figure out how to use Yomitan. If there are, I think just a basic guide in their language would be a better tradeoff than maintaining full i18n over time. Especially with advances in machine translation, I think most people can get to a basic understanding without us needing to take on a huge maintenance burden.

That said, it could just be my bias being a speaker of English. Please feel free to make a counter argument. But that's where I stand on this issue atm.

StefanVukovic99 commented 6 months ago

In no particular order:

English is by far the most popular language for learning. Granted, it's not as horrible as a beginner learner of Japanese having to use Japanese software, but the English interface is still something of a barrier.

There have been a few cases on Yezichak where someone installed it for a relative; such cases would benefit from this.

How do they manage localization on Anki? Would be nice for Anki users to have Yomitan in the same language.

Not sure what exists in the way of dev dependencies that could help with warnings/syncing files.

I think helping out with localization is one of the simplest ways to contribute and a nice way to put one's toe in the water for folks who want to start helping out.

Feels weird not having a Japanese interface option.

Locales can be rolled out gradually as their maintainers pop up.

A language need not have 100% of the strings covered at all times. Adding new strings or making small changes can at worst cause an English string to pop out. Changes to the interface don't seem that common in any case. The lag between testing and stable can allow locale maintainers to catch up. I think the burden is mostly on localizers rather than on contributors of general features.

toasted-nutbread commented 6 months ago

Something that would probably help greatly is some way of automating detection of which parts of the application do vs don't have translations. I'm not immediately sure how to go about this for all situations, but it would make it more evident which places may need localizations. I agree that a lot of the burden shouldn't be on active developers but rather on any localizers, and falling back to English as the default should keep the current workflow basically the same.
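
For the translation files themselves, the detection could be as simple as comparing key sets. The sketch below assumes nested JSON files like the i18n.json shown earlier; flattenKeys and compareLocale are hypothetical names, and the English strings are made up. It does not detect hard-coded strings in HTML or JS that were never given a key, which is a separate problem.

```javascript
// Collect every leaf key of a nested translation object as a dot-separated path.
function flattenKeys(object, prefix = '') {
    const keys = [];
    for (const [name, value] of Object.entries(object)) {
        const path = prefix ? `${prefix}.${name}` : name;
        if (value !== null && typeof value === 'object') {
            keys.push(...flattenKeys(value, path));
        } else {
            keys.push(path);
        }
    }
    return keys;
}

// Report keys that exist in English but not in the locale (missing),
// and keys that exist only in the locale (obsolete).
function compareLocale(english, locale) {
    const englishKeys = new Set(flattenKeys(english));
    const localeKeys = new Set(flattenKeys(locale));
    return {
        missing: [...englishKeys].filter((key) => !localeKeys.has(key)),
        obsolete: [...localeKeys].filter((key) => !englishKeys.has(key)),
    };
}

// Example data (English strings and oldKey are invented for illustration).
const englishFile = {
    settings: {
        profile: {heading: 'Profiles', configure: 'Configure profiles\u2026'},
        language: {heading: 'Language'},
    },
};
const localeFile = {
    settings: {
        profile: {heading: 'Profil'},
        oldKey: 'x',
    },
};

const coverage = compareLocale(englishFile, localeFile);
console.log(coverage);
```

Run as a CI step or a dev script, this would list settings.profile.configure and settings.language.heading as missing and settings.oldKey as obsolete, without failing the build for contributors who only touch English.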

Casheeew commented 6 months ago

The first refactor towards i18n is putting all user-visible strings into a JSON file; see https://developer.chrome.com/docs/extensions/reference/api/i18n#concepts_and_usage. Then the work of localizing is just translating a JSON file, where all the messages are in one place, which doesn't burden developers that heavily.

On the second point, I personally know a few people who are using Yomichan without a good command of English. I have also recommended the extension to relatives who don't speak English. Yomitan is not a simple extension. It is easy to get it working with just the default settings, but it would be non-trivial for them to figure out the advanced settings that Yomitan offers.

Also, it has the nice added benefit of making the experience feel more authentic and immersive (someone might not want to mix an English UI with all of their J-J dictionaries, also immersion :100: )
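
For reference, chrome.i18n reads per-locale files at _locales/&lt;locale&gt;/messages.json, where each entry has the shape `"name": {"message": "..."}`, and strings are fetched with `chrome.i18n.getMessage(name)`. Message names are limited to letters, digits, underscores, and @, so dotted keys would not work directly. The sketch below converts a nested file like the i18n.json from earlier in the thread into that flat shape, joining path segments with underscores. The function name toChromeMessages and the joining convention are assumptions, not an existing tool.

```javascript
// Flatten a nested translation object into the chrome.i18n messages.json shape.
// Nested keys are joined with "_" because "." is not allowed in message names.
function toChromeMessages(nested, prefix = '') {
    const messages = {};
    for (const [name, value] of Object.entries(nested)) {
        const key = prefix ? `${prefix}_${name}` : name;
        if (value !== null && typeof value === 'object') {
            Object.assign(messages, toChromeMessages(value, key));
        } else {
            messages[key] = {message: String(value)};
        }
    }
    return messages;
}

// Example using strings from the i18n.json shown earlier in the thread.
const nestedLocale = {
    settings: {
        language: {
            locale: {
                label: 'Jezik interfejsa',
                description: 'Jezik na kojem je prikazana aplikacija',
            },
        },
    },
};

const chromeMessages = toChromeMessages(nestedLocale);
console.log(JSON.stringify(chromeMessages, null, 4));
```

Inside the extension, the label would then be read with `chrome.i18n.getMessage('settings_language_locale_label')`. One caveat, as far as I know: chrome.i18n picks the locale from the browser's UI language, so an in-extension locale setting like the `general.locale` select above would still need its own loader.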

djahandarie commented 5 months ago

Thanks for the extra discussion!

I think if there's sufficient tooling surrounding the i18n files it might work. I think the minimum I would want is some sort of coloring/linting for each i18n file, which shows when it has drifted from the main English file. E.g., it's missing a new entry, or has not updated an existing entry to match an update in the English. That way it's easy for localizers to see what work is remaining on a given i18n file, and also easy for me as a maintainer to see when we need to find someone new to help update a given i18n file (or decide to just eliminate it because it has gone dangerously out of sync, etc.).

I'm not sure how one could implement that though. You might need to version every single string.

(I also considered that you could implicitly version them by running git blame on each i18n file, and then require that every single line in non-English i18n files must have an edit timestamp past the respective line in the English i18n file. However, that would break on some edge cases, such as when someone makes a non-semantic change to the English i18n file (fixing a minor typo or punctuation error) that doesn't actually require an update in the other i18n files... people would need some way to mark their commits as minor if we did it like this I guess :thinking:)
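
One way to version every string without a version number would be to have each translated entry record the English text it was translated from. The sketch below assumes a flat locale format of `{key: {source, text}}`, which is hypothetical and differs from the nested i18n.json shown earlier. The function name findDrift, the key names, and the "older wording" in the example are also made up.

```javascript
// Compare a flat English file {key: text} with a locale file
// {key: {source, text}}, where source is the English text that
// the translation was based on.
function findDrift(english, locale) {
    const report = {missing: [], stale: [], upToDate: []};
    for (const [key, source] of Object.entries(english)) {
        const entry = locale[key];
        if (entry === undefined) {
            report.missing.push(key); // never translated
        } else if (entry.source !== source) {
            report.stale.push(key); // English changed since translation
        } else {
            report.upToDate.push(key);
        }
    }
    return report;
}

// Example data: the current English strings...
const currentEnglish = {
    'settings.language.locale.label': 'Locale',
    'settings.language.locale.description': 'The language the interface will be displayed in',
    'settings.language.heading': 'Language',
};
// ...and a locale file whose description was translated from an older wording.
const translated = {
    'settings.language.locale.label': {
        source: 'Locale',
        text: 'Jezik interfejsa',
    },
    'settings.language.locale.description': {
        source: 'The language of the interface',
        text: 'Jezik na kojem je prikazana aplikacija',
    },
};

const drift = findDrift(currentEnglish, translated);
console.log(drift);
```

With this layout, a non-semantic English fix (typo, punctuation) would still show up as stale, but a localizer could acknowledge it by updating only the source field and leaving text unchanged, which avoids the need to mark commits as minor.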

Is there any existing tooling out there that deals with managing lots of i18n files and tracking/visualizing drift between them etc?