Adding Translations in v2

I was investigating a solution to add all translations from SpaceCadetPinball.rc of the v1 to v2.

So far, I wrote a perl script to generate .cpp/.h files from the .rc file. The choice of using .cpp files instead of something not hardcoded (like .ini) is purely to keep the release binary self contained (the same way the .rc file in v1 is integrated in the .exe after compilation), and to be able to use macros (like IDS_STRINGxxx).

I've then got to the issue of drawing text (which use system fonts when not using the English language). I found that imgui has already everything to draw text, but the coordinate system is not the same as the one used for the game rendering (menus are never scaled, but the game itself is scaled).

So I've come up to several questions and solutions:

About the translations itself, what do you thing about embedding them in the .exe vs putting them aside in .ini files ? (These .ini file would contains lines like 123=Text).
About the drawing of text:
1. Either use imgui, but getting the coordinate system right would need maths and will be probably a bit complex to get right
2. Or use SDL_ttf to draw the text directly on the render::vscreen texture (the one also used for the game rendering), so without need to convert coordinate, but that will add a new dependency on SpaceCadetPinball.

I will try in the mean time to get imgui coord conversion working (I don't know if SDL_ttf is large or not).

I thought about this topic, here are my plans:

1) I want to embed all texts into the executable, as it is done now. My plan was to use Id-less approach - English text as key in a map or a custom data structure. One to many, like this:“Eng text”={“Lang 1”, “Lang 2”…} The translated text could be in order or out of order keyed, whatever makes more sense.

2) I was going to use ImGui for both menu texts and in game texts. ImGui can build atlas from specific TTF ranges, based on selected language. For fixed texts, atlas can be trimmed of unused characters. Current scaling uses old ImGui API, that should be replaced with separate font/GUI scaling. PS Vita port has some code for this. I planned to keep the same old ImGui version, but updating it is ok if that brings critical improvements for this task.

For fonts, I wanted to find and embed a Unicode capable free vector font, like ProggyClean.ttf that is embedded now. This is to maintain a uniform look across platforms. There can be an option to use user-provided system font.

Rendering to vscreen appears easier, but that leads to worse results due to its low resolution. It would hinder font AA, high DPI, etc. Original game uses winapi to draw translated text directly to window, I wanted to do the same with ImGui overlay. In game texts will probably need separate atlas with different font scale. Or maybe even a different font.

In my timetable, this task has lower priority than adding missing FT features. But you are of course welcome to try right away, I will help.

Ok thanks.

I've managed to get something working with ImGui using the default font:

The matching code is there: https://github.com/k4zmu2a/SpaceCadetPinball/compare/master...amurzeau:SpaceCadetPinball:add-translations

When using the English text as a key, isn't there any risk of missing translation if the english text get changed ? Right now, I've using integers as the key (the same way as currently done in pinball.cpp), adding other translations was easy as I just needed to use the defines. For now, I've not yet removed all the "&" from menu translations, the & character is removed in the get_rc_string function instead. But this is probably better to directly change the translations instead.

ImGui is indeed the way to go as you said, I didn't think about the scaling that would make the font ugly. There is a bit of maybe complex code to store sidebar messages in a map, and then render them with ImGui when appropriate. I need the map as the game only call grtext_draw_ttext_in_box when a new message must be drawn (and not on every frames).

I will continue toward something in line with what you said. I know that some characters doesn't display properly with the current code (like Chinese characters). As there are 24 languages, I fear that the ld-less approach will be hard to generate from existing code, but will try.

That is some nice progress)

I am not 100% attached to fragile string keys. I am ok with Ids if we better utilize their potential: • Ids should be flattened; text map turns into array. • Ids should use enum class, no macros allowed. • Ideally, all texts should be defined in the same place: Id – many texts. For example: Id – array of texts, where lang id is flat index.

I agree, "&" should be stripped from source texts, alt navigation is a niche feature that ImGui probably does not support.

For in game texts draw list is the way to go. You could simplify it – there should be only one grtext call per textbox. Textboxes could have unique index (assigned at construction from static counter) for gdrv array.

In my opinion, language should be selected from the list in options, with English by default. There should be no language auto detect.

All coordinate scaling maths should go into fullscrn.

Hi,

I've implemented this:

Remove "&" from texts
Use enum instead of defines
Use a menu and a new option to select the language instead of autodetection
Move scaling maths to fullscrn

Remaining:

Change the translated text layout to have all translations in the same place for each ID

https://github.com/k4zmu2a/SpaceCadetPinball/compare/master...amurzeau:SpaceCadetPinball:add-translations

If I understand correctly, what you need is to have easy maintenance of all translation at once, meaning that all translation for one ID should be in the same place. Then the solution is whatever has low maintenance costs ?

I've though about the possible options, and here are the propositions:

#include <string>
#include <unordered_map> // Or map
#include <array>
#include <vector>
#include <stdint.h>
#include <assert.h>

// These enum don't need to really exist in the final code,
// but are written here for readability of below propositions.
enum language_e
{
    LANGUAGE_EN = 0,
    LANGUAGE_FR,
    LANGUAGE_ES,
    LANGUAGE_NUMBER,
};

enum string_index_e
{
    IDS_Game,
    IDS_Player,

    IDS_NUMBER,
};

/////////////////////////////////////////////////////////////////////////////////
// Proposition 1: keep IDs and a map:
//  - Maybe more performance than strings when hashing for the unordered_map
//  - Same as v1 way to handle translations (using IDs)
/////////////////////////////////////////////////////////////////////////////////

std::unordered_map<uint32_t, std::array<const char*, LANGUAGE_NUMBER>> translations_using_enum =
{
    {
        IDS_Game,
        {
            "Game EN",
            "Jeu FR",
            "Juego ES",
        },
    },
    {
        IDS_Player,
        {
            "Player EN",
            "Joueur FR",
            "Jugador ES",
        },
    },
};

const char* get_translation_by_id(language_e lang, string_index_e string_idx)
{
    assert(lang < LANGUAGE_NUMBER);
    assert(string_idx < IDS_NUMBER);

    auto it = translations_using_enum.find(string_idx);
    if(it == translations_using_enum.end())
        return nullptr; // string_idx doesn't exist, we could assert here instead

    return it->second[lang];
}

/////////////////////////////////////////////////////////////////////////////////
// Proposition 2: keep IDs and a flat array:
//  - Best performance
//  - Hard to ensure enum string_index_e values match the translation effective
//    index in the array
/////////////////////////////////////////////////////////////////////////////////

std::array<std::array<const char*, LANGUAGE_NUMBER>, IDS_NUMBER> translations_using_enum_flat =
{{
    {
        "Game EN",
        "Jeu FR",
        "Juego ES",
    },
    {
        "Player EN",
        "Joueur FR",
        "Jugador ES",
    },
}};

const char* get_translation_by_id_flat(language_e lang, string_index_e string_idx)
{
    assert(lang < LANGUAGE_NUMBER);
    assert(string_idx < IDS_NUMBER);

    const auto& translations = translations_using_enum_flat[string_idx];

    return translations[lang];
}

/////////////////////////////////////////////////////////////////////////////////
// Proposition 3: use english string as key:
//  - Make reading the code better
/////////////////////////////////////////////////////////////////////////////////

std::unordered_map<std::string, std::array<const char*, LANGUAGE_NUMBER-1>> translations_using_en_string =
{
    {
        "Game EN",
        {
            "Jeu FR",
            "Juego ES",
        },
    },
    {
        "Player EN",
        {
            "Joueur FR",
            "Jugador ES",
        },
    },
};

const char* get_translation_by_str(language_e lang, const char* english_string)
{
    assert(lang < LANGUAGE_NUMBER);

    if(lang == LANGUAGE_EN)
        return english_string;

    auto it = translations_using_en_string.find(std::string(english_string));
    if(it == translations_using_en_string.end())
        return english_string; // Return english string by default

    return it->second[lang - 1];
}

/////////////////////////////////////////////////////////////////////////////////
// How the code that use a string will look like
/////////////////////////////////////////////////////////////////////////////////

// get_rc_string functions, assuming we are using FR language
const char* get_rc_string_by_id(string_index_e string_idx)
{
    return get_translation_by_id(LANGUAGE_FR, string_idx);
}

const char* get_rc_string_by_id_flat(string_index_e string_idx)
{
    return get_translation_by_id_flat(LANGUAGE_FR, string_idx);
}

const char* get_rc_string_by_str(const char* english_string)
{
    return get_translation_by_str(LANGUAGE_FR, english_string);
}
// Low character-count translation ( like Qt's tr() )
const char* tr(const char* english_string)
{
    return get_rc_string_by_str(english_string);
}

void function(const char* arg, int a1, int a2) {}

void test()
{
    const char* text1 = get_rc_string_by_id(IDS_Game);
    const char* text2 = get_rc_string_by_id_flat(IDS_Game);

    // Here, we need to be careful if changing the english string if we need
    // to keep translations. An assert on untranslated strings can help catching
    // missing translations.
    const char* text3 = get_rc_string_by_str("Game EN");
    const char* text4 = tr("Game EN");

    // Example with a function call
    function(get_rc_string_by_id(IDS_Game), 1024, 565);
    function(get_rc_string_by_id_flat(IDS_Game), 1024, 565);

    function(get_rc_string_by_str("Game EN"), 1024, 565);
    function(tr("Game EN"), 1024, 565);
}

I think that proposition 2 with flat array is not a good idea, as the maintenance of it will be harder with a higher risk of index / enum mismatch. Due to this, I think that if we are going with maps, we are probably better of with english strings as the ID (proposition 3) as you initially proposed.

In my opinion, maps are not way to go with indexable Ids. But having explicit Ids and out of order initialization is nice. Luckily, C++ allows for best of both worlds, at compile time. Here is what I had in mind:

enum class Msg :int
{
    Min = 0,
    Id0 = 0,
    Id1,
    Max
};

enum class Language :int
{
    Min = 0,
    Lang0 = 0,
    Lang1,
    Max
};

// Step 1, the concept
struct X
{
    constexpr X(std::initializer_list<std::pair<const Msg, LPCSTR>> iList)
    {
        for (const auto pair : iList)
        {
            Store[static_cast<int>(pair.first)] = pair.second;
        }
    }

private:
    LPCSTR Store[5] = {nullptr};
};

constexpr X XActual =
{
    {Msg::Id0, u8"First"},
    {Msg::Id1, u8"Second"},
};

// Step 2, final nested class
template <typename Key, typename Value, int N>
struct Y
{
    constexpr Y(std::initializer_list<std::pair<Key, Value>> iList)
    {
        for (const auto pair : iList)
        {
            Store[static_cast<int>(pair.first)] = pair.second;
        }
    }

    constexpr Y(): Store()
    {
    }

private:
    Value Store[N]{};
};

constexpr Y<Msg, Y<Language, LPCSTR, (int)Language::Max>, (int)Msg::Max> YActual =
{
    {
        Msg::Id0, {{Language::Lang0, u8"First"}, {Language::Lang1, u8"Second"}}
    },
};

Key features: • Ids in enum class, not just enum. • All info stored in fixed arrays. • Compile time key out of bounds check, more can be added at runtime. • Note u8 on string literals, ImGui only supports UTF8. • This should be way faster and smaller that string keys.

Edit 1: As it turns out, constexpr for loop over initializer_list is a C++14 feature (thanks to VS for ignoring CMAKE_CXX_STANDARD) . My initial goal was to keep C++11, but that is not 100% required in v2. Lets decide what to use (11 no constexpr or 14 with constexpr) based on binary size and startup performance.

k4zmu2a / SpaceCadetPinball

Adding Translations in v2 #152