Text format for translations

puzrin commented 5 years ago

For users, translations are expected to be in a text format:

Can be edited manually
Can be exported/imported to web translation services
Easy to track diffs

C sources will be generated by a compiler utility and are not expected to be edited manually.

The default file type is YAML. JSON is an alternative (it will be supported automatically, as a subset of YAML).

Format

TBD

puzrin commented 5 years ago

Format. Something like this:

'en-GB':
  Nut: [ Nut, Nuts ]

'ru-RU':
  Settings: Настройки
  Nut: [ Гайка, Гайки, Гаек ]
  'LVGL is awesome!': ~ # null, added by extractor as mark to translate

Notes:

For the default locale, only the plurals need to be listed.
Plural names (zero, one, two, few, many, other) are not used, because the sequence is fixed for each language, and optional (skippable) forms come at the end.
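
A minimal Python sketch of how the array form could be resolved, assuming the position of each form follows the CLDR category order for the language. `FORM_ORDER` and `pick_form` are hypothetical names, and falling back to the last listed form when a trailing form is skipped is an assumption, not something decided in this thread:

```python
# Assumed order of plural forms per language (CLDR cardinal categories).
# Only the categories a language uses are listed; optional ones come last.
FORM_ORDER = {
    "en": ["one", "other"],
    "ru": ["one", "few", "many", "other"],
}


def pick_form(lang, forms, category):
    """Return the array entry that corresponds to a plural category name."""
    index = FORM_ORDER[lang].index(category)
    # Trailing forms may be skipped in the YAML array; as an assumption,
    # fall back to the last form that was actually provided.
    return forms[min(index, len(forms) - 1)]
```

With the Russian entry above, `pick_form("ru", ["Гайка", "Гайки", "Гаек"], "few")` selects the second element of the array.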

kisvegabor commented 5 years ago

Looks good to me. en_GB and ru_RU should be compiled to C file and array names, as I suggested here (I used different naming, but that doesn't really matter).

puzrin commented 5 years ago

http://docs.translatehouse.org/projects/translate-toolkit/en/latest/formats/index.html - here is a nice list of available translation formats

My suggestion is close to the Ruby format, but it supports plurals in a different way: https://github.com/translate/translate/issues/3615

Still not sure what is better:

Arrays are comfortable for manual edits, when no web translation services are used.
Named plurals are better for web translation services, because a standard format should be supported for import/export.

We probably just need to support both and add a format option to the scripts. Need to dig into the web translator docs to decide.

puzrin commented 5 years ago

For info: in web projects, plurals are ~15% of phrases.

Ruby's way:

'en-GB':
  Nut:
    one: Nut
    other: Nuts

'ru-RU':
  Settings: Настройки
  Nut:
    one: Гайка
    few: Гайки
    many: Гаек
  'LVGL is awesome!': ~ # null, added by extractor as mark to translate

Maybe not so bad...
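
If both formats end up supported, converting the array form into the named form is mostly a matter of zipping each array with the category order of its language. The following Python sketch assumes CLDR category names (English uses `one` and `other`) and takes the language from the part of the locale before the hyphen; `CATEGORY_ORDER` and `to_named` are hypothetical names:

```python
# Assumed CLDR cardinal category order for the languages used in this thread.
CATEGORY_ORDER = {
    "en": ["one", "other"],
    "ru": ["one", "few", "many", "other"],
}


def to_named(catalog):
    """Convert array-style plurals into named plurals, locale by locale."""
    named = {}
    for locale, phrases in catalog.items():
        # "ru-RU" -> "ru": plural categories depend on the language only.
        order = CATEGORY_ORDER[locale.split("-")[0]]
        named[locale] = {
            # zip() stops at the shorter sequence, so skipped trailing
            # forms simply do not appear in the result.
            key: dict(zip(order, value)) if isinstance(value, list) else value
            for key, value in phrases.items()
        }
    return named
```

Plain strings and untranslated entries (`~`, loaded as `None`) pass through unchanged.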

kisvegabor commented 5 years ago

Can one, few, many or similar expressions describe complicated plural cases?

Have a look at Slovenian here: https://www.gnu.org/software/gettext/manual/html_node/Plural-forms.html

puzrin commented 5 years ago

Can one, few, many or similar expressions describe complicated plural cases?

Yes. See https://github.com/nodeca/plurals-cldr#references

I did plurals codegen for js. It uses Unicode CLDR data to build code & tests automatically. Works with all languages.

We will just use a slightly different template to generate C code instead of JS. Not a big deal.
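
For illustration, here is what the CLDR cardinal rules for English and Russian look like when written out by hand in Python. This is only a sketch of the logic a generated function would implement, restricted to non-negative integers (CLDR assigns Russian fractions to `other`, which is not handled here); the function names are hypothetical:

```python
def plural_en(n):
    """CLDR cardinal category for an integer in English."""
    return "one" if n == 1 else "other"


def plural_ru(n):
    """CLDR cardinal category for an integer in Russian."""
    mod10, mod100 = n % 10, n % 100
    if mod10 == 1 and mod100 != 11:
        return "one"   # 1, 21, 31, 101 ...
    if 2 <= mod10 <= 4 and not 12 <= mod100 <= 14:
        return "few"   # 2-4, 22-24 ...
    return "many"      # 0, 5-20, 25-30 ...
```

The category names alone are enough to cover languages with several plural forms, because the per-language rule decides which name a number maps to.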

kisvegabor commented 5 years ago

I see, fine!

kisvegabor commented 5 years ago

It seems we still need a config file where we describe the languages. For example:

en_GB
- number of plurals
- plural rule (one-line rules here )
- and what comes later
ru_RU
- ...

The yaml files for the translations should be created based on the languages declared in this file.

puzrin commented 5 years ago

The YAML is self-sufficient. It's created by the user and extended by the user/extractor. I can't imagine any extra info that is missing from the YAML and required to update it.

number of plurals

plural rule (one-line rules here )

This is NOT for the YAML file. This is for the generated C source. It's a different abstraction layer. The info is generated automatically from the locale name, CLDR data and so on.

kisvegabor commented 5 years ago

And where will the mentioned info (number of plurals, plural rules) be stored, so that it can be generated automatically from the locale name?

puzrin commented 5 years ago

The generated translations.c can contain any additional data you need for it to work properly.

Plural rule functions will be generated from CLDR.
Plural counts or any other counters can be added to the compiler too, if needed.

I guess, this is more for #7.
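
To make the "different abstraction layer" point concrete, here is a rough Python sketch of a compiler step that turns the parsed translations into C source text. The array naming scheme and the layout of the emitted code are assumptions made for illustration only; they are not the format agreed in this thread, and `c_ident`, `c_string` and `generate_c` are hypothetical helpers:

```python
def c_ident(text):
    """Turn a locale or phrase key into a C identifier fragment."""
    return "".join(
        ch.lower() if ch.isascii() and ch.isalnum() else "_" for ch in text
    )


def c_string(text):
    """Quote text as a C string literal (UTF-8 is emitted as-is)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def generate_c(catalog):
    """Emit one string array per translated phrase."""
    lines = ["/* Generated file, do not edit. */"]
    for locale, phrases in catalog.items():
        for key, value in phrases.items():
            if value is None:
                continue  # "~" in YAML: not translated yet, nothing to emit
            forms = value if isinstance(value, list) else [value]
            name = f"{c_ident(locale)}_{c_ident(key)}"
            body = ", ".join(c_string(form) for form in forms)
            lines.append(f"static const char * const {name}[] = {{ {body} }};")
    return "\n".join(lines) + "\n"
```

Anything the runtime needs beyond the strings (plural rule functions, counters) would be emitted by the same step, so the YAML itself does not have to carry it.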

puzrin commented 5 years ago

Closing - landed in the docs.

lvgl / lv_i18n

Text format for translations #5
