r18n / r18n-core

I18n tool to translate your Ruby application
GNU Lesser General Public License v3.0

Locale "sr-Latn" is not working: supporting script subtags #10

Open 747 opened 3 years ago

747 commented 3 years ago

In r18n/locale.rb:

https://github.com/r18n/r18n-core/blob/f7bc3003763d51cb92e768b51036e5943ef91e54/lib/r18n-core/locale.rb#L121

This line seems to have a simple logic error, because splitting on a capturing pattern leaves an empty string between two adjacent matches:

$ irb
irb(main):001:1* module R18n
irb(main):002:2*   class RuFooBarBaz
irb(main):003:1*   end
irb(main):004:1* end
=> nil
irb(main):005:0> l = R18n::RuFooBarBaz.new
=> #<R18n::RuFooBarBaz:0x00007fffdeb58610>
irb(main):006:0> l.class.name.split('::').last.split(/([A-Z][a-z]+)/)[1, 2]
=> ["Ru", ""]
irb(main):007:0> l.class.name.split('::').last.split(/([A-Z][a-z]+)/)
=> ["", "Ru", "", "Foo", "", "Bar", "", "Baz"]
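One possible way around this, as a minimal sketch: drop the empty strings before taking the first two elements. The helper name `class_name_parts` is hypothetical and only illustrates the idea; it is not the actual code in locale.rb.

```ruby
# Hypothetical helper (not part of r18n-core): split a locale class name
# into its capitalized parts, discarding the empty strings that
# String#split leaves between adjacent matches of a capturing pattern.
def class_name_parts(class_name)
  class_name.split('::').last.split(/([A-Z][a-z]+)/).reject(&:empty?)
end

language, second = class_name_parts('R18n::SrLatn').first(2)
puts language # prints "Sr"
puts second   # prints "Latn"

# Names like EnUS keep working, because "US" is the unmatched remainder.
p class_name_parts('R18n::EnUS') # prints ["En", "US"]
```

Whether the second element is then a region or a script would still have to be decided separately.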

Perhaps the problem has been elusive because it was introduced with the parent locale function (https://github.com/r18n/r18n-core/commit/2c88300c8ae4a5b4b7841ef1ff3c035174c1106c), and no one tried to use different locales under the same parent locale at once.

AlexWayfer commented 3 years ago

Hello. Thank you for your report.

Can you please provide a more realistic example? We both know that RuFooBarBaz is outside the scope of R18n (real-world projects).

I'd be glad to try to understand it, cover it with tests, and fix it.

AlexWayfer commented 3 years ago

To be honest, region was introduced just a day before, as far as I can see: https://github.com/r18n/r18n-core/commit/009cadf039343da3b8653350594084ca3aabe2e9

And there were no reports for 2.5 years, so, I guess, it's not a big deal. 😅

Also we have tests for "different locales (regions) under the same parent locale" here: https://github.com/r18n/r18n-core/blob/28c1d46/spec/r18n_spec.rb#L195-L206

So… I can understand that there is a code error, but I want to know what is best to test and how it affects projects.

747 commented 3 years ago

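Indeed, now I see that most locale classes with a secondary element are named in a format like EnUS, so the behavior is "correct" for them.

The ones it harms are those such as SrLatn among this repository's built-in locales, where the second element is capitalized and then lowercase, so the split returns an empty string in its place.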

require 'r18n-core'
R18n.set "en-us"
puts R18n.t.yes # => "Yes"
R18n.set "zh-tw"
puts R18n.t.yes # => "是"
R18n.set "sr-latn"
puts R18n.t.yes # => "Yes" <- falls back to English even though the sr-latn .yml exists!

open('sr.yml', 'w:utf-8') do |sr|
  sr.puts "'yes': да"
end
open('sr-latn.yml', 'w:utf-8') do |srl|
  srl.puts "'yes': da"
end
R18n.default_places = '.'
R18n.set "sr-latn"
puts R18n.t.yes # => "да" <- taken from sr.yml, not sr-latn.yml

So maybe no one from Serbia has used this gem 🙄.


While we're at it, what would you say to supporting script subtags? Besides sr-Latn and sr-Cyrl, kk-Latn is upcoming, and real-world examples such as zh-Hant-HK exist (because both Simplified and Traditional variants may be used in Hong Kong).
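To illustrate how script subtags could fit with the existing parent-locale idea, here is a minimal sketch that truncates a tag one subtag at a time, similar in spirit to the lookup scheme in RFC 4647. The method name `fallback_chain` is hypothetical and this is not how r18n-core currently resolves locales.

```ruby
# Hypothetical helper (not part of r18n-core): list a tag followed by its
# progressively shorter parents, from most to least specific.
def fallback_chain(tag)
  parts = tag.split('-')
  parts.length.downto(1).map { |count| parts.first(count).join('-') }
end

p fallback_chain('zh-Hant-HK') # prints ["zh-Hant-HK", "zh-Hant", "zh"]
p fallback_chain('sr-Latn')    # prints ["sr-Latn", "sr"]
```

Note that falling back from sr-Latn to sr would still show Cyrillic text, so a real implementation may want to stop at the script level for languages written in more than one script.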

AlexWayfer commented 1 year ago

It seems a lot more complicated than I thought.

For example: https://en.wikipedia.org/wiki/IETF_language_tag#Extension_U_(Unicode_Locale)

So, "locale" can have a lot of "tags". And the second one can be either region or script or anything else.

Meh.

Two ideas:

  1. We should get rid of script tags and not support them (do we really need them?).
  2. We should implement a more complicated system that doesn't break the existing one with regions, but also supports script tags in the second and third places (sr-SR-Latn is possible, I guess?).

747 commented 1 year ago

I think the latter would be a well-balanced option. You should also support 3-letter language codes as in the standard.

(Note that script comes before region, so it must be sr-Latn-SR and not ~sr-SR-Latn~ in that case. And SR is confusingly the country code of Suriname and not Serbia, so "Serbian spoken in Serbia written in Roman alphabet" will be sr-Latn-RS.)

> It seems a lot more complicated than I thought.
>
> For example: https://en.wikipedia.org/wiki/IETF_language_tag#Extension_U_(Unicode_Locale)

The IETF language tag system as a whole is indeed complex, but half of it (including what you cited) is for domain-specific or backward-compatibility purposes that are not immediately needed for user-facing locales.

Almost all cases can be covered with three elements: language-script-region. If you want to go one step further with relatively little effort, consider also accepting one variant in place of the region (that is, language-script-variant). This is useful for sub-country official languages such as Scottish English en-scotland or Valencian ca-valencia (because IETF tags are not designed to handle ISO subdivision codes very well).
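As a rough sketch of what such a parser could look like: the code below classifies subtags by their shape (2 or 3 letters for language, 4 letters for script, 2 letters or 3 digits for region, 5 to 8 alphanumerics or a digit plus 3 alphanumerics for variant). `LocaleTag` and `parse_locale_tag` are hypothetical names, not r18n-core API, and this deliberately covers only a small subset of the standard (no extensions, private use, or multiple variants). For simplicity it accepts a variant after a region as well as in place of one.

```ruby
# Hypothetical value object (not part of r18n-core).
LocaleTag = Struct.new(:language, :script, :region, :variant) do
  def to_s
    [language, script, region, variant].compact.join('-')
  end
end

LANGUAGE = /\A[a-z]{2,3}\z/i
SCRIPT   = /\A[a-z]{4}\z/i
REGION   = /\A(?:[a-z]{2}|\d{3})\z/i
VARIANT  = /\A(?:[a-z0-9]{5,8}|\d[a-z0-9]{3})\z/i

# Parse "language[-script][-region][-variant]" in that fixed order and
# normalize the conventional casing of each subtag.
def parse_locale_tag(tag)
  parts = tag.split('-')
  language = parts.shift
  raise ArgumentError, "invalid language in #{tag.inspect}" unless language&.match?(LANGUAGE)

  script  = parts.shift.capitalize if parts.first&.match?(SCRIPT)
  region  = parts.shift.upcase     if parts.first&.match?(REGION)
  variant = parts.shift.downcase   if parts.first&.match?(VARIANT)
  raise ArgumentError, "unsupported subtags in #{tag.inspect}" unless parts.empty?

  LocaleTag.new(language.downcase, script, region, variant)
end

puts parse_locale_tag('sr-latn-rs')  # prints "sr-Latn-RS"
puts parse_locale_tag('ca-valencia') # prints "ca-valencia"
```

Because the order is fixed, a tag such as sr-SR-Latn is rejected with an ArgumentError rather than silently reordered.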