ruby-i18n / i18n

Internationalization (i18n) library for Ruby
MIT License

[BUG] transliterating all-caps strings ends up with mixed case #675

Open padde opened 1 year ago

padde commented 1 year ago

What I tried to do

I want to transliterate an all-caps string

I18n.transliterate("KANÜLE")

What I expected to happen

I expect all resulting characters to be capitalized

#=> "KANUELE"

What actually happened

The resulting characters are mixed case

#=> "KANUeLE"

Simply changing the entries in the translations file to "Ü": "UE" works for this case, but then of course mixed case words will be transliterated in a wrong manner:

I18n.transliterate("Überfall")
#=> "UEberfall"

I would expect a solution that can handle both cases gracefully.

Versions of i18n, rails, and anything else you think is necessary

All versions of i18n

radar commented 1 year ago

Apologies for the delay -- I've been away on leave.

When I run your code I see not quite your expected string, but at least all characters are uppercase:

[1] pry(main)> I18n.transliterate("KANÜLE")
=> "KANULE"
[2] pry(main)> RUBY_VERSION
=> "2.7.6"
[3] pry(main)> I18n::VERSION
=> "1.14.1"

You mention:

Simply changing the entries in the translations file to "Ü": "UE" works for this case,

Which translations file? You did not supply this in your original message.

Could you please supply the file that you're talking about here?

padde commented 1 year ago

@radar my apologies, I am using rails-i18n, which includes transliteration rules for many languages. The main problem here is that some characters end up being transliterated as two characters.

Here is a full working example for the first option that we currently have, storing capitalized versions of the transliterated characters, which is what rails-i18n does:

# frozen_string_literal: true

require 'i18n'

I18n.config.enforce_available_locales = false
I18n.locale = :de

# capitalized transliterations, work only for capitalized words
I18n.backend.store_translations(
  :de,
  i18n: {
    transliterate: {
      rule: {
        'ä' => 'ae',
        'é' => 'e',
        'ü' => 'ue',
        'ö' => 'oe',
        'Ä' => 'Ae',
        'Ü' => 'Ue',
        'Ö' => 'Oe',
        'ß' => 'ss',
        'ẞ' => 'SS'
      }
    }
  }
)

puts I18n.transliterate('KANÜLE') # => 'KANUeLE' (bad)
puts I18n.transliterate('FUẞBALL') # => 'FUSSBALL' (good, ẞ is by definition only used for all caps)
puts I18n.transliterate('Überfall') # => 'Ueberfall' (good)

As mentioned before, switching to all-caps versions will not help because then we would break the cases where we actually want capitalized versions such as the last example:

# frozen_string_literal: true

require 'i18n'

I18n.config.enforce_available_locales = false
I18n.locale = :de

# all caps transliterations, work only for all caps words
I18n.backend.store_translations(
  :de,
  i18n: {
    transliterate: {
      rule: {
        'ä' => 'ae',
        'é' => 'e',
        'ü' => 'ue',
        'ö' => 'oe',
        'Ä' => 'AE', # all caps now
        'Ü' => 'UE', # all caps now
        'Ö' => 'OE', # all caps now
        'ß' => 'ss',
        'ẞ' => 'SS'
      }
    }
  }
)

puts I18n.transliterate('KANÜLE') # => 'KANUELE' (good)
puts I18n.transliterate('FUẞBALL') # => 'FUSSBALL' (still good)
puts I18n.transliterate('Überfall') # => 'UEberfall' (bad)

tom-lord commented 8 months ago

'Ü' => 'Ue'

'Ü' => 'UE'

I would expect a solution that can handle both cases gracefully.

My 2 cents on the topic as a passing observer...

Either of your configurations above will be sufficient for the majority of use cases, but they are only approximations. A comprehensive solution cannot be a straightforward "find and replace"; it would need to look at the surrounding context of words.

From the documentation, I18n transliteration rules can be given as a Proc. I don't know what a "perfect" solution for transliterating Ü in German looks like, but for example I found this (JavaScript) code that claims to work for a wider range of scenarios. (You might find an even better solution, and/or something already written in Ruby.)
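
To illustrate the Proc approach, here is a minimal sketch in Ruby. It is not the linked JavaScript code and not anything shipped by i18n or rails-i18n; the method name `transliterate_de_case_aware` and the neighbouring-letter heuristic are my own assumptions. It expands an uppercase umlaut to all caps when the adjacent letter is uppercase, and to the capitalized form otherwise:

```ruby
# Hypothetical helper (not part of i18n or rails-i18n).
def transliterate_de_case_aware(string)
  # Replacements that do not depend on context.
  simple = { 'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss', 'ẞ' => 'SS', 'é' => 'e' }
  # Uppercase umlauts: capitalized form by default, upcased in all-caps context.
  upper = { 'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue' }
  chars = string.chars

  chars.each_with_index.map do |char, index|
    next simple.fetch(char, char) unless upper.key?(char)

    following = chars[index + 1]
    preceding = index.zero? ? nil : chars[index - 1]
    # Judge by the next letter; at the end of a word, fall back to the previous one.
    neighbour = following&.match?(/\p{L}/) ? following : preceding
    neighbour&.match?(/\p{Lu}/) ? upper[char].upcase : upper[char]
  end.join
end

puts transliterate_de_case_aware('KANÜLE')
puts transliterate_de_case_aware('Überfall')
puts transliterate_de_case_aware('FUẞBALL')

# Assumed registration, using the Proc form of the rule instead of a Hash:
# I18n.backend.store_translations(
#   :de, i18n: { transliterate: { rule: ->(s) { transliterate_de_case_aware(s) } } }
# )
```

This is still only a heuristic. A lone "Ü", for example, has no neighbouring letter to look at and falls back to the capitalized form.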


This library does not currently define or maintain transliteration rules for different locales; it simply supports flexible configuration options. Therefore I disagree with the feedback you received in the rails-i18n project: whilst they may want to keep their "simple" configuration unchanged because it covers the majority of use cases, I still would not consider the raised issue a bug in the I18n library, but rather a configuration issue in your project.