ruby-i18n / i18n

Internationalization (i18n) library for Ruby
MIT License

[BUG] transliterating all-caps strings ends up with mixed case #675

Open padde opened 1 year ago

padde commented 1 year ago

What I tried to do

I want to transliterate an all-caps string

I18n.transliterate("KANÜLE")

What I expected to happen

I expect all resulting characters to be capitalized

#=> "KANUELE"

What actually happened

The resulting characters are mixed case

#=> "KANUeLE"

Simply changing the entries in the translations file to "Ü": "UE" works for this case, but then mixed-case words are transliterated incorrectly:

I18n.transliterate("Überfall")
#=> "UEberfall"

I would expect a solution that can handle both cases gracefully.

Versions of i18n, rails, and anything else you think is necessary

All versions of i18n

radar commented 12 months ago

Apologies for the delay -- I've been away on leave.

When I run your code I see not quite your expected string, but at least all characters are uppercase:

[1] pry(main)> I18n.transliterate("KANÜLE")
=> "KANULE"
[2] pry(main)> RUBY_VERSION
=> "2.7.6"
[3] pry(main)> I18n::VERSION
=> "1.14.1"

You mention:

Simply changing the entries in the translations file to "Ü": "UE" works for this case,

Which translations file? You did not supply this in your original message.

Could you please supply the file that you're talking about here?

padde commented 11 months ago

@radar my apologies, I am using rails-i18n, which includes transliteration rules for many languages. The main problem here is that some characters are transliterated as two characters.

Here is a full working example for the first option that we currently have, storing capitalized versions of the transliterated characters, which is what rails-i18n does:

# frozen_string_literal: true

require 'i18n'

I18n.config.enforce_available_locales = false
I18n.locale = :de

# capitalized transliterations, work only for capitalized words
I18n.backend.store_translations(
  :de,
  i18n: {
    transliterate: {
      rule: {
        'ä' => 'ae',
        'é' => 'e',
        'ü' => 'ue',
        'ö' => 'oe',
        'Ä' => 'Ae',
        'Ü' => 'Ue',
        'Ö' => 'Oe',
        'ß' => 'ss',
        'ẞ' => 'SS'
      }
    }
  }
)

puts I18n.transliterate('KANÜLE') # => 'KANUeLE' (bad)
puts I18n.transliterate('FUẞBALL') # => 'FUSSBALL' (good, ẞ is by definition only used for all caps)
puts I18n.transliterate('Überfall') # => 'Ueberfall' (good)

As mentioned before, switching to all-caps versions will not help because then we would break the cases where we actually want capitalized versions such as the last example:

# frozen_string_literal: true

require 'i18n'

I18n.config.enforce_available_locales = false
I18n.locale = :de

# all caps transliterations, work only for all caps words
I18n.backend.store_translations(
  :de,
  i18n: {
    transliterate: {
      rule: {
        'ä' => 'ae',
        'é' => 'e',
        'ü' => 'ue',
        'ö' => 'oe',
        'Ä' => 'AE', # all caps now
        'Ü' => 'UE', # all caps now
        'Ö' => 'OE', # all caps now
        'ß' => 'ss',
        'ẞ' => 'SS'
      }
    }
  }
)

puts I18n.transliterate('KANÜLE') # => 'KANUELE' (good)
puts I18n.transliterate('FUẞBALL') # => 'FUSSBALL' (still good)
puts I18n.transliterate('Überfall') # => 'UEberfall' (bad)

tom-lord commented 5 months ago

'Ü' => 'Ue'

'Ü' => 'UE'

I would expect a solution that can handle both cases gracefully.

My 2 cents on the topic as a passing observer...

Either of your configurations above will be sufficient for the majority of use cases, but they are only approximations. A comprehensive solution cannot be a straightforward "find and replace"; it would need to look at the surrounding context of words.

According to the documentation, I18n transliteration rules can also be given as a Proc. I don't know what a "perfect" solution for transliterating Ü in German looks like, but for example I found this (JavaScript) code that claims to work for a wider range of scenarios. (You might succeed in finding an even better solution and/or something already written in Ruby.)
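As a rough illustration of the Proc approach, here is a minimal sketch of a context-aware rule. The names `LOWER`, `UPPER` and `GERMAN_RULE` are made up for this example, and the heuristic (expand an uppercase umlaut to all caps only when the nearest neighbouring letter is uppercase) is an assumption of mine, not something defined by i18n or rails-i18n. It has not been checked against the full range of German edge cases.

```ruby
# frozen_string_literal: true

# Hypothetical lookup tables for this sketch, mirroring the rules from the thread.
LOWER = { 'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss', 'é' => 'e' }.freeze
UPPER = { 'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue', 'ẞ' => 'SS' }.freeze

# Assumed heuristic: look at the next character, falling back to the previous
# one, and pick the first that is a letter. If that letter is uppercase, the
# word is treated as all caps and the replacement is upcased.
GERMAN_RULE = lambda do |string|
  chars = string.chars
  chars.each_with_index.map do |char, i|
    if LOWER.key?(char)
      LOWER[char]
    elsif UPPER.key?(char)
      candidates = [chars[i + 1], (chars[i - 1] if i.positive?)].compact
      neighbour = candidates.find { |c| c.match?(/\p{L}/) }
      neighbour&.match?(/\p{Lu}/) ? UPPER[char].upcase : UPPER[char]
    else
      char
    end
  end.join
end

puts GERMAN_RULE.call('KANÜLE')   # => KANUELE
puts GERMAN_RULE.call('FUẞBALL')  # => FUSSBALL
puts GERMAN_RULE.call('Überfall') # => Ueberfall

# To plug this into I18n, the Proc would be stored as the rule itself:
#   I18n.backend.store_translations(:de, i18n: { transliterate: { rule: GERMAN_RULE } })
```

Note that a Proc rule receives the whole string, so as far as I can tell it replaces the hash-based lookup entirely and anything the Proc does not handle passes through unchanged.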


This library does not currently define or maintain transliteration rules for different locales. It simply supports flexible configuration options. Therefore I disagree with the feedback you received in the rails-i18n project: whilst they may want to keep their "simple" configuration unchanged because it covers the majority of use cases, I still would not consider the raised issue a bug in the I18n library, but rather a configuration issue in your project.