Open padde opened 1 year ago
Apologies for the delay -- I've been away on leave.
When I run your code I see not quite your expected string, but at least all characters are uppercase:
[1] pry(main)> I18n.transliterate("KANÜLE")
=> "KANULE"
[2] pry(main)> RUBY_VERSION
=> "2.7.6"
[3] pry(main)> I18n::VERSION
=> "1.14.1"
You mention:
Simply changing the entries in the translations file to "Ü": "UE" works for this case,
Which translations file? You did not supply this in your original message.
Could you please supply the file that you're talking about here?
@radar my apologies, I am using rails-i18n, which includes transliteration rules for many languages. The main problem here is that some characters are transliterated as two characters.
Here is a full working example for the first option that we currently have, storing capitalized versions of the transliterated characters, which is what rails-i18n does:
# frozen_string_literal: true
require 'i18n'
I18n.config.enforce_available_locales = false
I18n.locale = :de
# capitalized transliterations, work only for capitalized words
I18n.backend.store_translations(
  :de,
  i18n: {
    transliterate: {
      rule: {
        'ä' => 'ae',
        'é' => 'e',
        'ü' => 'ue',
        'ö' => 'oe',
        'Ä' => 'Ae',
        'Ü' => 'Ue',
        'Ö' => 'Oe',
        'ß' => 'ss',
        'ẞ' => 'SS'
      }
    }
  }
)
puts I18n.transliterate('KANÜLE') # => 'KANUeLE' (bad)
puts I18n.transliterate('FUẞBALL') # => 'FUSSBALL' (good, ẞ is by definition only used for all caps)
puts I18n.transliterate('Überfall') # => 'Ueberfall' (good)
As mentioned before, switching to all-caps versions will not help because then we would break the cases where we actually want capitalized versions such as the last example:
# frozen_string_literal: true
require 'i18n'
I18n.config.enforce_available_locales = false
I18n.locale = :de
# all caps transliterations, work only for all caps words
I18n.backend.store_translations(
  :de,
  i18n: {
    transliterate: {
      rule: {
        'ä' => 'ae',
        'é' => 'e',
        'ü' => 'ue',
        'ö' => 'oe',
        'Ä' => 'AE', # all caps now
        'Ü' => 'UE', # all caps now
        'Ö' => 'OE', # all caps now
        'ß' => 'ss',
        'ẞ' => 'SS'
      }
    }
  }
)
puts I18n.transliterate('KANÜLE') # => 'KANUELE' (good)
puts I18n.transliterate('FUẞBALL') # => 'FUSSBALL' (still good)
puts I18n.transliterate('Überfall') # => 'UEberfall' (bad)
I would expect a solution that can handle both cases gracefully.
My 2 cents on the topic as a passing observer...
Either of your configurations above will be sufficient for the majority of use cases, but they are only approximations. A comprehensive solution cannot be a straightforward "find and replace"; it would need to look at the surrounding context of words.
From the documentation, I18n transliterate rules can be given as a Proc. I don't know what a "perfect" solution for transliterating Ü in German looks like, but for example I found this (JavaScript) code that claims to work for a wider range of scenarios. (You might succeed in finding an even better solution and/or something already written in Ruby.)
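To illustrate the Proc idea, here is a minimal sketch of a context-aware rule. It is not a complete or "correct" German transliterator. The names LOWER, UPPER and GERMAN_RULE are hypothetical, and the heuristic (use the all-caps form when an uppercase umlaut has an uppercase neighbour and no lowercase neighbour) is an assumption of mine, not something either library ships. As far as I understand the I18n documentation, a Proc rule receives the whole string and its return value is used as-is, so the rule has to handle every character it cares about itself.

```ruby
# frozen_string_literal: true

# Hypothetical lookup tables, mirroring the rules quoted earlier in this thread.
LOWER = { 'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss', 'é' => 'e' }.freeze
UPPER = { 'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue', 'ẞ' => 'SS' }.freeze

# Sketch of a context-aware rule: an uppercase umlaut becomes all caps
# only when it has an uppercase neighbour and no lowercase neighbour.
GERMAN_RULE = lambda do |string|
  chars = string.chars
  chars.each_with_index.map do |char, index|
    if LOWER.key?(char)
      LOWER[char]
    elsif UPPER.key?(char)
      previous = index.positive? ? chars[index - 1] : nil
      neighbours = [previous, chars[index + 1]].compact
      all_caps = neighbours.any? { |c| c.match?(/\p{Lu}/) } &&
                 neighbours.none? { |c| c.match?(/\p{Ll}/) }
      all_caps ? UPPER[char].upcase : UPPER[char]
    else
      char # anything not listed above is passed through unchanged
    end
  end.join
end

puts GERMAN_RULE.call('KANÜLE')
puts GERMAN_RULE.call('FUẞBALL')
puts GERMAN_RULE.call('Überfall')

# With the i18n gem loaded, the lambda could be registered in place of the Hash:
#   I18n.backend.store_translations(:de, i18n: { transliterate: { rule: GERMAN_RULE } })
```

A single-letter word such as "Ü" has no neighbours and falls back to the capitalized form, and mixed cases like "GRÜn" are resolved arbitrarily, so this remains an approximation as well.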
This library does not, currently, define or maintain transliteration rules across different locales. It simply supports flexible configuration options.
Therefore I disagree with the feedback you received in the rails-i18n project: whilst they may want to keep their "simple" configuration unchanged as it solves the majority of use cases, I still would not consider the raised issue to be a bug in the I18n library, but rather a configuration issue in your project.
What I tried to do
I want to transliterate an all-caps string
What I expected to happen
I expect all resulting characters to be capitalized
What actually happened
The resulting characters are mixed case
Simply changing the entries in the translations file to "Ü": "UE" works for this case, but then of course mixed-case words will be transliterated incorrectly. I would expect a solution that can handle both cases gracefully.
Versions of i18n, rails, and anything else you think is necessary
All versions of i18n