Open jhansbo opened 6 months ago
Right, as a native Swedish speaker I find the current behavior very strange: matching without diacritics gives what is really an incorrect result.
Also note that sorting will be incorrect. A, B, C, D, ..., X, Y, Z, Å, Ä, Ö is the correct sorting order for the Swedish alphabet. In a Swedish dictionary é is sorted along with e and ü is sorted along with u (both are true diacritics), but å and ä are not sorted along with a, and ö is not sorted along with o.
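To make the ordering described above concrete, here is a minimal sketch of a comparator that places å, ä, ö after z while folding other diacritics (é, ü) onto their base letters. The names `swedishTail` and `swedishSortKey` are hypothetical helpers made up for this illustration, not Foundation APIs, and this is not how Foundation or ICU implement collation.

```swift
import Foundation

// Hypothetical table: the three Swedish letters that sort after "z", in order.
let swedishTail: [Character: Int] = ["å": 1, "ä": 2, "ö": 3]

// Builds a numeric sort key for a word.
// å/ä/ö get ranks above every Unicode scalar value, so they land after "z";
// every other character is diacritic-folded (é -> e, ü -> u) first.
func swedishSortKey(_ word: String) -> [Int] {
    var key: [Int] = []
    for ch in word.lowercased() {
        if let rank = swedishTail[ch] {
            key.append(0x110000 + rank)
        } else {
            let folded = String(ch).folding(options: .diacriticInsensitive, locale: nil)
            key.append(contentsOf: folded.unicodeScalars.map { Int($0.value) })
        }
    }
    return key
}

let words = ["öl", "zebra", "ärta", "apa", "åka", "uggla", "über", "ek", "éclair"]
let sorted = words.sorted {
    swedishSortKey($0).lexicographicallyPrecedes(swedishSortKey($1))
}
print(sorted)
```

With this key, é sorts among the e words and ü among the u words, while åka, ärta and öl come after zebra. A real implementation would rely on locale-aware collation rather than a hand-written table.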
As mentioned, this is also a problem for Norwegian and Danish. It's peculiar that only Å and Æ (the Danish equivalents of Å and Ä) are considered diacritics, but Ø (the Danish equivalent of Ö) is not.
This API, range(of:options:), isn't locale/language aware. While these letters are distinct letters in Swedish, they are indeed diacritics in other languages, so it's challenging to make that distinction here.
That being said, I would definitely expect the localized version of this API, e.g. range(of: string, options: [.caseInsensitive, .diacriticInsensitive], locale: Locale(languageCode: .swedish)), to return what you described, but it currently doesn't. Would you agree that we should track that issue instead?
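For reference, a small sketch that runs the locale-unaware and the locale-aware search side by side. It uses Locale(identifier: "sv_SE") as an assumed stand-in for the Swedish locale mentioned above; the variable names are illustrative only. The locale-aware result is printed rather than asserted, since that is exactly the behavior in question.

```swift
import Foundation

let haystack = "The Swedish letters Å, Ä, Ö"
let needle = "a"
let options: String.CompareOptions = [.caseInsensitive, .diacriticInsensitive]

// Locale-unaware search: Å and Ä are folded to A, so "a" is reported as found.
let plainMatch = haystack.range(of: needle, options: options)

// Locale-aware search with a Swedish locale (assumed identifier "sv_SE").
// Whether this treats å/ä/ö as distinct letters is the open question.
let swedish = Locale(identifier: "sv_SE")
let localizedMatch = haystack.range(of: needle, options: options, locale: swedish)

print("locale-unaware match:", plainMatch != nil)
print("Swedish-locale match:", localizedMatch != nil)
```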
It seems there are far more languages that treat them as separate letters.
See e.g.
But as there is no single correct answer, moving this case to be for the Swedish locale would be ok I think. (Although I think the locale-unaware default is debatable, I guess it's been that way for some time...)
The Scandinavian languages and the Finnish language, by contrast, treat the characters with diacritics å, ä, and ö as distinct letters of the alphabet, and sort them after z. Usually ä (a-umlaut) and ö (o-umlaut) [used in Swedish and Finnish] are sorted as equivalent to æ (ash) and ø (o-slash) [used in Danish and Norwegian]. Also, aa, when used as an alternative spelling to å, is sorted as such. Other letters modified by diacritics are treated as variants of the underlying letter, with the exception that ü is frequently sorted as y.
import Foundation

let symbol = "The Swedish letters Å, Ä, Ö"
let string = "a"
// Search ignoring both case and diacritics, without a locale.
let symbolRange = symbol.range(of: string, options: [.caseInsensitive, .diacriticInsensitive])

if symbolRange != nil {
    print("Found '\(string)' in '\(symbol)'")
} else {
    print("'\(string)' not found in '\(symbol)'")
}
Prints Found 'a' in 'The Swedish letters Å, Ä, Ö'
Should print 'a' not found in 'The Swedish letters Å, Ä, Ö'
Replacing the string with "o" gives the same issue.
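Until the API behaves this way, one possible workaround is to fold diacritics manually while leaving a chosen set of letters untouched. The sketch below is only an illustration: `foldDiacritics(in:keeping:)`, `swedishLetters`, and `swedishContains(_:in:)` are hypothetical helpers, not part of Foundation, and the set of protected letters is an assumption for Swedish.

```swift
import Foundation

// Hypothetical helper: strips diacritics from every character except those
// listed in `distinctLetters`, which are copied through unchanged.
func foldDiacritics(in text: String, keeping distinctLetters: Set<Character>) -> String {
    var result = ""
    for ch in text {
        if distinctLetters.contains(ch) {
            result.append(ch)
        } else {
            result += String(ch).folding(options: .diacriticInsensitive, locale: nil)
        }
    }
    return result
}

// Letters assumed to be distinct in Swedish (both cases).
let swedishLetters: Set<Character> = ["å", "ä", "ö", "Å", "Ä", "Ö"]

// Case-insensitive search that folds é, ü, etc. but keeps å, ä, ö distinct.
func swedishContains(_ needle: String, in haystack: String) -> Bool {
    let foldedHaystack = foldDiacritics(in: haystack, keeping: swedishLetters)
    let foldedNeedle = foldDiacritics(in: needle, keeping: swedishLetters)
    return foldedHaystack.range(of: foldedNeedle, options: .caseInsensitive) != nil
}

print(swedishContains("a", in: "The Swedish letters Å, Ä, Ö"))  // false
print(swedishContains("o", in: "The Swedish letters Å, Ä, Ö"))  // false
print(swedishContains("e", in: "Café"))                          // true
print(swedishContains("ö", in: "ÖL"))                            // true
```

The protected set could be swapped for æ, ø, å to cover Danish and Norwegian, but a proper fix belongs in the locale-aware API itself.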