taoensso / tower

i18n & L10n library for Clojure/Script
https://www.taoensso.com/tower
Eclipse Public License 1.0

Script support for languages #65

Open juhovh opened 9 years ago

juhovh commented 9 years ago

I'm writing this on my mobile phone, so sorry if my description is too brief.

On the JVM, language tags are assumed to have the form language-region-variant. That form is valid, but it makes it impossible to select a translation based on script, which is kind of a deal breaker for me.

To give an example, take Serbian. There are versions sr-Latn-RS (Serbian with Latin script in Serbia), sr-Cyrl-RS (Serbian with Cyrillic script in Serbia), sr-Latn-ME (Serbian with Latin script in Montenegro), and sr-Cyrl-ME (Serbian with Cyrillic script in Montenegro). There is currently no way of providing script information to the library.

Another example is Chinese, where zh-Hans (simplified Chinese) is used in mainland China and Singapore, whereas zh-Hant (traditional Chinese) is used in Taiwan and Hong Kong. It would be really useful to have generic localizations for simplified and traditional Chinese, plus some country-specific localizations, for example for Taiwan and Singapore.
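To make the subtag structure concrete, here is a minimal sketch using the JDK's own BCP 47 parser, `java.util.Locale/forLanguageTag` (Java 7+). The helper name `tag-parts` is hypothetical and is not part of Tower.

```clojure
(import 'java.util.Locale)

(defn tag-parts
  "Splits a BCP 47 tag into language, script and region using the JDK parser.
  Hypothetical helper for illustration only."
  [tag]
  (let [loc (Locale/forLanguageTag tag)]
    {:language (.getLanguage loc)
     :script   (.getScript loc)     ; empty string when the tag has no script
     :region   (.getCountry loc)})) ; empty string when the tag has no region

(tag-parts "sr-Latn-RS") ;=> {:language "sr", :script "Latn", :region "RS"}
(tag-parts "zh-Hant")    ;=> {:language "zh", :script "Hant", :region ""}
```

The script subtag is carried separately from the region, which is exactly the information a plain language-region-variant split loses.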

Now there are simple fixes and more complicated fixes. The simplest would be to require language tags to be in the form language-script-region-variant, but that would break backwards compatibility. The more complicated version would require regexps and/or proper parsing. Which version would you like a pull request for?

juhovh commented 9 years ago

After thinking this through, I believe proper parsing of BCP 47 tags (probably still supporting underscore for backwards compatibility?) is the only reasonable option. So I'll start preparing a patch for that.

ptaoussanis commented 9 years ago

Hi there,

I'm afraid I'm not really following. To clarify: Tower has two notions of "locale"-

  1. A JVM locale used for the JVM-dependent localization features
  2. A keyword locale used for translations

The second is a strict superset of the first. No JVM validation or structural requirements are imposed on the second type.

So you can use any kind of locale structure you like for translations: :sr-Latn-RS and :zh-Hans should work fine.

You can't use these for anything that requires a JVM locale (since the JVM wouldn't be able to recognize these); but you're free to use them for the translations API (which is pure Clojure/Script).

The only semantic requirement is that an :x-y-z locale can sensibly fall back to :x-y and then :x.

Does that make sense?
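The fallback rule described above can be sketched as follows. This is only an illustration of the :x-y-z, :x-y, :x idea, not Tower's actual implementation; `fallback-chain` is a hypothetical name.

```clojure
(require '[clojure.string :as str])

(defn fallback-chain
  "Returns the keyword locale followed by its progressively shorter prefixes.
  Hypothetical helper; Tower's real lookup may differ."
  [loc]
  (let [parts (str/split (name loc) #"-")]
    ;; Take n, n-1, ..., 1 leading subtags and rejoin them with dashes.
    (mapv #(keyword (str/join "-" (take % parts)))
          (range (count parts) 0 -1))))

(fallback-chain :sr-Latn-RS) ;=> [:sr-Latn-RS :sr-Latn :sr]
(fallback-chain :zh-Hans)    ;=> [:zh-Hans :zh]
```

With this scheme a script subtag is just another prefix level, so translation keys such as :sr-Latn fall out naturally.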

juhovh commented 9 years ago

Hi,

Sorry for not being clear enough in my description. I'll try to be brief and describe my use case better.

I have a system where I have a valid IETF BCP 47 locale in my database for each user. I want to use this locale for formatting both translations and JVM-dependent localization features. If we take zh-Hans-CN as an example, I can use it just fine for translations, but if I try to use it for localization features the following happens:

=> (tower/fmt-str :zh-Hans-CN "%f" 5.5)
ExceptionInfo Invalid locale: :zh-Hans-CN  clojure.core/ex-info (core.clj:4403)

This is not nice, because zh-Hans-CN is a perfectly valid locale according to best current practice (hence BCP).
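A likely reason for the failure, sketched under the assumption that the dash-separated parts are handed positionally to the legacy `java.util.Locale` constructor (language, country, variant): the script subtag ends up in the country slot. The constructor performs no syntactic validation, so the mismatch only surfaces later.

```clojure
(import 'java.util.Locale)

;; Assumption: positional construction from the split parts "zh" "Hans" "CN".
;; (This constructor is deprecated on recent JDKs but still available.)
(def legacy (Locale. "zh" "Hans" "CN"))

;; The JDK's BCP 47 parser, for comparison.
(def bcp47 (Locale/forLanguageTag "zh-Hans-CN"))

(.getScript legacy) ; empty: the legacy constructor has no script slot
(.getScript bcp47)  ;=> "Hans"
(.getCountry bcp47) ;=> "CN"
(= legacy bcp47)    ;=> false
```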

What I would really like to have as a first step would be something like the following:

diff --git a/src/taoensso/tower.cljx b/src/taoensso/tower.cljx
index 2abb09d..933decd 100644
--- a/src/taoensso/tower.cljx
+++ b/src/taoensso/tower.cljx
@@ -7,7 +7,7 @@
                   [taoensso.encore :as encore]
                   [taoensso.timbre :as timbre]
                   [taoensso.tower.utils :as utils :refer (defmem- defmem-*)])
-  #+clj (:import  [java.util Date Locale TimeZone Formatter]
+  #+clj (:import  [java.util Date Locale Locale$Builder TimeZone Formatter IllformedLocaleException]
                   [java.text Collator NumberFormat DateFormat])
   #+cljs (:require-macros [taoensso.encore :as encore]
                           [taoensso.tower  :as tower-macros])
@@ -67,11 +67,16 @@
         (make-Locale (.getLanguage ^Locale loc)))

       :else
-      (let [loc-parts (str/split (name loc) #"[-_]")]
-        (all-Locales
-          (if-not lang-only?
-            (apply make-Locale loc-parts)
-            (make-Locale (first loc-parts))))))))
+      (try
+        (let [loc-obj (.build (.setLanguageTag (Locale$Builder.) (name loc)))]
+          (if-not lang-only? loc-obj
+            (make-Locale (.getLanguage ^Locale loc-obj))))
+        (catch IllformedLocaleException e
+          (let [loc-parts (str/split (name loc) #"[_]")]
+            (all-Locales
+              (if-not lang-only?
+                (apply make-Locale loc-parts)
+                (make-Locale (first loc-parts))))))))))

 #+clj
 (def jvm-locale

This would support all BCP 47 locales and fall back to the legacy way of handling locales when the tag is not well formed. Notice that I removed - from the fallback regex, because I think it would be much nicer if every dash-separated locale name had to be a well-formed BCP 47 tag.
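For readers who want to try the idea outside Tower, here is a standalone sketch of the same strategy: strict BCP 47 parsing first, then an underscore-separated legacy fallback. `parse-locale` is a hypothetical name, and the sketch omits Tower's `all-Locales` validation and `lang-only?` handling.

```clojure
(require '[clojure.string :as str])
(import '[java.util Locale Locale$Builder IllformedLocaleException])

(defn parse-locale
  "Parses a keyword or string locale. Well-formed BCP 47 tags go through
  Locale$Builder; anything else is treated as legacy lang_COUNTRY_variant.
  Hypothetical helper, not Tower's API."
  [loc]
  (let [s (name loc)]
    (try
      (.build (.setLanguageTag (Locale$Builder.) s))
      (catch IllformedLocaleException _
        (let [[lang country variant] (str/split s #"_")]
          (Locale. lang (or country "") (or variant "")))))))

(.getScript  (parse-locale :zh-Hans-CN)) ;=> "Hans"
(.getCountry (parse-locale :en_GB))      ;=> "GB"
```

Unlike `Locale/forLanguageTag`, which silently drops subtags it cannot parse, `Locale$Builder` throws on ill-formed input, which is what makes the fallback branch reachable.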

What would be nice in the long run would be something like what .NET does as explained in https://msdn.microsoft.com/en-us/library/vstudio/dd997383(v=vs.100).aspx

The parent chain of the Chinese cultures now includes the root Chinese culture. The following examples show the complete parent chain for two of the Chinese specific cultures:

zh-CN → zh-CHS → zh-Hans → zh → Invariant
zh-TW → zh-CHT → zh-Hant → zh → Invariant

So the system should know that simplified Chinese is used in mainland China and fall back to zh-Hans if zh-CN is used. But it seems that the JVM doesn't support this either and leaves it to the implementation to handle correct mappings, so I think it's out of scope of tower.
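As a rough illustration of what such a mapping could look like at the application level, here is a sketch with a hand-written table. The table contents and the names `implied-script-parent`, `parent` and `parent-chain` are hypothetical; the legacy .NET names (zh-CHS, zh-CHT) and the invariant culture are left out.

```clojure
(require '[clojure.string :as str])

;; Hypothetical, hand-maintained table: region-only tag -> script-qualified parent.
(def implied-script-parent
  {:zh-CN :zh-Hans
   :zh-SG :zh-Hans
   :zh-TW :zh-Hant
   :zh-HK :zh-Hant})

(defn parent
  "Returns the parent of a keyword locale, or nil at the root."
  [loc]
  (or (implied-script-parent loc)
      (let [parts (str/split (name loc) #"-")]
        (when (> (count parts) 1)
          (keyword (str/join "-" (butlast parts)))))))

(defn parent-chain
  "Follows parents from loc up to the bare language."
  [loc]
  (vec (take-while some? (iterate #(when % (parent %)) loc))))

(parent-chain :zh-CN) ;=> [:zh-CN :zh-Hans :zh]
(parent-chain :zh-TW) ;=> [:zh-TW :zh-Hant :zh]
```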

Did this clarify my point of script support any better?

ptaoussanis commented 9 years ago

Hi,

> Did this clarify my point of script support any better?

It did, thank you for the detailed info! Not ready to reply just yet, need some time to go over this all. Just leaving a reference here in the meantime: http://openjdk.java.net/jeps/128