taoensso / tempura

Simple text localization library for Clojure/Script
https://www.taoensso.com/tempura
Eclipse Public License 1.0

Unexpected behaviour of resources vector #32

Closed devurandom closed 1 year ago

devurandom commented 3 years ago

(tr) (tempura 1.2.1) has unexpected behaviour with respect to the resources vector when compared to Dr. Wolfram Schroers' tutorial (linked to from https://github.com/ptaoussanis/tempura#tutorial).

I would expect this behaviour:

(tempura/tr {:dict {:sw {:fallback "?sw" :missing "??sw"}  :en {:fallback "?en" :missing "??en" :test "TEST"}}} [:sw :en] [:test :fallback] [])
;=> "TEST"

(tempura/tr {:dict {:sw {:fallback "?sw" :missing "??sw"}  :en {:fallback "?en" :missing "??en"}}} [:sw :en] [:test :fallback] [])
;=> "?sw"

(tempura/tr {:dict {:sw {:missing "??sw"}  :en {:fallback "?en" :missing "??en"}}} [:sw :en] [:test :fallback] [])
;=> "?en"

(tempura/tr {:dict {:sw {:missing "??sw"}  :en {:missing "??en"}}} [:sw :en] [:test :fallback] [])
;=> nil

But the actual behaviour is:

(tempura/tr {:dict {:sw {:fallback "?sw" :missing "??sw"}  :en {:fallback "?en" :missing "??en" :test "TEST"}}} [:sw :en] [:test :fallback] [])
;=> "?sw" ; << unexpected :fallback, since :en contains :test.

(tempura/tr {:dict {:sw {:fallback "?sw" :missing "??sw"}  :en {:fallback "?en" :missing "??en"}}} [:sw :en] [:test :fallback] [])
;=> "?sw"

(tempura/tr {:dict {:sw {:missing "??sw"}  :en {:fallback "?en" :missing "??en" :test "TEST"}}} [:sw :en] [:test :fallback] [])
;=> "TEST" ; << when :fallback is gone from :sw, the lookup in :en appears to work.

(tempura/tr {:dict {:sw {:missing "??sw"}  :en {:fallback "?en" :missing "??en"}}} [:sw :en] [:test :fallback] [])
;=> "?en"

(tempura/tr {:dict {:sw {:missing "??sw"}  :en {:missing "??en"}}} [:sw :en] [:test :fallback] [])
;=> "??sw"

(tempura/tr {:dict {:sw {}  :en {:missing "??en"}}} [:sw :en] [:test :fallback] [])
;=> "??en"

I.e., as long as the 2nd resources element, :fallback, exists in the :dict for the first language, it is selected. Only if it does not exist there is the 2nd language considered and the :test key found.

The tempura documentation is ambiguous about what should happen in this case: https://ptaoussanis.github.io/tempura/taoensso.tempura.html#var-tr

(tr
  {,,,}

  ;; Descending-preference locales to try:
  [:fr-FR :en-GB-variation1]

  ;; Descending-preference dictionary resources to try. May contain a
  ;; final non-keyword fallback:
  [:example/how-are-you? "How are you, %1?"]

  [,,,])

The quick start tutorial (https://github.com/ptaoussanis/tempura#quickstart) is just as ambiguous:

(tr ; Just a functional call
  {:dict my-tempura-dictionary} ; Opts map, see docstring for details
  [:en-GB :fr] ; Vector of descending-preference locales to search
  [:example/foo]) ; Vector of descending-preference resource-ids to search

Dr. Wolfram Schroers (@field-theory), by contrast, writes quite unambiguously:

Thus, the call

(tr {:dict translations} [lang :en] [res-key :missing])

will

  1. search for the key res-key in language lang in the translations map.
  2. If it fails to find one, it will look up the key in the :en English language.
  3. If it still fails to find that one, it displays the value of :missing in the lang language map.

Sadly, that does not match my observation (even though it would be very convenient if tempura behaved this way).

The actual behaviour appears to be rather:

(tr {:dict translations} [lang :en] [res-key :fallback])

  1. search for the key res-key in language lang in the translations map.
  2. If it fails to find one, it will look up :fallback in the same language.
  3. If it also fails to find that, it displays the value of res-key in the :en language map.
  4. If it also fails to find that, it displays the value of :fallback in language :en.
  5. If it also fails to find that, it displays the value of :missing in language lang (:sw in the examples above), and failing that, the value of :missing in language :en.

field-theory commented 2 years ago

I believe the issue is with my tutorial rather than with the behavior of tempura.

Thinking about it, the actual behavior may be better than the behavior I had explained: while it may be reasonable to assume that most people would rather read English than an error message in another language, that may not always be what is desired. A different wrapper function for tr (tailored to your needs) may be a better solution than modifying the default behavior of that function.

I would rather fix my tutorial, then. I can also add an alternative proposal for the look-up function tr that implements the behavior you describe.

ptaoussanis commented 2 years ago

@devurandom Hi Dennis, thanks for the very clear report! 🙏 @field-theory Likewise, thanks for your thoughts on this!

I haven't had an opportunity yet to properly look into this issue, but will try to spend some time on it tonight. In the meantime, based only on a first quick impression, it does sound like we may want the relevant behaviour to be configurable.

ptaoussanis commented 2 years ago

Okay, I just took a closer look at this and I do believe there might be an issue with the linked tutorial. Caveat: this was still a brief look, and I haven't touched this code in years - so I could be missing something.

Let's start with an example:

(tr
    {:dict
     {:sw {:missing "sw/?" :r1 "sw/r1" :r2 "sw/r2"}
      :en {:missing "en/?" :r1 "en/r1" :r2 "en/r2"}}}

    [:sw :en]
    [:r1 :r2])

The intended (and current) search behaviour is: (or sw/r1 sw/r2 en/r1 en/r2 sw/? en/? nil).

Note that only the :missing dictionary entries are intended to be error messages. I.e. the first 4 cases in (or sw/r1 sw/r2 en/r1 en/r2 sw/? en/?) would all be considered successful resource lookups.

The resources in the [:r1 :r2] form aren't intended to be used for error messages, but for possible valid substitutes. So :r2 in [:r1 :r2] isn't really a "fallback" or "error", it's just the 2nd-priority resource.

This generalizes. If you have 3 locales [:l1 :l2 :l3] and 4 resource ids [:r1 :r2 :r3 :r4] then the search behaviour will be:

  (or
    l1/r1       l1/r2       l1/r3       l1/r4
    l2/r1       l2/r2       l2/r3       l2/r4
    l3/r1       l3/r2       l3/r3       l3/r4
    l1/:missing l2/:missing l3/:missing  ; Automatic lookup of special :missing keys
    nil) ; etc.
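
The order above can be modelled in a few lines of plain Clojure. This is only a sketch of the described behaviour, not Tempura's implementation: the names candidate-paths, lookup and example-dict are hypothetical, the dictionary is assumed to be a flat map of locale to resource map, and Tempura features such as a final non-keyword fallback string are ignored.

```clojure
;; Hypothetical model of the search order described above.
;; Not Tempura's code; it only reproduces the ordering.

(defn candidate-paths
  "Returns [locale resource-id] pairs in the order they are tried:
  all resource ids for the first locale, then all resource ids for the
  next locale, and so on, followed by each locale's :missing entry."
  [locales resource-ids]
  (concat
    (for [locale locales, rid resource-ids] [locale rid])
    (for [locale locales] [locale :missing])))

(defn lookup
  "Returns the first non-nil dictionary value along candidate-paths,
  or nil when nothing (not even a :missing entry) is found."
  [dict locales resource-ids]
  (some #(get-in dict %) (candidate-paths locales resource-ids)))

;; Example dictionary reusing the keys from the original report.
(def example-dict
  {:sw {:fallback "?sw" :missing "??sw"}
   :en {:fallback "?en" :missing "??en" :test "TEST"}})

(lookup example-dict [:sw :en] [:test :fallback]) ;=> "?sw"
(lookup example-dict [:sw :en] [:test])           ;=> "TEST"
(lookup example-dict [:sw :en] [:not-defined])    ;=> "??sw"
```

In this model the first call returns "?sw" because sw/fallback is tried before en/test, which is the result reported at the top of this issue.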

The "Handling missing keys" section of the linked tutorial may be misleading by showing

(tr {:dict translations} [lang :en] [res-key :missing])

The placement of the :missing here is unusual and unnecessary. Tempura will automatically search for the (special) :missing key when none of the provided resource ids can be found in any of the provided locales.

Explicitly adding the :missing key to the resource ids like this indicates to Tempura that :missing is a valid application-level resource id, and so prevents Tempura's usual treatment of :missing as a special error case.
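
A minimal sketch of the usage this implies, with :missing left out of the resource ids. The dictionary content is illustrative (it reuses the keys from the original report), and the results in the comments are what the search order described in this thread should produce with the tempura dependency on the classpath.

```clojure
(require '[taoensso.tempura :as tempura])

;; Illustrative dictionary; :missing is defined per locale but is never
;; passed as a resource id.
(def example-dict
  {:sw {:missing "??sw"}
   :en {:missing "??en" :test "TEST"}})

;; :test is absent from :sw, so the lookup continues in :en.
(tempura/tr {:dict example-dict} [:sw :en] [:test])
;; should return "TEST"

;; No locale defines this id, so the special :missing key is consulted,
;; starting with the first locale.
(tempura/tr {:dict example-dict} [:sw :en] [:not-defined])
;; should return "??sw"
```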

Does this make sense / seem reasonable?

I'll note that the current search order (or sw/r1 sw/r2 en/r1 en/r2 sw/? en/? nil) could hypothetically instead be (or sw/r1 en/r1 sw/r2 en/r2 sw/? en/? nil).

I.e. Tempura currently searches locale-breadth-first, but could in principle do resource-breadth-first.

That's originally what I thought you might be asking about. If there's actually demand for this, it should be simple enough to make the search behaviour configurable with an option to tr.

But as I currently understand it, that actually wouldn't be helpful or necessary in your case.
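
For reference, the two orderings compared above differ only in which loop is the outer one. The sketch below uses plain Clojure and hypothetical function names (locale-major, resource-major); it is not an existing Tempura option.

```clojure
;; Hypothetical helpers contrasting the two candidate orderings.
;; Each returns [locale resource-id] pairs in lookup order.

(defn locale-major
  "Current behaviour as described: exhaust all resource ids for one
  locale before moving on to the next locale."
  [locales resource-ids]
  (for [locale locales, rid resource-ids] [locale rid]))

(defn resource-major
  "Alternative: try one resource id in every locale before moving on
  to the next resource id."
  [locales resource-ids]
  (for [rid resource-ids, locale locales] [locale rid]))

(locale-major [:sw :en] [:r1 :r2])
;=> ([:sw :r1] [:sw :r2] [:en :r1] [:en :r2])

(resource-major [:sw :en] [:r1 :r2])
;=> ([:sw :r1] [:en :r1] [:sw :r2] [:en :r2])
```

Under the second ordering, the first example from the report ([:test :fallback] with :test defined only in :en) would resolve to en/test before sw/fallback.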

Please do let me know if I've misunderstood something though!

ptaoussanis commented 2 years ago

Note that I've added a further short example to the README in the hope that it may be helpful.

ptaoussanis commented 1 year ago

Closing here, since I believe the additional example in the README should be sufficient.

ptaoussanis commented 1 year ago

Update to add: there's now also a Wiki page with some extra documentation here.