Add better i18n support

stephendolan commented 4 years ago

This is relatively ill-defined at the moment, but @paulcsmith will be providing us with some guidance/reference material that will hopefully make a PR possible.

From our roadmap document:

Better I18n. Current implementation works, but is pretty hairy. I (paul) will share some ideas with y’all soon. Will be based on elixir gettext but with a few tweaks to allow different backends (not just PO files)

wout commented 3 years ago

I'd like to help out here. The app I'm building requires very good internationalization support. As already suggested in https://github.com/luckyframework/website/issues/770, I'd like to use crystal-i18n.github.io instead of the shard that's currently proposed.

I've always been happy with internationalization in Rails, and I use it a lot. So if we could achieve something similar in Lucky, based on that shard, that would be great. I'm not a fan of elixir's implementation, though, but please convince me. :smile:

I'd always be happy to create a shard (e.g. lucky_i18n) for those who prefer the Rails' way if that would be a better option. Either way, I'm ready to get started.

jwoertink commented 3 years ago

There's this one too https://github.com/BrucePerens/i18n

wout commented 3 years ago

That's an interesting take. Especially the updated version, which has a slightly better API: https://github.com/BrucePerens/internationalize. But I'm on the fence about using the native language as a key for translations. What happens when you change the string in the native language? Will you need to update the keys for all the translations as well? Because that's an inconvenience and a source for errors. I'm also not a fan of how everything is organised in a flat structure. I like structured files like YAML and JSON because you can group related items together.

Keys referencing a nested structure have an added bonus. They allow you to have an inferred key prefix based on the name of the current class:

class SignUps::New < BrowserAction
  post "/sign_up" do
    plain_text t(".welcome")
  end
end

So the locale key here is sign_ups.new.welcome, which makes it really easy to structure your locales. I find that a really elegant way to organise locales, and it leaves room to define full keys as well. It may be personal preference or just that I'm used to doing things that way, but this setup has served me well for the last decade or so in almost every single app I've built.

Both crystal-i18n/i18n and crimson-knight/i18n.cr allow for the translations to be embedded in the binary, which is useful I think. But even better would be to have a custom backend for loading a NameTuple:

# config/locales/en.cr
{
  en: {
    user: {
      first_name: "Name",
      last_name: "Last name"
    }
  }
}

So we don't even have to parse those YAML or JSON files.

jwoertink commented 3 years ago

I haven't actually looked in to the implementation of any of these shards, but I will say that I worked on a heavy multi language app in rails one time, and it was absolutely a mess. I think if we're going to add I18n support natively in to Lucky, we should think of a way that doesn't conform to the same mess that you get with Rails.

The main issue is dealing with runtime level typos. I'm really bad about spelling a word wrong, and not catching it until late. Using a different language as a key would make that even harder for example appel vs apple; Is the first one a dutch key, or a misspelled english key?

I have no clue how it would work, or if it would even be possible, but what if the interface looked more like

struct BaseLanguage
  Habitat.create do
    # Load these in, and throw a compile-time error if you have some yaml file
    # that doesn't have all the same keys as the rest...
    settings languages : Array(Symbol)
  end
end
struct UserLanguage < BaseLanguage
  language_key :user
  permit_keys first_name, last_name
end

label translate(UserLanguage.first_name)

# compile-time error
label translate(UserLanguage.last_mane)

sort of a half baked idea, but the idea of it focusing on the Lucky core ideals of catching errors in development.

Now, with that said, if we don't focus on the native route, and go more with some "glue" shard that takes one of these and making binding in to Lucky easier, I'm down with that path too. I just wanted to throw in my 2cents while I was thinking about it.

wout commented 3 years ago

Catching translation issues at compile-time is what I'm after too. It would be silly not to have this since it's pretty easy to achieve.

I'm not sure what you mean with:

Using a different language as a key would make that even harder for example

Why would you want to use keys in a different language?

Yesterday I've implemented https://github.com/crystal-i18n/i18n into the project I'm working. It's the most feature-complete internationalization shard I found. It can easily deal with localization and has support for fallbacks. So if you'd have a user with locale en-GB, you can have a en-GB.yml file for exceptions and en.yml for everything that's overlapping with en-US. That's a nice feature to have because then you don't end up with two locale files that are almost identical.

It does everything it should do (and more). But the downside of that shard is that everything happens at runtime, which is kind of a missed chance.

I like the interface you're proposing, but it would also be a lot of extra work defining those additional language classes with allowed keys per domain model. Or am I getting it wrong?

Proposal

What I have in mind is the following. There would be a bunch of locale files (YAML/JSON/Crystal), just like it's done with Rails. Not necessarily one file per language/locale, but you could have a nested folder structure with files organised the way you want it. At compile time, the files are loaded, merged and converted into a Hash with a flat data structure from the language key up. So this:

en:
  user:
    first_name: "First name"
    last_name: "Last name"

Becomes:

module Lucky
  module I18n
    TRANSLATIONS = {
      "en" => {
        "user.first_name" => "First name",
        "user.last_name" => "Last name",
      },
      "es" => {
        "user.first_name" => "Nombre",
        "user.last_name" => "Apellido",
      }
    }
  end
end

This would make it trivial to detect missing keys or typos in keys in the locale files. The flat structure also makes key lookup really fast.

Then we'd use a macro for key lookup. Something along the lines of:

module Lucky
  module I18n
    module Translator
      macro t(key)
        \{%
          translation = Lucky::I18n::TRANSLATIONS[{{key}}]
          raise "Missing key #{{{key}}}" if translation.is_a?(NilLiteral)
          translation
        %}
      end
    end
  end
end

That's a simplified version of course. We'd also need to implement locale fallbacks, interpolation and possibly pluralization. In any case, here we're catching missing translations at compile-time.

The big advantage would be that we get an API that's familiar to a lot of users. And we can still easily add things like key inference, where the prefix of the locale key is based on the class name where the locale is requested:

class SignUps::New < BrowserAction
  post "/sign_up" do
    plain_text t(".welcome") # => resolves to "sign_ups.new.welcome"
  end
end

I'm using this macro right now in the Translator module to generate the key prefix:

# If the given key starts with a ".", generate the prefix based on the current
# class name
private def inferred_key_prefix(key : String) : String
  {% begin %}
    return key unless key.char_at(0) == '.'

    t_prefix || "{{@type.id.underscore.gsub(/::/, ".")}}#{key}"
  {% end %}
end

# In places where a different key prefix is desired define a period-delimited
# string to be used instead of a prefix based on the current class name.
def t_prefix : String?
  nil
end

So, that's what's been brewing in my head. :)

jwoertink commented 3 years ago

Oh, I was just referring to your comment "But I'm on the fence about using the native language as a key for translations." and just agreeing that it would make things difficult if you had the keys in different languages.

I like where your head is at with the sample. I'm glad we're on the same page for the compile-time catches. It just really makes sense to ensure your language templates are all setup instead of deploying to production and randomly finding a "translation key missing" in some corner of your support guides :sweat_smile:

Speaking of ideas... just something else possibly to consider... I believe Rails had a way to add in a sort of string interpolation template. It's been a while since I've used it, but can't you do something like t(".welcome", @name)? Then in the yaml it was like welcome: "Welcome to the site, {name}" :thinking:

wout commented 3 years ago

Oh, I was just referring to your comment

Ah, ok, now I get it. :)

randomly finding a "translation key missing"

Yes, that's what we need to avoid. We've got the tools to do it so we should use them.

I believe Rails had a way to add in a sort of string interpolation

The shard I've been using now has it too: https://crystal-i18n.github.io/translation_lookups.html#interpolations. Basically, that shard is a direct port from the Rails variant. It's written by a guy who works at Shopify. But yeah, interpolation and pluralisation are some requirements.

I'm not sure what would be better. Do we use an existing shard and add a custom loader which implements the lookup at compilation? Or do we build something tailored to Lucky that does the required things, probably with a lot less code (and no additional dependency)?

jwoertink commented 3 years ago

The tough thing is that we try to avoid using external shards within Lucky as a dependency because the second they stop getting updates, we're left to take over it. So building something native that requires one of these other shards might not be the best route.

Maybe to start, if you're down to work on it, just start building out a shard that gets these initial ideas of compile-time checking and being all Lucky centric. Use this i18n shard as a dependency and get things working smooth within an app. Then if it works out well, we can use that as a template to eventually move it in to Lucky, and just replace the core logic of the dependency... or... make it a "backend" that the user just adds their shard??

wout commented 3 years ago

The tough thing is that we try to avoid using external shards within Lucky as a dependency because the second they stop getting updates, we're left to take over it.

That's what I thought, and why I asked. 😄️

make it a "backend" that the user just adds their shard??

That's not a bad idea at all. Not everyone needs internationalization in their app, and for those who don't, it's just dead weight. I can always transfer ownership to luckyframework so it's not locked to my account.

I briefly played with the idea to provide an adaptor, not unlike the carbon adaptors. That way, users would be able to choose their own i18n shard. But it's not like an external email service, those i18n shards more or less all do the same thing and have similar API's. So it doesn't make that much sense.

Ok, I'm going to sketch something out and start using it in my app. Then I will be able to quickly improve on the API and write informative compile error messages. I'll pick up the thread here when I've got something to show.

wout commented 3 years ago

FYI, I'm building this here: https://github.com/wout/rosetta. It's nowhere near a usable product but the foundations are there. It already has key lookup working at compile-time, with basic error messages. It's my intention to make it framework-agnostic.

wout commented 3 years ago

I need your opinion on the current API. The thing is, localizing a string needs to happen in two steps. The lookup of the locale key needs to happen at compile-time. That way the existence of the key can be checked in time. The translation itself needs to happen at runtime because that's when we know which language the user wants.

So what I have now is this:

Rosetta.locale = "es"

class User
  include Rosetta::Translator

  def name_label
    t rosetta("user.name")
  end
end

User.new.name_label
# => "Nombre"

The macro rosetta("user.name") returns a Hash with all the translations (e.g. { "en" => "Name", "es" => "Nombre", "nl" => "Naam" }). Then there is the well-known t method, which in this case will take the Hash and return the value for the currently selected locale.

It's the most elegant solution I could come up with. Another option I considered is:

def name_label
  i18n t("user.name")
end

It's a bit more concise but I'm not sure it's more intuitive.

jwoertink commented 3 years ago

I like that API. I think it reads pretty well. One question is since you have it at the class level, how well do you think it'll handle multiple requests for different languages? I've used that same pattern before and it seemed ok, but I don't think I ever did it on a massive scale. So it was like 2 requests a minute at best lol

wout commented 3 years ago

Thanks! About performance, I don't know. It'll be down to how well Crystal handles those translation hashes. But since the rosetta macro is called at compile-time, doesn't that mean the hash with translations ({ "en" => "Name", "es" => "Nombre", "nl" => "Naam" }) is essentially hard-coded in place? So basically, the t method only does { "en" => "Name", "es" => "Nombre", "nl" => "Naam" }[Rosetta.locale]. Shouldn't that be really fast?

Other I18n shards simply load a massive nested hash into memory and they loop through the nested data structure until the final part of the translation key is reached. With Rosetta, every translation is a one-level hash (at runtime at least).

I've been thinking about using named tuples instead of hashes, but accessing at runtime is a bit more difficult because of the symbol keys. An I don't know how much faster that would be.

Anyway, once I'm a bit further down the road, I'll set up some benchmarks to test performance against a few similar libraries.

jwoertink commented 3 years ago

Oh, I don't mean performance. Sorry, what I meant was using a class level property being more thread safe? I'm not sure if that's the right term in this case, but where a request comes in for spanish which changes the class property, but another request for english comes in and the person wanting english gets the spanish translation because of a race condition where the class property at lookup time was set to spanish... Does that make sense? It might be a non-issue with how threads are handled and stuff, but it was something that came up in my head...

I guess when you get further along and do a benchmark, it might be easier to see. Like loading a 1000 translations in spawns or something :thinking:

wout commented 3 years ago

Ah, yeah, you're right. I need to rework that and add a wrapper for everything that's related to configuration. I was thinking about a Rosetta::Config object that's stored in the current Fiber. What's in the repo right now is still very much proof-of-concept. :smile:

jwoertink commented 3 years ago

Nice. Good work on it! I like the direction you're going.

wout commented 3 years ago

A little update here. I've reworked the API to make it both easier to use and safe for interpolation keys. New usage is:

Rosetta.t("site.title_label").to_s
# => "Title"

If a translated string has an interpolation key but no interpolation key is given, the compiler will raise an error:

# user.welcome_message: "Hi %{name}!"
Rosetta.t("user.welcome_message").to_s

Error: wrong number of arguments for 'Rosetta::Locales::User::WelcomeMessage#with' (given 0, expected 1)

Overloads are:
 - Rosetta::Locales::User::WelcomeMessage#with(name : String)

In this case, you will have to pass the interpolation keys:

Rosetta.t("user.welcome_message").with(name: "Jeremy")

The with method accepts arguments or a NamedTuple. If you need to pass a hash, there is a separate method for that:

Rosetta.t("user.welcome_message").with_hash({ :name => "Jeremy" })

Since the content of hashes can't be restricted, this method is considered unsafe. So if interpolations are missing, you're on your own. But you have to explicitly choose to go this route.

Just a few to-dos now and I'm ready for a first release. :)

jwoertink commented 3 years ago

That's pretty slick!

wout commented 3 years ago

I've worked a lot on the lib and I'm using it in two Lucky apps to iron out the issues. Today I did some benchmarking and it's doing pretty well :smile:. Testing against crystal-i18n, which has a similar feature set, it's more than 10x faster:

crystal-i18n translation   2.24M (446.21ns) (± 4.27%)  48.0B/op  10.82× slower
     Rosetta translation  24.25M ( 41.23ns) (± 6.64%)   0.0B/op        fastest

And more than 20x faster for translations with interpolations:

crystal-i18n interpolation 138.84k (  7.20µs) (± 4.16%)  2.05kB/op  21.23× slower
     Rosetta interpolation   2.95M (339.26ns) (± 7.17%)   80.0B/op        fastest

jwoertink commented 3 years ago

Wow! That's amazing! Great work on it!

wout commented 3 years ago

Another small update before the weekend. Rosetta now has a special integration macro for Lucky:

Rosetta::Lucky.integrate

Calling this macro will include Rosetta::Translatable everywhere it's needed in a Lucky app.

Here's the official documentation:

https://wout.github.io/rosetta/latest/getting_started/#using-lucky

A few more minor releases and we'll have an internationalization lib that's on par with Rails' variant, but with all the benefits of Crystal. :nerd_face:

jwoertink commented 3 years ago

Dude! That's awesome! I dig it. I would be down for updating the docs on the website with this if you wanted to swap those out.

wout commented 3 years ago

Awesome. I would love that! :smiley:

luckyframework / lucky

Add better i18n support #1284

Proposal