LazyLoadable Backend

paarthmadan commented 2 years ago

What's in this PR

This PR introduces a new LazyLoadable backend following the proposal written in https://github.com/ruby-i18n/i18n/issues/592.

What does the `LazyLoadable` Backend offer?

This backend offers a performance optimization for environments where only a fraction of the app's translation data is actually required, most notably a local test environment.

As opposed to the Simple backend, this backend avoids loading all translations in the load path. Instead, it infers which files need to be loaded based on the current locale. To do so, it imposes a format on the files in the load path. They must abide by a specific format structure to enable the backend to reason about which files belong to which locale. We trade off the rigidity of the imposed format with the performance incentive achieved by only loading files that are needed.

In other words, this backend avoids the cost of loading unnecessary translation files by carefully selecting only those files which are needed for the current locale. It lazily initializes translations on a per locale basis.

How does the `LazyLoadable` Backend work?

This backend trades the expensive cost of I/O for the cost of performing string matching on files in the load path. It makes assumptions about which files belong to a locale and selectively loads only these files.

How does the `LazyLoadable` Backend know which files belong to which locale?

It makes assumptions about how files are named. Clients must abide by this naming system if they decide to use this backend.

The heuristic used to bind a file to its locale can be defined as follows:

1) the filename is in the I18n load path
2) the filename ends in a supported extension (i.e. .yml, .json, .po, .rb)
3) the filename starts with the locale identifier
4) the locale identifier and any optional text that follows it are separated by an underscore, i.e. "_".
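The naming heuristic above can be sketched in a few lines of Ruby. This is only an illustration of the rules as described, not the backend's actual code: the names `SUPPORTED_EXTENSIONS`, `locale_from_path` and `belongs_to_locale?` are hypothetical, and the sketch assumes the path has already been taken from the load path.

```ruby
# Hypothetical helpers illustrating the naming rules; not the backend's real code.
SUPPORTED_EXTENSIONS = %w[.yml .json .po .rb].freeze

# Returns the locale a path is assumed to belong to, or nil when the
# extension is unsupported or no locale prefix can be found.
def locale_from_path(path)
  extension = File.extname(path)
  return nil unless SUPPORTED_EXTENSIONS.include?(extension)

  # "config/locales/en_001.yml" -> "en_001" -> "en"
  name = File.basename(path, extension)
  locale = name.split("_").first
  return nil if locale.nil? || locale.empty?

  locale.to_sym
end

# True when the path's inferred locale matches the requested one.
def belongs_to_locale?(path, locale)
  locale_from_path(path) == locale.to_sym
end
```

For instance, `locale_from_path("config/locales/en_001.yml")` yields `:en`, while a file with an unsupported extension yields `nil`.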

Working Through An Example

Assume an app's I18n.load_path consisted of the following files:

config/locales/en_001.yml
config/locales/en_002.yml
config/locales/en_003.yml
...
config/locales/en_n.yml
config/locales/fr_001.yml
config/locales/fr_002.yml
config/locales/fr_003.yml
...
config/locales/fr_n.yml
config/locales/de_001.yml
config/locales/de_002.yml
config/locales/de_003.yml
...
config/locales/de_n.yml

A test is run in the local environment which requires a single :en translation. Currently, when the Simple backend is initialized, all files will be loaded into memory.

This results in 3n loads if we assume there are only 3 locales.

With the LazyLoadable backend, we can select only the :en translations by convention, resulting in n loads.
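To make the 3n versus n comparison concrete, here is a small sketch that only counts which files each strategy would read. It does not touch the filesystem or the real backends. The value `n = 3` and the helper `files_for` are assumptions made for illustration.

```ruby
# Hypothetical sketch: counts files that would be read; nothing is actually loaded.
n = 3
load_path = %w[en fr de].flat_map do |locale|
  (1..n).map { |i| format("config/locales/%s_%03d.yml", locale, i) }
end

# Selects the paths whose basename starts with the locale identifier,
# optionally followed by "_" and more text.
def files_for(load_path, locale)
  load_path.select do |path|
    File.basename(path, ".*").split("_").first == locale.to_s
  end
end

# Simple-style: every file in the load path is read (3n).
eager_loads = load_path.size

# LazyLoadable-style: only the :en files are read (n).
lazy_loads = files_for(load_path, :en).size

puts "eager: #{eager_loads}, lazy (:en): #{lazy_loads}"
```

With three locales and n = 3, the eager count is 9 and the lazy count for :en is 3.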

When should someone use this backend?

The backend has two working modes: lazy_load and eager_load.

This backend should only be enabled in test environments!

When lazy loading is disabled (the eager_load mode), the backend behaves exactly like the Simple backend, with an additional check that the paths being loaded abide by the format. If paths can't be matched to the format, an error is raised.
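A configuration sketch for switching between the two modes might look like the following. It assumes the backend's initializer accepts a `lazy_load:` keyword and that the gem version in use ships this backend. The `RAILS_ENV` check and the glob pattern are illustrative choices, not something this PR description prescribes.

```ruby
require "i18n"

# Assumption: files in the load path follow the naming format described above.
I18n.load_path += Dir["config/locales/*.{yml,rb}"]

# Assumption: the initializer takes a `lazy_load:` keyword.
# Lazy loading is switched on only in the test environment.
lazy = ENV["RAILS_ENV"] == "test"
I18n.backend = I18n::Backend::LazyLoadable.new(lazy_load: lazy)
```

Keeping the same backend class in every environment, with lazy loading off outside of tests, means the file-format check still runs everywhere.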

It's particularly useful to enable for workloads that operate in the context of a single locale at a time and have many translations files for many locales. For instance, a large Rails workload would benefit from this backend in the local test environment.

Benchmarks: Comparing the `Simple` backend to the `LazyLoadable` backend

A benchmark setup was used to compare the performance of these two backends.

Table 1: Setup with 10 files per locale, 100 keys in each file:

Backend	Work Performed	User	Sys	Total	Real
Simple	Eager load (:en)	0.012764	0.000721	0.013485	0.013503
Simple	3 Eager loads (:en, :fr, :de)	0.012364	0.000675	0.013039	0.013038
LazyLoadable	Eager load (:en)	0.004820	0.000330	0.005150	0.005137
LazyLoadable	3 Eager loads (:en, :fr, :de)	0.019816	0.000847	0.020663	0.020674

Table 2: Setup with 100 files per locale, 1000 keys in each file:

Backend	Work Performed	User	Sys	Total	Real
Simple	Eager load (:en)	1.342190	0.020641	1.362831	1.363569
Simple	3 Eager loads (:en, :fr, :de)	1.344860	0.018035	1.362895	1.363284
LazyLoadable	Eager load (:en)	0.478600	0.011205	0.489805	0.489951
LazyLoadable	3 Eager loads (:en, :fr, :de)	1.357584	0.026064	1.383648	1.384148

Evaluating the results

The LazyLoadable backend reduces working time because it avoids loading unnecessary files. When loading a single locale, the LazyLoadable backend outperforms Simple: roughly 0.005 vs 0.013 in Table 1 and 0.490 vs 1.363 in Table 2 (Total column).

This time reduction is a function of the number of locales. With three locales we avoid loading two thirds of the files, so the single-locale load is close to 3x faster (about 2.6x in Table 1 and 2.8x in Table 2). The saving scales with the number of files avoided.
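The speed-up ratios can be checked directly from the "Real" column of the two tables. The snippet below is just that arithmetic; the variable names are made up for the example.

```ruby
# "Real" times copied from Tables 1 and 2, single-locale (:en) rows.
timings = {
  "Table 1" => { simple: 0.013503, lazy: 0.005137 },
  "Table 2" => { simple: 1.363569, lazy: 0.489951 },
}

# Ratio of Simple to LazyLoadable time, rounded to two decimals.
speedups = timings.transform_values { |t| (t[:simple] / t[:lazy]).round(2) }

speedups.each { |table, ratio| puts "#{table}: #{ratio}x" }
```

Both ratios land a little under the 3x ceiling that skipping two of three locales would give if file loading were the only cost.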

Note: The LazyLoadable backend performs roughly on-par with the Simple backend when it needs to load all translations. There is additional overhead of string matching which brings down the performance in small workloads. It's negligible in any significant workloads compared to the time spent in I/O.

Industry Proof

At Shopify, we've patched ruby-i18n locally to implement a similar strategy. We've observed close to 10x speed-ups locally in specific tests and roughly 20% speed-ups across the suite.

Conclusions

This backend is designed to bring performance improvements to workloads with a large volume of locales, translation files, and translation keys.

It's designed for the local test environment, and is an opt-in backend.

radar commented 2 years ago

I'll release this with a fix for #606 as the 1.10 release, ideally by next Monday.

salochara commented 8 months ago

Hello! @paarthmadan 👋 I hope you're doing great.

I'm working on improving performance for the faker gem. We're evaluating the option of enabling this LazyLoadable backend. It looks like a pretty awesome improvement 🎉, as shown in the Industry proof you kindly shared.

I just have a question regarding this implementation... This backend should only be enabled in test environments! What's the reasoning behind this? 🤔

All the best! Salomón.

casperisfine commented 8 months ago

This backend should only be enabled in test environments! What's the reasoning behind this?

In production you'd rather load all the translations as part of boot so:

The first user to need a translation isn't slowed down by the extra read + parsing (can be very slow on large files)
Assuming your app uses a forking server (unicorn, puma, etc), the data will mostly be in memory pages that are automatically shared by Copy on Write, reducing the memory usage.

paarthmadan commented 8 months ago

Hey @salochara, I'd echo all that @casperisfine shared and add in addition that:

The test environment, in particular, is the perfect candidate for lazy loading translations because:

We expect the test environment to be started and stopped frequently
Certain tests don't require any translations
Tests that do require translations typically require a small subset of the entire pool

These factors together benefit from lazy loading because we drastically reduce startup time, we only ever load translations that we need, and we only incur this penalty for tests that do actually require translations.

Jean provided a great argument for why this shouldn't be used in production, but these are further reasons why it makes sense in the test env.

salochara commented 8 months ago

Hi! @paarthmadan @casperisfine 👋 Thank you very much for your responses. I really appreciate it.

Got it. Makes sense. Now it's very clear why this is intended for the test environment.

Again, thank you very much for your response and the work you guys kindly put out for the community.

All the best! 🙏🏼

ruby-i18n / i18n

LazyLoadable Backend #612
