Collecting Requirements for Per-Language Splitting

LorisSigrist commented 2 months ago

Context

Paraglide currently splits messages by component / page. If you load a page with 3 client components (or your framework's equivalent), only the messages for those three components are sent to the client. However, they are currently sent in all languages. Ideally we would only send messages in the language that is displayed.

This issue collects ideas on how that could be achieved.

Expected Impact - Case Study Inlang.com

The average translation (1 message in one language) on Inlang.com is about 50-60 bytes. Multiply that by the number of languages (7) & you get the average impact per message: about 400 bytes.

There are about 200 messages on the Website, but because of per-page splitting only an average of 20 are loaded when you go to a page. This leaves us with a bundle-size impact of 400 * 20 = 8kB per page on average.

If we got per-language splitting to work on top of that it could save 6 out of 7 bytes, leaving us at just over 1kB. This would be a huge win, but only if the language-splitting adds less than 7kB to the client bundle.

Inlang.com has 7 languages, which is more than most sites. Usually you would have between 2 and 4. So the actual size-limit for the per-language splitting runtime would be about 2kB. For context: i18next is 40kB.
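The break-even arithmetic above can be sketched as a small calculation. This is only an illustration using the averages quoted in this issue (55 bytes is assumed as the midpoint of 50-60); the function names are hypothetical and not part of Paraglide.

```typescript
// Rough break-even estimate for a per-language splitting runtime.
// Figures are the averages quoted in this issue; names are illustrative only.

/** Bytes shipped per page when every language is bundled. */
function bytesAllLanguages(
  bytesPerTranslation: number,
  messagesPerPage: number,
  languages: number
): number {
  return bytesPerTranslation * messagesPerPage * languages;
}

/** Bytes saved per page if only the displayed language is shipped. */
function bytesSaved(
  bytesPerTranslation: number,
  messagesPerPage: number,
  languages: number
): number {
  return bytesPerTranslation * messagesPerPage * (languages - 1);
}

const AVG_BYTES = 55; // assumed midpoint of the quoted 50-60 bytes
const MESSAGES_PER_PAGE = 20;

for (const languages of [2, 3, 4, 7]) {
  const total = bytesAllLanguages(AVG_BYTES, MESSAGES_PER_PAGE, languages);
  const saved = bytesSaved(AVG_BYTES, MESSAGES_PER_PAGE, languages);
  // The splitting runtime only pays off if it costs fewer bytes than `saved`.
  console.log(`${languages} languages: ${total} B shipped, budget ${saved} B`);
}
```

With these inputs the budget is 6600 bytes at 7 languages and 2200 bytes at 3 languages, which matches the rough 7kB and 2kB limits mentioned above.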

Work done so far

We have already tried a few approaches & run into various challenges.

- Copying the routes/ directory for each language & using middleware to multiplex between the different builds based on language.
  - Imports into and out of the routes/ folder are incredibly fragile
  - Doesn't work for all routers
  - Only works if the framework has a rewrite mechanism
- Post-processing the build output by copying each output file for each language and replacing messages with the language-specific version.
  - Doesn't work with compressed build outputs
  - Introduces various linking issues

Fundamentally this is a dynamic linking problem in a world of ESM and static linking, which is really hard.

Another promising idea that we haven't tried yet is to serialize the messages & pass them along with the page-data. However, there are open questions on how we would know which messages need to be sent.
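A minimal sketch of what the serialization step might look like, assuming the list of required message ids is already known (which is exactly the open question). The `Catalog` shape and `serializeMessages` are hypothetical, and messages are simplified to plain strings rather than functions with parameters.

```typescript
// Hypothetical catalog shape: message id -> language tag -> translated text.
type Catalog = Record<string, Record<string, string>>;

/**
 * Picks the requested messages in a single language and serializes them,
 * so they could be embedded in the page data sent to the client.
 * How `ids` is determined is NOT solved here; it is passed in as an assumption.
 */
function serializeMessages(catalog: Catalog, ids: string[], language: string): string {
  const subset: Record<string, string> = {};
  for (const id of ids) {
    const text = catalog[id]?.[language];
    if (text !== undefined) subset[id] = text; // skip ids with no translation
  }
  return JSON.stringify(subset);
}

const catalog: Catalog = {
  greeting: { en: "Hello", de: "Hallo" },
  farewell: { en: "Goodbye", de: "Tschüss" },
};

// Only the German text of the one requested message ends up in the payload.
const payload = serializeMessages(catalog, ["greeting"], "de");
console.log(payload);
```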

Note: Lazy Loading is not the Solution

Any solution using fetch or await import is bound to introduce a render-fetch waterfall which drastically increases Time-To-Interactive. Eagerly loading messages in all languages is preferable in the vast majority of cases.

Most projects have between 2 and 4 languages. Lazy-loading only becomes justifiable with more than 10.

osdiab commented 2 months ago

Keenly watching this. Seems like a core make or break feature that determines if this library can truly scale.

LorisSigrist commented 2 months ago

Per-Language splitting is one of our big goals!

That being said, Paraglide already does scale really well. Because of its small footprint (tiny runtime, minified message ids, per-client-component-splitting) it already stays small, even when shipping extra languages.

We did some benchmarks on this:

As long as you stay under 5 Languages Paraglide already is the smallest choice.
If you're using a Framework with Server-Components / Islands / Some sort of partial hydration it stays the best choice for up to 10 languages.

Per-Language splitting will make it so that paraglide stays the best regardless of how many languages you have, but for a lot of projects it's already the best choice.

osdiab commented 1 month ago

> Another promising idea that we haven't tried yet is to serialize the messages & pass them along with the page-data. However, there are open questions on how we would know which messages need to be sent

Maybe leveraging AsyncLocalStorage (NextJS already seems to use this for headers()) to have a request context for this could help, having the translation functions add to a list at runtime?

LorisSigrist commented 1 month ago

That's an interesting idea, however, that likely only catches the messages that are actually executed during server-rendering, not messages that are used conditionally. We would need those too.
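To make the limitation concrete, here is a minimal sketch of request-scoped tracking with Node's `AsyncLocalStorage`. The `track` helper, the `m` object and `renderPage` are hypothetical stand-ins, not Paraglide's actual API.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Request-scoped set of message ids that were called during rendering.
const usedMessages = new AsyncLocalStorage<Set<string>>();

// Hypothetical wrapper: records the id in the current request's set, if any.
function track(id: string, text: string): string {
  usedMessages.getStore()?.add(id);
  return text;
}

// Stand-ins for compiled message functions.
const m = {
  greeting: () => track("greeting", "Hello"),
  error_banner: () => track("error_banner", "Something went wrong"),
};

// A page that only uses `error_banner` conditionally.
function renderPage(hasError: boolean): string {
  return hasError ? `${m.greeting()} ${m.error_banner()}` : m.greeting();
}

// Runs the render inside a fresh store and returns the ids that were hit.
function renderAndCollect(hasError: boolean): { html: string; ids: string[] } {
  const ids = new Set<string>();
  const html = usedMessages.run(ids, () => renderPage(hasError));
  return { html, ids: Array.from(ids).sort() };
}

console.log(renderAndCollect(false).ids);
console.log(renderAndCollect(true).ids);
```

When the page is server-rendered without an error, only `greeting` is recorded, so `error_banner` would be missing on the client if the error state appears later.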

osdiab commented 1 month ago

Hmm yeah, in that case it probably can’t be a runtime thing. Maybe one could crawl the AST at compile time to find every invocation of a translation function, traversing from the entry point of each route. I think the entry point should be clear for each metaframework (e.g. for NextJS, any default export from a page/layout/route file), but I'm not sure how one would achieve this framework-agnostically.
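A rough sketch of that traversal, under heavy assumptions: it uses regular expressions as a simplified stand-in for a real AST walk, works on an in-memory map of module sources, only follows relative `from "./x"` imports, and assumes messages are always called as `m.some_id(...)`. All names are hypothetical.

```typescript
// Hypothetical in-memory project: module path -> source text.
type Modules = Record<string, string>;

// Simplified stand-ins for AST queries (a real tool would parse the code).
const IMPORT_RE = /from\s+["'](\.\/[^"']+)["']/;
const MESSAGE_RE = /\bm\.([A-Za-z_$][\w$]*)\s*\(/;

/** Returns the first capture group of every match of `pattern` in `source`. */
function allMatches(source: string, pattern: RegExp): string[] {
  const re = new RegExp(pattern.source, "g");
  const out: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = re.exec(source)) !== null) out.push(match[1]);
  return out;
}

/** Collects message ids reachable from `entry` by following imports. */
function collectMessageIds(modules: Modules, entry: string): string[] {
  const seen = new Set<string>();
  const ids = new Set<string>();
  const queue = [entry];
  while (queue.length > 0) {
    const file = queue.pop() as string;
    if (seen.has(file)) continue;
    seen.add(file);
    const source = modules[file];
    if (source === undefined) continue; // unresolved import: ignore
    for (const id of allMatches(source, MESSAGE_RE)) ids.add(id);
    for (const dep of allMatches(source, IMPORT_RE)) queue.push(dep);
  }
  return Array.from(ids).sort();
}

const modules: Modules = {
  "./page": `import { Banner } from "./Banner"; import * as m from "./messages"; export default () => m.greeting() + Banner(false);`,
  "./Banner": `import * as m from "./messages"; export const Banner = (e) => (e ? m.error_banner() : "");`,
  "./messages": `export const greeting = () => "Hello"; export const error_banner = () => "Oops";`,
  "./Unrelated": `import * as m from "./messages"; export const X = () => m.checkout_title();`,
};

console.log(collectMessageIds(modules, "./page"));
```

Unlike runtime tracking, this finds the conditionally used `error_banner` because it looks at every call site reachable from the route's entry module, while messages used only by unrelated modules are left out.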
