lingui / js-lingui

🌍 📖 A readable, automated, and optimized (3 kb) internationalization for JavaScript
https://lingui.dev
MIT License

Generate PO files only for components that contain some localized code #2024

Open hejtmii opened 2 months ago

hejtmii commented 2 months ago

Is your feature request related to a problem? Please describe.

Our codebase is split into a large number of small files. Many of them are just app logic, Redux reducers, thunks, etc. We prefer the option to store localizations for each component separately using the {name} placeholder (ideally in the same directory as the component, so that we can let AI help us with translations and keep each catalog next to the source code that gives it context; I will file a separate feature request for that). But it turns out that extract generates "empty" PO files even for code that doesn't need them, which clutters our repo.

Describe proposed solution

Do not generate PO files for source files without localizations.

Describe alternatives you've considered

Make it configurable, keeping the current behavior as the default so that no breaking change is introduced.

Additional context

Please let me know whether a PR for this could be accepted, or whether there is some important internal reason why you decided to generate PO files even for components without localizations.

If the idea is passable, I could create a PR for that.

timofei-iatsenko commented 2 months ago

Your use case is not something Lingui users usually do. Usually catalogs are created for the whole app or for a slice of it, and the paths to these catalogs are used somewhere in the runtime code, so creating a catalog even if it is empty is expected behaviour.

Regarding your changes, I'm not sure the value of this feature would be worth the effort of having and maintaining it in the codebase.

How are you going to load these catalogs after all? With loading code in every component?

Maybe it's better/easier for you to use the Lingui API and write your own extractor for your specific case.

hejtmii commented 2 months ago

How are you going to load these catalogs after all? With loading code in every component?

I assume this relates to the "ideally in the same directory as the component" part, right? I am discussing that part separately here https://github.com/lingui/js-lingui/issues/2024 on the assumption that the build process could collect the catalogs recursively from the whole app and link the result from the root of the app, in a way similar to what is outlined here https://lingui.dev/ref/conf#catalogsmergepath

I have to say I don't yet fully understand its life cycle.

Anyway, this issue is mainly about the empty files. I am interested in not generating the empty files even in the typical scenarios described in the examples https://lingui.dev/ref/conf#examples

The thing is that our app consists of:

So overall about 80% of our PO files are "empty", which doesn't feel right...

And we don't want to have everything in a single PO file, because it would be extremely complicated to have AI translate just the parts related to a specific component when a part of our codebase changes.

Palid commented 3 weeks ago

+1 for what @hejtmii is talking about. I tried having one big catalog per language (en.po and no.po), and it was unmaintainable: every time anyone changed pretty much anything in any file that had a translation, it ended up with horrible conflicts. Unfortunately, if you go with even this suggestion from the docs, you're unable to directly load .po files and are stuck with npx lingui extract --watch (which understands neither the clean nor the overwrite option while in watch mode, unfortunately; maybe that's a separate bug?) and npx lingui compile --watch with catalogsMergePath defined in your lingui.config.* so you can properly import your translations while developing.

@timofei-iatsenko How is this problem solved in any repository bigger than a one-man army? Extracting things to one big file just doesn't work if you're working with literally anyone other than yourself. It is enough of a problem for a single developer if you frequently have to work on different branches that modify the same file!

I strongly agree with @hejtmii: even in a POC repository where I have only one translated file, I already generated 14+ .po files containing only headers, which aren't easily importable with the loaders!

I'll be happy to work with @hejtmii on this one just so we can solve the issue. A separate extractor, or some options for it, could be the solution, but I'm not super keen on it yet. On the other hand, the default extractor has lots of issues with globs, and it does not descend into nested directories and reproduce their paths correctly, so maybe the proper way to solve it is actually a custom extractor and documenting it? 🤔

timofei-iatsenko commented 3 weeks ago

@Palid Let's look at each point separately

I tried having one big catalog per language (en.po and no.po), and it was unmaintainable - every time anyone changed pretty much anything in any file that had a translation, it ended up with horrible conflicts.

How is this problem solved in any kind of repositories bigger than 1-man-army? Extracting things to one big file just doesn't work if you're working with literally anyone other than yourself.

In my opinion, it was a very bad decision by the original Lingui authors to implement two actions in one command. That is exactly what lingui extract does: it extracts and merges translations in one shot, and that is what actually causes these horrible merge conflicts. In all the other enterprise-grade i18n systems I have worked with, it is implemented differently. You have two steps: one that extracts to a "master" file, and one that updates your translation catalogs from this master file with some third-party tool. The master file is usually added to .gitignore; if you decide not to ignore it, conflicts in this file are fixed pretty easily by simply re-extracting.

Lingui supports this flow with the lingui extract-template command, and that is what we use on a pretty big project without any merge conflicts.

Another option could be disabling line numbers, or source references completely, using the PO formatter settings.
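
For reference, a minimal sketch of what that formatter setting could look like in a lingui.config.ts. It assumes Lingui v4 or later with the separate @lingui/format-po package installed (the config shown later in this thread uses the older format: 'po' string form instead), and the catalog paths are placeholders only.

```typescript
import type { LinguiConfig } from "@lingui/conf";
import { formatter } from "@lingui/format-po";

const config: LinguiConfig = {
  locales: ["en", "no"],
  sourceLocale: "en",
  catalogs: [
    {
      // Placeholder layout for illustration; use your own paths.
      path: "<rootDir>/locale/{locale}/messages",
      include: ["<rootDir>/components"],
    },
  ],
  // lineNumbers: false keeps the "#: file.tsx" origin comments but drops
  // the ":7" suffix, so moving code inside a file no longer changes the catalog.
  // Passing origins: false instead would drop the origin comments entirely.
  format: formatter({ lineNumbers: false }),
};

export default config;
```

This only reduces churn in the reference comments; two branches that add different messages to the same catalog can still conflict.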

On the other hand, the default extractor has lots of issues with globs and it is not entering and reproducing paths with nested directories correctly

Could you share reproductions so we can work on them? This is the first time I've heard about that. We are also open to contributors and happily accept PRs.

Palid commented 3 weeks ago

@timofei-iatsenko I can gladly work on that, considering I have two projects that could use this. I'll provide a reproduction repository for all those separate problems!

I feel like there are a couple of separate issues related to the tool's legacy (even though it's not that old). It seems that the docs suggest a default solution that will result in tons of conflicts, but if you try to go another way, it creates other issues with the tooling that make some of Lingui's features, like loaders, no longer available. I'll try to list the problems below:

Considering all of the above, maybe the easiest way to at least remove some of the trouble would be to define development and production steps, and figure out where we can simplify and improve the tooling? Having so many different ways to extract translations, while still not being able to configure it the way you want (e.g. generating .po files for all the languages near your source files if you have a nested structure, while still generating empty .po files), makes this quite a problem.

Before getting to the reproduction repository, I can share the tree and Lingui config where this nesting is already a problem.

Source tree:

components
├── ThemeToggle.tsx
├── router-entry.tsx
└── ui
    ├── alert-dialog.tsx
    ├── avatar.tsx
    ├── badge.tsx
    ├── button.tsx
    ├── card.tsx
    ├── input.tsx
    ├── progress.tsx
    ├── text.tsx
    └── tooltip.tsx

lingui.config.js:

/** @type {import('@lingui/conf').LinguiConfig} */
module.exports = {
  locales: ['en', 'no'],
  sourceLocale: 'en',
  catalogs: [
    {
      path: 'locale/{locale}/{name}',
      include: ['components/**/{name}'],
    },
  ],
  catalogsMergePath: '.locales/{locale}',
  format: 'po',
};

Running lingui extract now would generate this tree:

locale
├── en
│   ├── ThemeToggle.tsx.po
│   ├── alert-dialog.tsx.po
│   ├── avatar.tsx.po
│   ├── badge.tsx.po
│   ├── button.tsx.po
│   ├── card.tsx.po
│   ├── input.tsx.po
│   ├── progress.tsx.po
│   ├── router-entry.tsx.po
│   ├── text.tsx.po
│   ├── tooltip.tsx.po
│   └── ui.po
└── no
    ├── ThemeToggle.tsx.po
    ├── alert-dialog.tsx.po
    ├── avatar.tsx.po
    ├── badge.tsx.po
    ├── button.tsx.po
    ├── card.tsx.po
    ├── input.tsx.po
    ├── progress.tsx.po
    ├── router-entry.tsx.po
    ├── text.tsx.po
    ├── tooltip.tsx.po
    └── ui.po

3 directories, 24 files

Most of the files here are entirely empty apart from the headers, so extraction could skip them entirely, yet it still generates them (I guess that's the problem @hejtmii mentioned). The entire output of cat locale/en/alert-dialog.tsx.po:

msgid ""
msgstr ""
"POT-Creation-Date: 2024-10-09 13:10+0200\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Generator: @lingui/cli\n"
"Language: en\n"

All of these files were generated from the React Native Reusables initial template, and it already created this many empty files, which aren't even properly nested (that is currently impossible with the extraction mechanism). Even worse, you can't really use @loader anyway, because you have a ton of .po files, so the simple and nice dev experience no longer applies.
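
Until the CLI offers something like this, a post-extract cleanup step is one possible stopgap for the header-only files shown above. A minimal sketch, assuming a catalog counts as "empty" when its only entry is the header (msgid ""). The helper isHeaderOnlyPo is hypothetical and not part of Lingui, and it is a simplified line-based check rather than a full PO parser.

```typescript
// Returns true when a PO file contains no entry with a non-empty msgid,
// i.e. nothing but the header entry (msgid "").
function isHeaderOnlyPo(poSource: string): boolean {
  let inMsgid = false;
  for (const rawLine of poSource.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (/^msgid /.test(line)) {
      // `msgid "text"` on one line: a real message.
      if (line !== 'msgid ""') return false;
      inMsgid = true; // could still be a multi-line msgid
    } else if (inMsgid && /^"/.test(line)) {
      // Continuation line of a multi-line msgid.
      if (line !== '""') return false;
    } else {
      // msgstr, msgid_plural, comments, blank lines: the msgid has ended.
      inMsgid = false;
    }
  }
  return true;
}

const sample = [
  'msgid ""',
  'msgstr ""',
  '"Content-Type: text/plain; charset=utf-8\\n"',
  '"Language: en\\n"',
].join("\n");

console.log(isHeaderOnlyPo(sample)); // true
```

A script could walk the locale directory after lingui extract and delete the files for which this returns true. Whether that is safe depends on whether any runtime code imports those catalog paths, which is the concern raised earlier in the thread.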

timofei-iatsenko commented 3 weeks ago

@Palid

  • The @lingui/loader packages expect .po files, which makes development with translations a bit of a pain with extract-template, as you now need to either fully ignore messages object in development, or have a separate step extracting those templates with --watch. Or, for development only, have an entirely different pass with lingui extract that's in .gitignore.

Have you had the chance to try this setup, or are you just speculating about what might happen? The @lingui/loader is designed to work with templates right away. It automatically merges translations, falling back to the messages from the template if a message is not present in the translation catalog. You don't need any of what you described.

  • Trying to go with catalogs per component as suggested in docs is also a bad idea, as you now not only need to lingui extract --watch, but also lingui compile --watch in development mode, as there's no easy way anymore to load all the different .po files.

Honestly, I have never needed that, so I have never used it.

  • The docs suggest that the go-to route is huge .po catalogs per language, which will end up with conflicts unless you .gitignore them, which explicitly forces you to use external services for translations, as you no longer have an easy way to sync your translations directly in the repository.

Yes, the suggested approach would be a huge catalog per language per entry point. You don't need to add the catalogs to .gitignore; you only need to add the template. Furthermore, you also don't need to extract and commit on every commit. Commit your catalogs only when the translations change, not on every file change.

  • lingui extract-template does not have any --watch options, which makes it unusable in development at all.

You don't need it.

  • lingui.config.js only has a {name} template placeholder, which makes generating .po files in nested directories impossible with any kind of globs, as it doesn't generate the entire path correctly.

You also don't need it.

Prerequisites:

  1. Add lingui extract-template before the build of your application, like this:
       "build": "lingui extract-template && vite build",
  2. Add template.pot to .gitignore.
  3. If you don't have a translation file for a specific language, create an empty .po file, including one for the source language (en.po, for example).
  4. Use a standard loading snippet:
    import { i18n } from "@lingui/core"

    export async function loadCatalog(locale: string) {
      const { messages } = await import(`../locales/${locale}.po`)
      i18n.loadAndActivate({ locale, messages })
    }
  5. Remove any extraction or compilation from pre-commit hooks, if you have them.

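The Flow

  • If you are developing locally with your source language (en), an empty catalog is loaded and all messages are taken from the source code.
  • If you are developing locally with a translation language (pl, for example), a catalog with partial translations is loaded, and untranslated messages are taken from the source code.
  • If you are bundling for production, you need the messages.pot file to be up to date, which is why extraction is added before the build command. Messages from the source code are not available, so the Lingui loader compiles your catalogs and falls back to the template.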
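
As a rough illustration of the fallback described in this comment (a translation from the locale catalog wins, and anything missing falls back to the template message), here is a simplified model. It assumes plain string messages keyed by id; mergeWithTemplate is a hypothetical helper and is not the loader's actual code.

```typescript
type Messages = Record<string, string>;

// Merge a locale catalog over the template: a non-empty translation wins,
// otherwise the template (source-language) message is used.
function mergeWithTemplate(templateMessages: Messages, translations: Messages): Messages {
  const merged: Messages = {};
  for (const id of Object.keys(templateMessages)) {
    merged[id] = translations[id] || templateMessages[id];
  }
  return merged;
}

const templateMessages: Messages = { greeting: "Hello", farewell: "Goodbye" };
const polish: Messages = { greeting: "Cześć" }; // partial catalog

console.log(JSON.stringify(mergeWithTemplate(templateMessages, polish)));
// {"greeting":"Cześć","farewell":"Goodbye"}
console.log(JSON.stringify(mergeWithTemplate(templateMessages, {})));
// {"greeting":"Hello","farewell":"Goodbye"}
```

The second call corresponds to the empty source-language catalog: every message comes from the template.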

Hope that helps.

Do you translate in feature branches or only when feature is merged to a main branch?

Palid commented 3 weeks ago

@Palid

  • The @lingui/loader packages expect .po files, which makes development with translations a bit of a pain with extract-template, as you now need to either fully ignore messages object in development, or have a separate step extracting those templates with --watch. Or, for development only, have an entirely different pass with lingui extract that's in .gitignore.

Have you had the chance to try this setup, or are you just speculating about what might happen? The @lingui/loader is designed to work with templates right away. It automatically merges translations, falling back to the messages from the template if a message is not present in the translation catalog. You don't need any of what you described.

I actually did try to import .pot directly, as the importing code in one of my projects does a couple of things, mostly because it is Next.js and I needed to support server-side rendering plus a nice DX (as well as supporting Turbopack, see: https://github.com/lingui/js-lingui/issues/1854, where I see we had a short discussion). Unfortunately it does not work for my use case at all: having to keep an empty .po file just so that the importer resolves the path correctly, and then having it automatically fall back to .pot (!!!), is a very weird design choice.

Attaching the code example below, slightly modified for clarity.

import "server-only";

import { I18n, MessageDescriptor, setupI18n } from "@lingui/core";
import { msg } from "@lingui/macro";
import { setI18n } from "@lingui/react/server";
import linguiConfig from "../../../lingui.config";

export type Lang = (typeof linguiConfig.locales)[number];

export const languages: Record<Lang, MessageDescriptor> = {
  en: msg`English`,
  no: msg`Norwegian`,
};

const translations = require("src/i18n/prod-messages");
// Code for `translations` below:
/**
 *
 * if (process.env.NODE_ENV === "production" || process.env.TEST_ENV === "test") {
 *   const en = require("../locales/en/messages.js");
 *   const no = require("../locales/no/messages.js");
 *   module.exports = {
 *     en,
 *     no,
 *   };
 * }
 */

type MessagesFile = Record<string, string>;

export async function loadLinguiMessages(lang: string): Promise<MessagesFile> {
  if (
    process.env.NODE_ENV === "development" &&
    process.env.TEST_ENV !== "test"
  ) {
    const msgFile = await import(`src/locales/${lang}/messages.po`);
    return {
      [lang]: msgFile.messages,
    };
  } else {
    return {
      [lang]: translations[lang].messages,
    };
  }
}

const { locales } = linguiConfig;
// optionally use a stricter union type
type SupportedLocales = string;

type AllI18nInstances = { [K in SupportedLocales]: I18n };

let catalogs: MessagesFile[] = [];
let allMessages: MessagesFile;
let hasInitializedCatalogs = false;
let allI18nInstances: AllI18nInstances = {};
async function getAllInstances(): Promise<AllI18nInstances> {
  if (!hasInitializedCatalogs) {
    const messages = await Promise.all(locales.map(loadLinguiMessages));
    catalogs = messages;
    allMessages = catalogs.reduce((acc, oneCatalog) => {
      return { ...acc, ...oneCatalog };
    }, {});
    allI18nInstances = locales.reduce((acc, locale) => {
      const messages = allMessages[locale] ?? {};
      const i18n = setupI18n({
        locale,
        messages: { [locale]: messages } as any,
      });
      return { ...acc, [locale]: i18n };
    }, {});
    hasInitializedCatalogs = true;
  }

  return Promise.resolve(allI18nInstances);
}

export async function getI18nInstance(locale: Lang) {
  const allI18nInstances = await getAllInstances();
  return allI18nInstances[locale];
}

export async function getI18nInstanceWithLocale(locale: Lang) {
  const instance = await getI18nInstance(locale);
  setI18n(instance);
  return instance;
}

And for the loading I'm just using a webpack loader config:

/* ... */
  webpack: (config) => {
    config.module.rules.push({
      test: /\.po$/i,
      loader: "@lingui/loader",
    });
    return config;
  },
 /* ... */

The Flow

  • If you are developing locally with your source language (en), an empty catalog is loaded and all messages are taken from the source code.
  • If you are developing locally with a translation language (pl, for example), a catalog with partial translations is loaded, and untranslated messages are taken from the source code.
  • If you are bundling for production, you need the messages.pot file to be up to date, which is why extraction is added before the build command. Messages from the source code are not available, so the Lingui loader compiles your catalogs and falls back to the template.

Hope that helps.

Do you translate in feature branches or only when feature is merged to a main branch?

In that particular project I did translate in feature branches.

I think we're talking about a few different problems here. Your suggestion does not solve the problem with conflicts, as we're still stuck with one huge .po file, though now it has to be filled in manually. It certainly helps a bit with generating a production build, but that wasn't an issue in my case anyway. Doing the extract-template thing definitely makes things easier, but having an additional required step of adding translations later doesn't seem ideal.

Another problem is co-location of the files, which you mentioned you never needed. There are cases where you might have a component named exactly the same way under a different, nested path (it unfortunately does happen), and currently there's no way to have two components named Button, one under ui/button.tsx and one under ui/user/button.tsx, as those will be merged into one catalog, e.g.

#: components/button.tsx:7
msgid "First message"
msgstr "First message"

#: components/user/button.tsx:7
msgid "Second message"
msgstr "Second message"
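
The collision can be reproduced with a simplified model of the {name} placeholder. This is not Lingui's actual implementation: it assumes {name} resolves to the last path segment only, which is what the generated tree earlier in this thread suggests, and both helper functions are hypothetical.

```typescript
// Simplified model: the catalog path uses only the last path segment of the
// source file, mirroring path: 'locale/{locale}/{name}' from the config above.
function catalogPathFor(sourceFile: string, locale: string): string {
  const segments = sourceFile.split("/");
  const lastSegment = segments[segments.length - 1] || sourceFile;
  return "locale/" + locale + "/" + lastSegment + ".po";
}

// Group source files by target catalog and keep only catalogs that would
// receive messages from more than one source file.
function findCollisions(sourceFiles: string[], locale: string): Record<string, string[]> {
  const byCatalog: Record<string, string[]> = {};
  for (const file of sourceFiles) {
    const catalog = catalogPathFor(file, locale);
    byCatalog[catalog] = (byCatalog[catalog] || []).concat(file);
  }
  const collisions: Record<string, string[]> = {};
  for (const catalog of Object.keys(byCatalog)) {
    const files = byCatalog[catalog];
    if (files && files.length > 1) collisions[catalog] = files;
  }
  return collisions;
}

const sources = ["components/button.tsx", "components/user/button.tsx", "components/card.tsx"];
console.log(JSON.stringify(findCollisions(sources, "en")));
// {"locale/en/button.tsx.po":["components/button.tsx","components/user/button.tsx"]}
```

Under this model, a placeholder that carried the relative directory as well as the file name would keep the two Button catalogs apart.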

The final issue is loading those multiple catalogs in development: Lingui doesn't really provide any good way to do it, as you'd have to manually define the imports in every single component, which rather ruins the idea of good developer experience and ease of use. It'd be perfect if the loader could understand the Lingui config and deliver the translations based on a default or user-defined priority (e.g. .js first, then .json, .po, .pot, etc.). Even though the docs allow for this tree-based configuration, it pretty much requires a single .po or .json file as the entry point for development.
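
To make the priority idea concrete, here is a small sketch of such a lookup. Nothing like this exists in @lingui/loader today as far as this thread shows; resolveCatalog and FORMAT_PRIORITY are hypothetical names, and the list of existing files stands in for a real file-system check.

```typescript
// Hypothetical priority order, taken from the suggestion above.
const FORMAT_PRIORITY = [".js", ".json", ".po", ".pot"];

// Return the highest-priority catalog file that exists for a base path,
// or undefined when none of the formats is present.
function resolveCatalog(
  basePath: string,
  existingFiles: string[],
  priority: string[] = FORMAT_PRIORITY
): string | undefined {
  for (const ext of priority) {
    const candidate = basePath + ext;
    if (existingFiles.indexOf(candidate) !== -1) return candidate;
  }
  return undefined;
}

const onDisk = [
  "src/locales/en/messages.po",
  "src/locales/en/messages.js",
  "src/locales/no/messages.po",
];

console.log(resolveCatalog("src/locales/en/messages", onDisk)); // src/locales/en/messages.js
console.log(resolveCatalog("src/locales/no/messages", onDisk)); // src/locales/no/messages.po
console.log(resolveCatalog("src/locales/pl/messages", onDisk)); // undefined
```

With a lookup like this, compiled output would be preferred when it exists, and a plain .po file would be picked up during development without a separate compile --watch process.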

Your suggestion will be good enough for this particular problem as long as I use a dedicated service for the translations and never change the .po files manually, but the docs still show a way to configure your project that's basically a footgun, as it just makes development a lot harder with barely any additional benefit.

To sum it up, which one would you prefer?

I'd very much prefer the first choice, even though it will definitely make maintenance harder. I'm willing to take over the development of this myself, as this feature would be really beneficial for my $DAYJOB stuff. Whichever you choose, let me know, and I'll gladly help with the heavy lifting. Having a trap like this in the documentation definitely isn't ideal, and having to fork/patch the library to make it do what the documentation suggests it can is far from a great experience.

Palid commented 2 weeks ago

Pinging @timofei-iatsenko as you might not have noticed the wall of text above. I'd love to help on that in addition to the Turbopack PR. 😄

We can have a chat on something like matrix if you'd prefer, palid@hackerspace.pl if you are able to chat there.