Simplify NLS IDs - Githubissues

hediet commented 2 years ago

Problem

This is some source code that uses NLS:

const res = await this._dialogService.confirm({
    type: 'info',
    message: nls.localize('restart1', "Profile Extensions"),
    detail: nls.localize('restart2', "In order to profile extensions a restart is required. Do you want to restart '{0}' now?", this.productService.nameLong),
    primaryButton: nls.localize('restart3', "&&Restart"),
    secondaryButton: nls.localize('cancel', "&&Cancel")
});

These NLS calls use hand-crafted IDs. Very often, these IDs are meaningless and we need to add "comments" for the translators anyway. But still, it requires effort and mental energy to come up with these unique IDs.

Especially when other IDs are involved, their relationship is unclear:

'editor.minimap.enabled': {
    type: 'boolean',
    default: defaults.enabled,
    // How does `minimap.enabled` relate to `editor.minimap.enabled`?
    // Shouldn't the NLS id be `editor.minimap.enabled` too?
    description: nls.localize('minimap.enabled', "Controls whether the minimap is shown.")
},
'editor.minimap.autohide': {
    type: 'boolean',
    default: defaults.autohide,
    description: nls.localize('minimap.autohide', "Controls whether the minimap is hidden automatically.")
},
'editor.minimap.size': {
    type: 'string',
    enum: ['proportional', 'fill', 'fit'],
    enumDescriptions: [
        // Here, proportional refers to the enum value, but doesn't occur in the text.
        // This could lead to inconsistencies when the enum value is being renamed.
        nls.localize('minimap.size.proportional', "The minimap has the same size as the editor contents (and might scroll)."),
        nls.localize('minimap.size.fill', "The minimap will stretch or shrink as necessary to fill the height of the editor (no scrolling)."),
        nls.localize('minimap.size.fit', "The minimap will shrink as necessary to never be larger than the editor (no scrolling)."),
    ],
    default: defaults.size,
    description: nls.localize('minimap.size', "Controls the size of the minimap.")
},

Idea

To reduce the effort of coming up with NLS IDs, I suggest to use random computer generated ids.

Extensive tooling support would assist with generating such ids.

Implementation

With random IDs, the code could look like this:

const res = await this._dialogService.confirm({
    type: 'info',
    message: nls.localize('9csKXyeV', "Profile Extensions"),
    detail: nls.localize('Mw2DXaQ2', "In order to profile extensions a restart is required. Do you want to restart '{0}' now?", this.productService.nameLong),
    primaryButton: nls.localize('BDiPzE11', "&&Restart"),
    secondaryButton: nls.localize('Xwjwmvkt', "&&Cancel")
});

Extensions could help to create such ids (for a different NLS call/syntax however, just as proof of concept here):

recording

Especially with multi cursors, wrapping text in NLS calls would be very convenient.

Additional Thoughts

1) Why IDs and not just text? (localize("In order to profile extensions a restart is required."))

IDs allow distinguishing the same English texts, which may be translated differently in other languages.
IDs allow to keep track of text changes, so that translators can keep the old translation. 2) Why random IDs?
If the ID is just a function of the text, argument 1 applies.
If the ID has some unclear meaning, it requires mental energy to come up with such an ID. Random IDs just don't have any meaning by definition.
They can be generated by tools
They don't need to be updated when the text changes.

FYI @TylerLeonhardt

bpasero commented 2 years ago

💯 for random Ids, but how would we adopt all existing translations to the new keys?

Btw I think changing the ID when the text changes is actually crucial because only that guarantees that translations are also updated, so maybe the ID should just be the hash of the text you type?

hediet commented 2 years ago

how would we adopt all existing translations to the new keys?

I don't see the problem here: We can do just an automated find+replace?

Btw I think changing the ID when the text changes is actually crucial because only that guarantees that translations are also updated

But this is already a much bigger problem in this example:

nls.localize('restart1', "Profile Extensions"),

When you want to change Profile Extensions to Measure performance of Extensions, how would you update the ID?

Also, I think this is a tooling issue of the translation system: Ideally, the translation system should prompt translators to re-translate a text when the English text changed significantly.

so maybe the ID should just be the hash of the text you type?

In that case, I would change the NLS system to use `${ID}${englishText}` as translation key. Then the ID is still independent, but all translations are lost when the text is changed.

FYI, according to this, the approximate expected value of how many random items you need to pick from a set of size n, is $\sqrt{n \pi / 2}$. Thus, if the random ID is a string of length 8 and uses lowercase+uppercase+numbers (which yields an alphabet of size 62), according to the formula, on average, you need to draw about 18.519.390 of such IDs until the first collision happens. If we add a random ID every minute, it is a 50:50 chance if a collision happens after 35 years or before.

bpasero commented 2 years ago

I don't see the problem here: We can do just an automated find+replace?

Yeah what I meant is that this needs coordination across all published extensions that provide a translation for VSCode.

As for tooling, show a warning when the hash mismatches the value and don't allow to commit or fail the build.

hediet commented 2 years ago

Yeah what I meant is that this needs coordination across all published extensions that provide a translation for VSCode.

Not neccessarily, switching from meaningful to random IDs can happen incrementally, as you can just start using random IDs for all the new localizable strings immediately. From time to time, old IDs can then be replaced with random ones (both in code and all translations files) - this can be done by a tool and doesn't need to happen all at once.

As for tooling, show a warning when the hash mismatches the value and don't allow to commit or fail the build.

I wouldn't do that: If the ID is a deterministic hash from the text, you can also just use the text directly as ID. I would rather suggest to use hash(id + text) as actual translation key, so that the ID can still be used to differentiate equal texts, but changing the text invalidates all the translations.

To enable an incremental adoption, there could be a lookup for both hash(id + text) and id.

TylerLeonhardt commented 2 years ago

Keep in mind, the translators do get to see the id and I would guess that they use that as some little bit of context when determining the right translation.

I do think for simpler cases localize('In order to profile extensions a restart is required.') an id isn't useful... but something like: localize('Select a language') can be ambiguous... whereas localize('coding language selection link', 'Select a language') indicates that language is a programming language and not a spoken language which matters in some spoken languages.

I like the idea of moving to a model where:

Id's go away and are auto-generated
commenting strings becomes common practice when needed (right now, barely anyone uses comments for context on the words in the string)
backcompat is still and always will be important

microsoft / vscode

Simplify NLS IDs #155982

Problem

Idea

Implementation

Additional Thoughts