Add use case for `source` expression attribute

eemeli commented 2 months ago

While working on moz.l10n, a new Python localization library that uses the MF2 message and resource data model to represent messages from a number of different current syntaxes, I've come across at least the following use cases for expression attributes:

In addition to supporting a limited set of HTML elements, Android String Resources may use <xliff:g> to wrap nontranslatable content. This is best represented in MF2 with a @translate=no attribute.
Web extension messages.json files allow for named placeholders that are mapped to indexed arguments. These may include an example, which is best represented in MF2 as an @example=... attribute.
Apple's Xcode supports localization of plural messages via .stringsdict XML files, which encode the plural variable's name as an NSStringLocalizedFormatKey value, where it appears as e.g. %#@countOfFoo@ or similar. To display only the relevant "countOfFoo" name of this variable to localizers as context, it's best to use a @source=... attribute on the selector.

The first two use cases are already documented, but the last one is not; it's added by this PR.

The overall use case of the underlying work is to make use of the MF2 data model to provide a unified representation of messages in many different syntaxes, so that e.g. validation and a UI for plural message editing can be applied to all formats, rather than needing separate parsing and handling for each.

aphillips commented 2 months ago

Android String Resources may use <xliff:g> to wrap nontranslatable content. This is best represented in MF2 with a @translate=no attribute.

An argument can be made that this is a job for markup, since, after all, an XLIFF processor might want to directly consume the g.

Apple's Xcode supports localization of plural messages

Side thought: we probably want to chat with the Apple, Android, and MSFT folks about adopting MF2 into some of their resource/API syntaxes.

it's best to use a @source=... attribute on the selector

I'm not sure I understand the @source annotation you're proposing. Why wouldn't the caller just assign the value to a named argument to MF2 in the setup to calling the formatter? Why does the translator need to know the original name?

Apple's doc furnishes this example of the format you're talking about:

<plist version="1.0">
    <dict>
        <key>%d home(s) found</key>
        <dict>
            <key>NSStringLocalizedFormatKey</key>
            <string>%#@homes@</string>
            <key>homes</key>
            <dict>
                <key>NSStringFormatSpecTypeKey</key>
                <string>NSStringPluralRuleType</string>
                <key>NSStringFormatValueTypeKey</key>
                <string>d</string>
                <key>zero</key>
                <string>No homes found</string>
                <key>one</key>
                <string>%d home found</string>
                <key>other</key>
                <string>%d homes found</string>
            </dict>
        </dict>
    </dict>
</plist>

Isn't this represented in MF2 as:

.input {$homes :integer}
.match {$homes}
0 {{No homes found}}
one {{{$homes} home found}}
* {{{$homes} homes found}}

The %#@homes@ is needed to bind homes to the sprintf-style positional arguments (%d in the example). Presumably MF2 already does this by name.
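
The mapping asked about here can be sketched in Python. This is only an illustration under stated assumptions, not how moz.l10n or any Apple tooling does it: the function name `stringsdict_entry_to_mf2` and the `KEY_MAP` table are hypothetical, the `zero` category is mapped to the literal key `0` simply because the example above does so, and escaping of `{`, `}` and `\` inside patterns is skipped.

```python
import plistlib

# The example from Apple's documentation quoted above.
STRINGSDICT = b"""<plist version="1.0">
<dict>
    <key>%d home(s) found</key>
    <dict>
        <key>NSStringLocalizedFormatKey</key>
        <string>%#@homes@</string>
        <key>homes</key>
        <dict>
            <key>NSStringFormatSpecTypeKey</key>
            <string>NSStringPluralRuleType</string>
            <key>NSStringFormatValueTypeKey</key>
            <string>d</string>
            <key>zero</key>
            <string>No homes found</string>
            <key>one</key>
            <string>%d home found</string>
            <key>other</key>
            <string>%d homes found</string>
        </dict>
    </dict>
</dict>
</plist>"""

# Assumption: follow the thread's example, where "zero" becomes the key 0
# and "other" becomes the catch-all key *.
KEY_MAP = {"zero": "0", "other": "*"}
PLURAL_ORDER = ("zero", "one", "two", "few", "many", "other")


def stringsdict_entry_to_mf2(entry: dict, var: str) -> str:
    """Build MF2 source text for one plural variable of a stringsdict entry."""
    spec = entry[var]
    # "%d" in the example; derived from NSStringFormatValueTypeKey.
    placeholder = "%" + spec["NSStringFormatValueTypeKey"]
    lines = [f".input {{${var} :integer}}", f".match {{${var}}}"]
    for category in PLURAL_ORDER:
        if category in spec:
            key = KEY_MAP.get(category, category)
            # Simplification: no escaping of {, } or \ in the pattern text.
            pattern = spec[category].replace(placeholder, f"{{${var}}}")
            lines.append(f"{key} {{{{{pattern}}}}}")
    return "\n".join(lines)


data = plistlib.loads(STRINGSDICT, fmt=plistlib.FMT_XML)
mf2_source = stringsdict_entry_to_mf2(data["%d home(s) found"], "homes")
print(mf2_source)
```

Note that this sketch discards the `NSStringLocalizedFormatKey` value entirely, so the original file could not be regenerated from its output.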

eemeli commented 2 months ago

Android String Resources may use <xliff:g> to wrap nontranslatable content. This is best represented in MF2 with a @translate=no attribute.

An argument can be made that this is a job for markup, since, after all, an XLIFF processor might want to directly consume the g.

I'm building a workflow where the source content can be parsed into an MF2 data model, modified, and then reserialised in the original format. So there isn't necessarily any XLIFF processor involved here, and even if there were, the use of <xliff:g> is completely custom in the Android format, and does not match with the "generic group placeholder" meaning that the XLIFF spec places on it. Hence representing the intent of the original syntax with an attribute, rather than modelling the input exactly.

it's best to use a @source=... attribute on the selector

I'm not sure I understand the @source annotation you're proposing. Why wouldn't the caller just assign the value to a named argument to MF2 in the setup to calling the formatter? Why does the translator need to know the original name?

In this case, there is no formatter involved in the workflow, so the source needs to be retained to allow for a later serialisation in the format that the iOS or macOS formatter will be able to process. For the translator, the name of the variable can be an informative part of the message's context, and it's much clearer when lifted out of its syntax trappings.

Isn't this represented in MF2 as:
.input {$homes :integer}
.match {$homes}
0 {{No homes found}}
one {{{$homes} home found}}
* {{{$homes} homes found}}
The %#@homes@ is needed to bind homes to the sprintf-style positional arguments (%d in the example). Presumably MF2 already does this by name.

Yes, and in the MF2 representation the %#@homes@ string is needed to reliably transform the MF2 back into the corresponding stringsdict value. Sometimes it also carries a positional indicator, and other content; it's not always a %#@ prefix and @ suffix to the variable name.
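
A minimal Python sketch of that idea, keeping the original token verbatim in a `@source` attribute while exposing only the variable name. The regular expression is an assumption covering just two shapes, `%#@name@` and the positional `%1$#@name@`; as noted above, real files can carry other content. The helper names `split_format_key` and `selector_with_source` are hypothetical.

```python
import re

# Assumed shape: "%", an optional positional index such as "1$", "#@",
# the variable name, and a closing "@". Other variants are not handled.
FORMAT_KEY_VAR = re.compile(r"%(?:(?P<pos>\d+)\$)?#@(?P<name>[^@]+)@")


def split_format_key(format_key: str) -> list[tuple[str, str]]:
    """Return (variable name, original token) pairs found in a format key."""
    return [(m.group("name"), m.group(0)) for m in FORMAT_KEY_VAR.finditer(format_key)]


def selector_with_source(name: str, token: str) -> str:
    """Render an MF2 expression that carries the original token as @source."""
    # Backslash and pipe must be escaped inside an MF2 quoted literal.
    escaped = token.replace("\\", "\\\\").replace("|", "\\|")
    return f"{{${name} :integer @source=|{escaped}|}}"


pairs = split_format_key("%#@homes@")
selector = selector_with_source(*pairs[0])
print(selector)
```

Because the token is stored unmodified, a serializer can write it back out without needing to understand every detail of the stringsdict syntax.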

aphillips commented 2 months ago

For the translator, the name of the variable can be an informative part of the message's context, and it's much clearer when lifted out of its syntax trappings.

Agreed, but one could extract (and/or decorate) the name to generate the expression operand. I understand that the NSStringLocalizedFormatKey is actually a construct for enumerating what we'd call operands and aligning them with classical "placeholders". You have to parse that string in your implementation, IIUC (not having worked with it, only having glanced at the documentation).

Yes, and in the MF2 representation the %#@homes@ string is needed to reliably transform the MF2 back into the corresponding stringsdict value. Sometimes it also carries a positional indicator, and other content; it's not always a %#@ prefix and @ suffix to the variable name.

👍

So there isn't necessarily any XLIFF processor involved here, and even if there were, the use of <xliff:g> is completely custom in the Android format, and does not match with the "generic group placeholder" meaning that the XLIFF spec places on it. Hence representing the intent of the original syntax with an attribute, rather than modelling the input exactly.

Understood, but there is Android's processor and this does still look like markup in that context. FWIW, XLIFF elements are implemented in many different ways by different tools. So there are many dialects already.

Overall, what you're doing can obviously work. I'm just curious whether we already provide the necessary constructs.

Thought: does this suggest the need for namespaced or custom attributes? @source is fine, but maybe @moz:source would avoid conflicts with other interpretations in tooling downstream?

eemeli commented 2 months ago

Agreed, but one could extract (and/or decorate) the name to generate the expression operand. I understand that the NSStringLocalizedFormatKey is actually a construct for enumerating what we'd call operands and aligning them with classical "placeholders". You have to parse that string in your implementation, IIUC (not having worked with it, only having glanced at the documentation).

Eh, or I can just extract the relevant-to-translators bit out of it (the variable name), and leave the rest as line noise that I hide away. The "IIUC" bit that you mention is hard here, because this syntax isn't well documented, and I'm not myself 100% confident I've understood all of it.

Understood, but there is Android's processor and this does still look like markup in that context.

Yes, and in some cases like

<xliff:g><b>foo</b></xliff:g>

I do need to leave it in as markup like

{#xliff:g @translate=no}{#b}foo{/b}{/xliff:g @translate=no}

but that's less useful and less friendly to a translator or tooling than e.g. representing

<xliff:g id="user" example="Bob">%1$s</xliff:g>

as

{$user :xliff:g example=Bob @translate=no @source=|%1$s|}
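
For illustration, the conversion from the Android element to that placeholder could look roughly like the following Python sketch. It is not the moz.l10n implementation: `xliff_g_to_mf2` and `literal` are hypothetical helpers, the `id` value is assumed to already be a valid MF2 variable name, and the rule for when a literal needs quoting is a deliberate simplification of the MF2 grammar.

```python
import re
import xml.etree.ElementTree as ET

# Namespace that Android resource files bind to the "xliff" prefix.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"

# Simplified check for values that can be written without |quotes|.
SIMPLE_LITERAL = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def literal(value: str) -> str:
    """Render a value as an MF2 literal, quoting it when it is not simple."""
    if SIMPLE_LITERAL.fullmatch(value):
        return value
    escaped = value.replace("\\", "\\\\").replace("|", "\\|")
    return f"|{escaped}|"


def xliff_g_to_mf2(xml_fragment: str) -> str:
    """Convert a text-only <xliff:g id="..."> element to an MF2 placeholder."""
    wrapped = f'<string xmlns:xliff="{XLIFF_NS}">{xml_fragment}</string>'
    g = ET.fromstring(wrapped).find(f"{{{XLIFF_NS}}}g")
    if g is None or len(g) > 0 or "id" not in g.attrib:
        # Elements with nested tags or without an id stay as markup instead.
        raise ValueError("not a simple <xliff:g> placeholder")
    parts = [f"${g.attrib['id']}", ":xliff:g"]
    if "example" in g.attrib:
        parts.append(f"example={literal(g.attrib['example'])}")
    parts.append("@translate=no")
    parts.append(f"@source={literal(g.text or '')}")
    return "{" + " ".join(parts) + "}"


placeholder = xliff_g_to_mf2('<xliff:g id="user" example="Bob">%1$s</xliff:g>')
print(placeholder)
```

The nested `<xliff:g><b>foo</b></xliff:g>` case is rejected by this helper on purpose, matching the fallback to markup described above.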

Thought: does this suggest the need for namespaced or custom attributes? @source is fine, but maybe @moz:source would avoid conflicts with other interpretations in tooling downstream?

That's actually a big part of why I opened this PR. If we find agreement on what a @source attribute is supposed to mean, then I don't need to use a namespaced one.

mihnita commented 2 months ago

Note that the way <g> is used in the Android files is bad.

It is meant to declare the text between <g>...</g> as non-localizable. But in XLIFF the content between the tags is very much localizable. The <g> is intended to be used for things like <b>, <i>, and so on.

I thought that "do not translate" is already representable in MF2 as "...{|don't translate this|}..."

unicode-org / message-format-wg

Add use case for `source` expression attribute #772