What's left to discuss on markup? #375

unicode-org / message-format-wg

Developing a standard for localizable message strings

Closed: eemeli closed this issue 9 months ago

eemeli commented 1 year ago

As we merged #371, we didn't explicitly discuss whether this allows us to close all or some of the following issues: #238, #241, #262, #356, and #357.

Rather than commenting on each of those, I thought it might be appropriate to reflect as a whole on which parts of markup we still ought to discuss at this time, and then close or highlight those issues.

My own sense is that all of the above could and should be closed.

cdaringe commented 1 year ago

At Walmart, our C, Java, and JavaScript ICU MFv1 implementations use extensive markup without needing direct markup support from ICU MessageFormat. I was surprised to see markup supported here in MFv2. As a heavy user of markup, I am biased toward MF offering an IOC (inversion of control) pattern, giving me full control of all things formatting and templating, as opposed to a limited markup DSL within MF. That's a bit abstract, so allow me to share a micro-demo below.

To support not only markup but all other use cases where a user may want to tap or map over formatted strings, we extended MFv1 with an optional fmt callback that is invoked each time a variable is visited:

Demo:

termsAndConditions: Read and agree to our {termsText, select, other {Terms of Use}} and ...SNIP...

m(messages, "termsAndConditions", {
  termsText: {
    fmt: (x) => (
      <Link className="dark-gray" href={externalTermsHref}>
        {x}
      </Link>
    ),
  }
})

Observations:

On visit-exit of termsText, our MFv1 compiler says, "Hey, I see fmt is present. I'll pass the result of the select into this format function before continuing to fold the rest of the formatted message."
In my fmt function, I unlock the following features:
- Unconstrained markup: I decorate select output with rich HTML/JavaScript.
- Unconstrained data: I add externally scoped data directly into my markup. My MF string needs no knowledge of externalTermsHref, which is powerful.
- Unconstrained return type: I return a React component, not a string!
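The fmt mechanism described in these observations can be sketched as a tiny resolver. This is a minimal illustration, not Walmart's MFv1 extension or an ICU API: it assumes the message has already been parsed into literal strings and select placeholders, and `resolveParts` and the placeholder shape are hypothetical. The key point is that results come back as an array of parts, so a callback may return something other than a string.

```javascript
// Hypothetical pre-parsed form of the demo message:
//   Read and agree to our {termsText, select, other {Terms of Use}} and ...SNIP...
// Assumption: parsing happens elsewhere; only select placeholders are modeled here.
const termsAndConditions = [
  "Read and agree to our ",
  { arg: "termsText", select: { other: "Terms of Use" } },
  " and ...",
];

// Resolve each placeholder, then, on "visit-exit", pass the result through the
// caller's optional fmt callback. Returning an array of parts instead of a joined
// string lets a callback return non-string values such as UI elements.
function resolveParts(message, args = {}) {
  return message.map((node) => {
    if (typeof node === "string") return node;
    const opts = args[node.arg] || {};
    // Use the variant matching opts.value if there is one, else fall back to "other".
    const selected = node.select[opts.value] ?? node.select.other;
    return typeof opts.fmt === "function" ? opts.fmt(selected) : selected;
  });
}

// Usage: wrap the selected text in a link-like object; the message never sees
// the href. (A plain object stands in for a React element here.)
const externalTermsHref = "https://example.com/terms"; // hypothetical URL
const parts = resolveParts(termsAndConditions, {
  termsText: { fmt: (x) => ({ tag: "Link", href: externalTermsHref, children: [x] }) },
});

// Without any fmt callbacks, joining the parts gives the plain string output.
const plain = resolveParts(termsAndConditions).join("");
```

Because the callback runs per placeholder, the surrounding literals stay untouched and only the selected text is wrapped.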

With respect to the markup DSL:

- The markup DSL forces me to encode data that I don't want in my MF strings (e.g. button hrefs, or other content irrelevant to translation).
- The markup DSL encourages churn: visual changes are likely to trigger translation workflows. Changing a button href, for example, ought not trigger a translation submission, but I'd wager that in practice it would for most development teams.
- The markup DSL locks me into strings.

I do not mean to disparage the markup DSL; I see its pragmatism too, especially for highly static content!

However, as an MF + markup power user, I don't want the markup DSL. I want a more powerful format capability that puts markup generation in my control and out of MF entirely.

I'd need a better way to hook in my markup formatting logic:
1. In my demo, you can see I abuse the select to hook in my formatting. Simply invoking my format function on variables in MFv2 could do the trick:
   1. termsAndConditions: Read and agree to our $termsText and ...SNIP...
2. If MFv2 could natively offer extension points like my fmt callback, really interesting and portable designs could be achieved.
   - For instance, you saw me map MF => React above. Other mappings are possible: I could take my MF output and map it into some AndroidViewPrimitive, simply by being allowed to participate in the fold/map cycle of MF formatting. The markup design limits MF to string output, when the output could support other runtimes with an IOC formatting pattern.

Food for thought!

zbraniecki commented 1 year ago

the markup DSL forces me to encode data that I don't want in my MF strings (e.g. button hrefs, or other non translation relevant content).

It doesn't. MF2 markup elements do not have to store attributes that are not localizable, just as Fluent markup doesn't; the merge happens in the bindings between l10n markup and DOM markup.
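A minimal sketch of such a binding-side merge, assuming the localized message arrives as parts of the form { type, value, options } (an assumption for illustration, not something the MF2 spec defines). The translation only names the element; the developer supplies the non-localizable href at runtime, so changing it never touches the translation files. `mergeAttributes` is a hypothetical helper, not Fluent's or any library's API.

```javascript
// Localized message parts as a binding might receive them. The translation
// names the element ("a") but carries no href.
const localizedParts = [
  { type: "literal", value: "Click " },
  { type: "markup-start", value: "a", options: {} },
  { type: "literal", value: "here" },
  { type: "markup-end", value: "a" },
  { type: "literal", value: " to continue" },
];

// Non-localizable attributes owned by the developer, keyed by element name.
const developerAttributes = { a: { href: "https://example.com/next" } };

// Hypothetical binding step: merge developer attributes into matching
// markup-start parts. Developer-owned values are spread last, so they win.
// Input parts are not mutated; non-markup parts are passed through as-is.
function mergeAttributes(parts, attrsByName) {
  return parts.map((part) =>
    part.type === "markup-start"
      ? { ...part, options: { ...part.options, ...(attrsByName[part.value] || {}) } }
      : part
  );
}

const merged = mergeAttributes(localizedParts, developerAttributes);
```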

the markup DSL encourages churn--visual changes likely invoke translation workflows. changing a button href, for example ought not invoke translations submissions, which I'd wager they would practically for development teams

It doesn't, see above.

the markup DSL locks me into strings

Can you expand on that point? I do not understand this concern.

zbraniecki commented 1 year ago

Demo:

Your architecture forces new DOM generation on each translation, since it calls fmt, which generates a new <Link/>. This is a papercut that we should try to avoid. Updating a translation should preserve the identity of elements where possible.

See the list of use cases we collected for markup scenarios at Mozilla: https://github.com/zbraniecki/fluent-domoverlays-js/wiki/New-Features-%28rev-3%29. I believe your architecture won't scale to those.

cdaringe commented 1 year ago

Your architecture forces new DOM generation on each translation

This is actually not the case. I am using React in my demo, and React uses a virtual DOM (VDOM). There are other mitigations in React to avoid creating a new ad-hoc component, but I'll omit them, as the demo above was only an MVP 😄.

Updating translation should preserve identity of elements where possible.

I agree. I'd posit that the markup API and the IOC pattern I've discussed above are subject to the same amount of reordering/reparenting. I don't see any characteristics that favor one solution over the other with respect to this topic, although in my demo the markup tags/attributes are deduplicated in bulk, since they're consolidated at a single call site. In MFv2, because selection happens at the flat top level of a message rather than in nested translations, the markup would likely need to be duplicated in n of N match branches (arguably creating more opportunities for hierarchy shifts), but that risk is really negligible.

I believe your architecture won't scale to those.

Maybe! I looked at your link and didn't see anything resonate, even weakly. You probably see something I don't. #11 is avoiding churn, and I think the markup DSL offers up a handy footgun for churn.

Candidly, I need to study the spec a little more deeply, because my proposal may be solvable through :function formatters. However, I'd want ad-hoc formatter functions, not top-level formatters, as the formatting in my demo is local to the message being formatted.

cdaringe commented 1 year ago

the markup DSL forces me to encode data...

It doesn't. MF2 markup elements do not have to store attributes that are not localizable

Sorry, maybe I was unclear. To be more complete, I could have said:

If my application demands rich attributes in my HTML, the MFv2 markup DSL invites me to put this content in translation files. MFv2 has no other means to attach these attributes to the markup around translated entities, so they must be placed there. Consequently, I must now spread my markup generation from one system to two: my usual HTML generation system (e.g. React, Jekyll, Gatsby, Hugo, vanilla HTML/JS) AND my MFv2 translation system.

zbraniecki commented 1 year ago

This is actually not the case. I am using react in my demo, and react uses VDOM. There are other mitigations to this in React to avoid creating a new ad-hoc react component, but i will omit as the demo above was for MVP only 😄 .

Relying on some secondary strategy to mitigate an architectural choice that generates new elements on each pass is suboptimal. MF2 is not specific to React or to any other high-level UI toolkit. I'd love to avoid architectural choices that require such footguns/workarounds later.

I encourage you to evaluate scenarios where a markup element is:

- passed to MF2 formatting by the developer
- generated by a function (say, an MF2 function "STRONG" that wraps some parts in an opening/closing pair of markup elements)
- introduced by the localizer (things like "em" or "sup" in HTML)

and how your model handles each of them. I think once you desugar it, the proposal is closely aligned with what you want to achieve, but more flexible than what you showed in the demo.
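To make the second and third scenarios concrete, here is a hedged sketch in which a hypothetical function in the spirit of "STRONG" and a localizer-introduced "sup" both desugar to the same kind of markup parts, which a single binding can then handle uniformly. The part shape, `strong`, and `render` are all assumptions for illustration; `render` is only a debug renderer, not an HTML binding.

```javascript
// Hypothetical MF2-style function in the spirit of "STRONG": wraps already
// formatted parts in an opening/closing pair of markup parts.
function strong(innerParts) {
  return [
    { type: "markup-start", value: "strong" },
    ...innerParts,
    { type: "markup-end", value: "strong" },
  ];
}

const lit = (value) => ({ type: "literal", value });

// Function-generated markup next to localizer-introduced "sup":
// both end up as the same kind of markup parts.
const parts = [
  ...strong([lit("Total")]),
  lit(": 42"),
  { type: "markup-start", value: "sup" },
  lit("1"),
  { type: "markup-end", value: "sup" },
];

// One binding handles every source the same way. This debug renderer prints
// tags as <name>...</name>; a native binding could map them to styles instead.
const render = (ps) =>
  ps
    .map((p) =>
      p.type === "markup-start" ? `<${p.value}>` : p.type === "markup-end" ? `</${p.value}>` : p.value
    )
    .join("");

const output = render(parts);
```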

cdaringe commented 1 year ago

the markup DSL locks me into strings

Can you extend that point, I do not understand this concern.

Sure, gladly. Thanks for the callout.

MF's goal is to offer templatized string => string translations. Markup formats can be supported because they happen to be string-based. That's all fine and good!

Styling/formatting/semantic wrapping of translated content is highly desirable. We all agree that such a capability is sometimes needed within a translation, e.g. User, please <button href='...'>click here!</button>. That's certainly settled, as evidenced by the existence of the markup DSL.

The core problem is that stringy markup is not portable between runtimes/environments. Not every environment where MFv2 is used supports stringy-markup-based rendering. Android and iOS, for instance, are two extremely common environments where translations reach users' eyes and ears, but they do not use a string-based markup DSL (webviews aside) as their primary rendering primitive. Even in my web example, I use React, not strings, as the mechanism for providing renderable content. MFv2 may support markup, but I needed to put my content in a React component, not a string. Thus, I posit that there is a more portable way to deliver the core value of MF (translatable template string generation) while supporting any given runtime.

MF operates as follows:

Current state:

Given an input I, produce a translated output string.

<I>(input: I) => string

As discussed, in practice this works only in limited environments.

Desired state:

Given an input I, and a formatter that adapts the translated string to my environment/runtime, produce a translated output O, where O defaults to string:

default: <I>(input: I) => string
<I, O>(input: I) => (formattedTranslation: string) => O
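The two signatures above can be sketched in JavaScript as a formatter factory with an optional adapter argument. This is a minimal illustration under stated assumptions: `makeFormatter` and the toy `translate` step are hypothetical stand-ins, not MF or ICU APIs, and the `TextView` object merely represents some non-string UI primitive.

```javascript
// Hypothetical sketch of the "desired state": formatting accepts an optional
// adapter that maps the translated string to any output type O. With no
// adapter, the identity function keeps today's string-returning behavior.
// `translate` stands in for MF's own input => string formatting (assumption).
function makeFormatter(translate) {
  return (input, adapt = (formattedTranslation) => formattedTranslation) =>
    adapt(translate(input));
}

// Toy translate step for illustration only: a fixed English template.
const format = makeFormatter(({ name }) => `Hello, ${name}!`);

// Default: O is string.
const asString = format({ name: "Ana" });

// Custom adapter: O is a hypothetical view descriptor for a non-web runtime.
const asView = format({ name: "Ana" }, (text) => ({ kind: "TextView", text }));
```

In this shape the adapter only ever sees the final, joined string, so it cannot wrap an individual placeholder; the parts-based adapters discussed further down in this comment address that.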

MF can continue to do all of the great stuff it does. However, rather than assuming that the target runtime wants only strings, let user space and/or adapters tailor the output to their environment. This would not only allow formatting/styling to happen in a technology-agnostic way, but also greatly increase MF's ability to run portably across different systems. TL;DR: let users provide formatting and map translated outputs. This could arguably void the need for a markup DSL, which I weakly suggest may not be required: given a generic formatting API, there is no need to embed styling/formatting/markup concerns in my translation content files at all.

cdaringe commented 1 year ago

MF2 is not React specific, nor any other high level UI toolkit.

Strongly agreed.

The problem, though, is that UI toolkits don't all use stringy markup for formatting. Text is for UIs, but only a few UIs support string markup. Raw HTML does... but is HTML generation really the only possible UI target MF should be compatible with? I think with a small change, MF could be much more widely applicable.

I suggest that MF could offer the option to not concern itself with markup at all. Instead, let the MF caller own taking the translated strings and converting them into the UI primitive of choice.

MF should organize formatted raw string content (it already does). It is an overstep, or an oversight, IMHO, for MF to assume that it should do a formattedParts.join("") at the end of the formatting process. By default, I think MF really should be doing something like:

this.compiler.envAdapter.format(opts, ...orderedTranslatedStringParts)

MF could get out of the game of producing final strings and instead focus on producing and ordering translatable entities, letting the runtime figure out how to present them. Pseudocode examples:

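// js-strings (TypeScript): apply a per-key formatter when one is registered, then join
type TranslatedStringPart = { key: string; value: string };
type Opts<T> = { fmt: Record<string, (value: string) => T> };

const format = (opts: Opts<string>, parts: TranslatedStringPart[]): string =>
  parts.map((p) => (opts.fmt[p.key] ? opts.fmt[p.key](p.value) : p.value)).join("");

// react strings (TSX): same mapping, but return React nodes in a fragment instead of joining
const formatReact = (opts: Opts<React.ReactNode>, parts: TranslatedStringPart[]) => (
  <>
    {parts.map((p, i) => (
      <React.Fragment key={i}>{opts.fmt[p.key] ? opts.fmt[p.key](p.value) : p.value}</React.Fragment>
    ))}
  </>
);

// java: TranslatedStringPart assumed to be a record with key() and value();
// needs java.util.*, java.util.function.UnaryOperator, java.util.stream.Collectors
static String format(Map<String, UnaryOperator<String>> fmt, List<TranslatedStringPart> parts) {
  return parts.stream()
      .map(p -> fmt.containsKey(p.key()) ? fmt.get(p.key()).apply(p.value()) : p.value())
      .collect(Collectors.joining());
}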

Sorry if I'm too verbose 🤓. Just trying to articulate clearly. We're also clearly both at our keyboards at the same time, so our responses are a bit out of order 😄.

zbraniecki commented 1 year ago

The problem though, is UI toolkits that apply formatting don't all use stringy markup for formatting.

MF2 markup is not stringy markup. It is a DSL: in MF2 you annotate markup as a "string" (well, as a function call), but it's not stringy. It is meant to be merged with an actual UI element by the bindings.

I suggest that MF could offer the option not concern itself with markup at all. Instead, let the MF callee own taking the translated strings, and converting them into the UI primitive of choice.

How can CAT tools, validation tooling, etc. work with such a model? How can a CAT tool let a localizer reorder elements in a message, or add an opening element but require the closing one, etc.?
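One way tooling could enforce "add open but require close" while still allowing reordering is a stack-based balance check over markup parts. This is a minimal sketch, assuming markup arrives as { type, value } parts with "markup-start" and "markup-end" types; `checkMarkupBalance` is hypothetical and not part of any CAT tool or MF2 API.

```javascript
// Every markup-start needs a matching markup-end, and spans must nest properly.
// Moving whole spans around is allowed.
function checkMarkupBalance(parts) {
  const open = []; // names of currently open elements, innermost last
  for (const part of parts) {
    if (part.type === "markup-start") {
      open.push(part.value);
    } else if (part.type === "markup-end") {
      const expected = open.pop();
      if (expected !== part.value) {
        const hint = expected === undefined ? "" : ` (expected "${expected}")`;
        return { ok: false, error: `unexpected close of "${part.value}"${hint}` };
      }
    }
  }
  if (open.length > 0) {
    return { ok: false, error: `unclosed "${open[open.length - 1]}"` };
  }
  return { ok: true };
}

const start = (name) => ({ type: "markup-start", value: name });
const end = (name) => ({ type: "markup-end", value: name });
const lit = (value) => ({ type: "literal", value });

// A localizer moved the "b" span before the link: still valid.
const reordered = [start("b"), lit("Now"), end("b"), lit(", "), start("a"), lit("click"), end("a")];
// A localizer added an opening element but no close: rejected.
const unclosed = [start("a"), lit("click")];
// Overlapping spans: rejected.
const crossed = [start("a"), start("b"), end("a"), end("b")];
```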

MF should organize formatted raw string content (it already does). It is an over-step, or an undersight, IMHO, for MF to assume that it should do a formattedParts.join("") at the end of the formatting process.

I do not believe MF assumes that; parts.join("") is just an option, like toString() in JS. Bindings, TTS, and other engines will consume the parts.

eemeli commented 1 year ago

@cdaringe It may be useful for you to play around with the polyfill for the Intl.MessageFormat proposal; it's available on npm:

npm i messageformat@next

With that, you get results like this:

import { MessageFormat } from 'messageformat'

const mf = new MessageFormat('{Click {+a href=$url}here{-a} to continue}', 'en')
mf.resolveMessage({ url: 'http://example.com' })

{
  type: 'message',
  value: [
    { type: 'literal', value: 'Click ' },
    {
      type: 'markup-start',
      value: 'a',
      options: { href: 'http://example.com' }
    },
    { type: 'literal', value: 'here' },
    { type: 'markup-end', value: 'a' },
    { type: 'literal', value: ' to continue' }
  ]
}

This low-level API is intended to serve as a building block for formatting to exactly the sort of React or other non-flat-string targets that I understand you to also be working with. Crucially, that API isn't actually defined by the MF2 spec, but by the Intl.MessageFormat spec, much like the ICU libraries will define their interfaces separately from the MF2 language spec.

Given that, the question I'd like to pose to you is this: Are your concerns related to the shape of MF2 messages, or the APIs for formatting them?
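The snippet above stops at the flat list of parts. As a minimal sketch of how a binding might turn that list into the nested structure a non-flat target wants, the helper below folds the parts into a tree. The resolved object is copied from the output shown above; `toTree` and its { name, options, children } node shape are hypothetical and not part of the messageformat package or the Intl.MessageFormat proposal.

```javascript
// The resolved message from the example above, copied as plain data
// (no dependency on the messageformat package is needed for this sketch).
const resolved = {
  type: "message",
  value: [
    { type: "literal", value: "Click " },
    { type: "markup-start", value: "a", options: { href: "http://example.com" } },
    { type: "literal", value: "here" },
    { type: "markup-end", value: "a" },
    { type: "literal", value: " to continue" },
  ],
};

// Hypothetical helper: fold the flat parts into a nested tree, the shape most
// UI toolkits (React, native view hierarchies, etc.) want to consume.
// Assumes markup is properly nested; a stray markup-end is ignored.
function toTree(parts) {
  const root = { name: null, options: {}, children: [] };
  const stack = [root];
  for (const part of parts) {
    const current = stack[stack.length - 1];
    if (part.type === "markup-start") {
      const node = { name: part.value, options: part.options || {}, children: [] };
      current.children.push(node);
      stack.push(node);
    } else if (part.type === "markup-end") {
      if (stack.length > 1) stack.pop(); // never pop the root
    } else {
      current.children.push(String(part.value));
    }
  }
  return root.children;
}

const tree = toTree(resolved.value);
```

A React binding could then map each { name, options, children } node to an element, while a native binding could map the same tree to styled text ranges; the message itself stays the same.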

cdaringe commented 1 year ago

Hey @eemeli! As usual, great points. Thanks for helping disambiguate. Your feedback prompted me to challenge some of my assumptions.

resolveMessage is precisely the type of output I'm talking about. Thanks for sharing that snippet. In my naive view of the world, such an output would indeed be part of the MF2 spec, rather than the interface being a pure implementation detail of any given runtime, such as the Intl API.

Are your concerns related to the shape of MF2 messages, or the APIs for formatting them?

The APIs for formatting them.

In my myopic understanding of the ecosystem, I would think that this WG could specify some interface definitions for binding implementations to satisfy. By specifying some well-known interfaces, such as resolveMessage, it could promote language or runtime implementations that ensure MessageFormat works great for all end-user developers. For example, my iOS friend had a very challenging time adapting his Clang MFv1 implementation to support formatting/styling/layout. MFv2 markup wouldn't help him. Markup pretty much only works for the web. That's not an absolute claim (there are obviously more markup languages than HTML), but practically speaking, MFv2 now has a subjectively "web only" feature in it. I think that's a mistake. We're giving a formatting API to web devs who explicitly use HTML via markup, but we're not giving any similar API to any other type of user. The iOS dev had to hack his MF implementation deeply to de-stringify it. What a delight it would have been to say, "The MFv1 API output yields structured data of a known form (see the resolveMessage output). Because of this, you can easily adapt it to your runtime."

practically speaking, MFv2 now has a subjectively "web only" feature in it. I think that's a mistake.

I think it's fair to posit that defining implementation interfaces is not MF's role. MF could define the input only, and any given engine could freely define how to parse it and produce output. I do, however, think it would be beneficial to specify at least a minimal API contract. I believe cases just like this are quite easy to overlook. A naive implementer for a given runtime may produce a string-only output API, not a lovely resolveMessage API like the one you presented. You have the experience and wisdom to author smart bindings like that. Could MF itself not promote such wise bindings as part of its specification? If messageformat-haxe-bindings came into existence tomorrow, should we not recommend resolveMessage, or give the Haxe implementer a suite of golden cases to test compliance against?

I'm thinking out loud. Thanks for reading this far.

I was looking at the goals page. Goals 4 and 6 both somewhat suggest this WG could promote such an interface, but goal 6 also kind of argues against the idea 😄.

I can imagine something like:

# MessageFormat

Definition: A suite of specifications promoting developer-friendly translation workstreams regarding the production of user-facing text.

## Specifications

- Syntax: ...
- Required APIs:
   - (t: JSONInput) => ResolvedMessageOutput
- Recommended APIs:
   - ResolvedMessageOutput => string
   - <O>(o: ResolvedMessageOutput) => O 
- Golden data (input/output cases): ...

I'll be mulling over this more.
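The "Required/Recommended APIs" sketch above can be made concrete, under assumptions, as a single fold over a resolved message: the generic <O> mapping takes caller-supplied handlers, and "ResolvedMessageOutput => string" is just one instance of it. All names here (`mapResolved`, `resolvedToString`, `markupNames`) are hypothetical, and the resolved-message shape is assumed to match the polyfill output eemeli posted earlier in the thread.

```javascript
// <O>(o: ResolvedMessageOutput) => O, expressed as a fold with handlers.
// Any part type other than markup-start/markup-end is treated as text (assumption).
function mapResolved(resolved, { init, literal, markupStart, markupEnd }) {
  return resolved.value.reduce((acc, part) => {
    switch (part.type) {
      case "markup-start":
        return markupStart(acc, part.value, part.options || {});
      case "markup-end":
        return markupEnd(acc, part.value);
      default:
        return literal(acc, String(part.value));
    }
  }, init());
}

// ResolvedMessageOutput => string: one instance of the fold that drops markup.
const resolvedToString = (resolved) =>
  mapResolved(resolved, {
    init: () => "",
    literal: (acc, text) => acc + text,
    markupStart: (acc) => acc,
    markupEnd: (acc) => acc,
  });

// Another instance: collect markup element names, e.g. for a golden-data check.
const markupNames = (resolved) =>
  mapResolved(resolved, {
    init: () => [],
    literal: (acc) => acc,
    markupStart: (acc, name) => [...acc, name],
    markupEnd: (acc) => acc,
  });

const sample = {
  type: "message",
  value: [
    { type: "literal", value: "Click " },
    { type: "markup-start", value: "a", options: { href: "http://example.com" } },
    { type: "literal", value: "here" },
    { type: "markup-end", value: "a" },
    { type: "literal", value: " to continue" },
  ],
};
```

Golden input/output cases could then be checked against any instance of the fold, independent of the target runtime.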

zbraniecki commented 1 year ago

Markup pretty much only works for web. That's not an absolutely claim (there are obviously more markup languages than HTML), but practically speaking, MFv2 now has a subjectively "web only" feature in it.

Can you please elaborate on this point? The WG does not share your perspective; we actually worked hard to ensure that the markup is not HTML- or web-specific and can scale to any UI concept (including GUI and VUI models for TTS).

cdaringe commented 1 year ago

@zbraniecki, I didn't realize until this morning that #272 is more or less a dupe of this conversation. I'll take it up over there or in #356. Apologies for leaking this topic between issues.

stasm commented 1 year ago

@cdaringe There's also #41 with more discussion about formatting to something other than text.

cdaringe commented 1 year ago

Alright! Disregard all prior comments here :) I've read through all of the history on the matter, and created a timeline of markup related events: https://github.com/unicode-org/message-format-wg/pull/401

I think markup has both settled design and unsettled design. I'd like to make some assertions about both. Please correct my incorrect assertions 😄

Settled

MFv2 has no opinion on "markup" in the traditional sense of the word.
- Users may enter markup into messages, but MFv2 makes no guarantee that your markup language of choice is strictly compatible. Escapes may be needed.
MFv2 has an "mf-markup" capability (my own wording). "mf-markup" is not really markup at all; it's just functions. These functions can be used to decorate text with markup, or to map into any other UI primitive. Alternatively, a formatToParts API can expose mf-markup so that the caller applies the actual markup.
- As an aside, we may want to consider moving our internal documentation/syntax/terminology away from the term "markup" to avoid ambiguity, but that's a secondary discussion!

Unsettled

I think each of the following needs a definitive answer, based on outstanding community discussions (as captured in #401):

MARKUP_FN_SIGNATURES: What will the signature for markup functions be? Will function calls be truly standard, or will extra data be present in invocations?
RUNTIME_FORMATTING_EXTENSIBILITY: Where in the spec shall we declare that MF implementations must offer a runtime extension for formatting functions? (This may already be completed 😄 )
MARKUP_SPANS: Will the opening and closing of markup syntax have any influence on the formatting function inputs? For example, if markup is balanced (opened and closed), is the formatting function called with the enclosed contents? If not, see FORMAT_TO_PARTS, since it would be the only other mechanism to identify spans and would become much more important to promote into the spec.
FORMAT_TO_PARTS: Will the formatToParts API input/output interfaces be rolled into the specification? If so, where?
MARKUP_XML: Will we support or drop any squishy desires to pair our syntax with something XML-ish? This question came and went with the tides 😄 🌊
