Do markup elements map to registered functions?

unicode-org / message-format-wg

Developing a standard for localizable message strings

Do markup elements map to registered functions? #262

Closed stasm closed 9 months ago

stasm commented 2 years ago

@markusicu wrote in https://github.com/unicode-org/message-format-wg/pull/230#issuecomment-1116903103:

About the markup syntax: Are these free-form, implementation-dependent tokens? Do you expect them to be HTML or TTS hints or accessibility hints? Without any distinctions, this seems like Unicode private use code points with all of their problems.

Or are the markup tokens registered functions? If so, then why not use the {:function} syntax, and maybe with a literal string before the function name (which could then be something like :html).

If we do need markup with special syntax, then it should fit better into the overall syntax (e.g., always an ASCII symbol after the opening {) and into the framework as a whole (e.g., registered entities).

zbraniecki commented 2 years ago

The question behind the question that I think we need to settle on is what is the output of the MF2.0 system.

In UI Paradigm Shift Explainer I advocate for adopting the main output to be a sequence of parts that are to be consumed by the next layer - UI bindings.

Such a model can mean that the output is:

[
    Element { type: "a", id: "e1o", key: "e1", role: "open", namespace: "html", href: "..."},
    StringLiteral { value: "Hello World" },
    Element { type: "a", id: "e1c", key: "e1", role: "close" },
]

or some variant of that where instead of html:a we have ssml:whisper for TTS etc., but what it means is that we are not producing a string literal "<a href='...'>Hello World</a>" that the UI bindings would then re-parse into HTML sequence.
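To make the parts-based output concrete, here is a minimal Python sketch. The class names `Element` and `StringLiteral` follow the listing above; the `render_html` consumer, the `attrs` field, and the example URL are assumptions made for illustration, not a proposed API.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Element:
    type: str
    key: str
    role: str  # "open" or "close"
    namespace: str = "html"
    attrs: dict = field(default_factory=dict)


@dataclass
class StringLiteral:
    value: str


def render_html(parts):
    """Hypothetical UI binding: serializes parts to HTML without re-parsing a string."""
    out = []
    for part in parts:
        if isinstance(part, StringLiteral):
            out.append(escape(part.value))
        elif part.role == "open":
            attrs = "".join(f' {k}="{escape(v)}"' for k, v in part.attrs.items())
            out.append(f"<{part.type}{attrs}>")
        else:
            out.append(f"</{part.type}>")
    return "".join(out)


parts = [
    Element(type="a", key="e1", role="open", attrs={"href": "https://example.com/"}),
    StringLiteral("Hello World"),
    Element(type="a", key="e1", role="close"),
]
print(render_html(parts))
```

A different binding could walk the same list and build widgets or SSML instead; the message formatter itself never has to emit or parse markup text.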

If this paradigm shift is adopted, we then have to reason about how we operate on those Element types around the MF2.0 system. In the same explainer I showcase how such an Element is a type that can be provided from UI bindings as an input in $variable, has to be passable into functions, and may be returned by functions to be composed into the output sequence.

If we accept that paradigm, then the question above becomes a question of whether we want to have our syntax decompose to AST like in example above or:

[
    Function { name: "element", ns: "html", arguments: { name: "a", href: "...", role: "open", key: "e1" } },
    StringLiteral { value: "Hello World" },
    Function { name: "element", ns: "html", arguments: { name: "a", role: "close", key: "e1" } }, 
]

And then the function named element in namespace html will have a signature in the registry of:

{
  name: "element",
  arguments: {
    name: String,
    href?: String,
    role: Role::Open | Role::Close,
    key: String
  },
  output: Element,
}

In the latter case, we still need to reason about the type that this function returns, because that's the type that gets composed into the output sequence.
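As a rough illustration of what a tool could do with such a registry entry, the sketch below checks a call against the signature shown above. The dictionary layout and the `check_call` helper are assumptions for this example; the actual registry format was still under discussion.

```python
# Assumed in-memory form of the registry signature listed above.
SIGNATURE = {
    "name": "element",
    "arguments": {
        "name": {"type": str, "required": True},
        "href": {"type": str, "required": False},
        "role": {"type": str, "required": True, "values": {"open", "close"}},
        "key": {"type": str, "required": True},
    },
    "output": "Element",
}


def check_call(signature, arguments):
    """Return a list of problems found when checking a call against a signature."""
    problems = []
    spec = signature["arguments"]
    for arg_name, rule in spec.items():
        if arg_name not in arguments:
            if rule["required"]:
                problems.append(f"missing required argument: {arg_name}")
            continue
        value = arguments[arg_name]
        if not isinstance(value, rule["type"]):
            problems.append(f"wrong type for argument: {arg_name}")
        elif "values" in rule and value not in rule["values"]:
            problems.append(f"invalid value for argument: {arg_name}")
    for arg_name in arguments:
        if arg_name not in spec:
            problems.append(f"unknown argument: {arg_name}")
    return problems
```

A linter built this way can validate arguments, but it still cannot tell from the signature alone that an "open" call needs a matching "close" call.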

The pro of the "function" approach is that it generalizes the schema of the Element type: the MF2.0 EBNF doesn't bother with an element or AST node, because it's just a good ol' function with arguments, and those are specified in the registry. It can also "black box" the output as an unknown type that is passed to the output sequence, which the UI bindings downcast to a type they know how to reason about. The con of that approach is that MF2.0 tooling has a much reduced ability to reason about those and provide aid to translators (CAT tools) and maintainers (validator tools, linters, debuggers), because they have to develop a notion of "a formatting function that is actually a markup element".

One of the more advanced cases that I'd like to see us support is when an Element exists in the UI, is passed to MF2.0 as an argument, and comes back localized and in the correct position in the output. For that to work, the Element field in the implementation of the bindings will have to have a field to store the pointer/uuid to the corresponding UI widget as the Element gets passed through the MF2.0.

echeran commented 2 years ago

This issue asks about markup elements, but the prior conversation in #238 asks whether this should not already be handled by some tooling outside of MF that returns placeholders. This would keep the scope of MF simpler and decoupled from the specific syntax in a way that doesn't make assumptions that there is only one supported markup format or else require message authors to tediously add that information when it could be automated by such a tooling layer. A tooling layer outside of MF can extract translatable text and placeholders from the source document, and it can automatically add data into the placeholders if necessary indicating which markup format they came from. Of course, in the reverse direction after translation, that layer would help reconstruct the original document with translated content in all the right places.

This decoupling is important so that any tooling around messages (or an abstraction of the translation document at hand) is useful across all source document formats / markup types, because it is not hardcoded to any assumptions of a singular format / markup.

I think maybe it would help to understand what markup elements do that couldn't be done with the tooling layer we have talked about and the placeholders it produces. I think that would be the response that is missing from the previous conversation of #238, but it's important for this issue here because this issue is asking "how?" but I don't have a clear answer for "why should we?".

Regarding the paradigm of returning a list of parts that are formattable, could that be forked into a separate discussion? From the committee feedback email, their comment on the topic was:

7. Output of formatting
    a. Format to string
    b. And format to parts/structure
          i. The spec can illustrate a sample output structure, but the precise structure will probably be implementation-dependent
          ii. Accessibility/TTS is an issue

I can imagine an alternative to a list of formattable parts, which is to have a top-level "Formatted-" type that can choose to expose whatever interface of parts it provides -- it doesn't have to be a list. And in the case of a datetime, as others have discussed before, we might want to have overlapping parts -- the "time part" could itself contain an "hour", "minute", "second" part. Although that "contains" interval relationship could be a non-"contains" overlap, too, which a list won't support. So either way, that needs more alternatives considered. But I don't see it as a blocker for this issue.

zbraniecki commented 2 years ago

This decoupling is important so that any tooling around messages (or an abstraction of the translation document at hand) is useful across all source document formats

I think the spectrum we're evaluating here is exactly about such "black-boxing" of markup elements from the perspective of tooling, which makes it easy for tools to handle such black boxes, but hard or impossible to reason about them.

The counter-argument is that recognizing markup elements as first-class types of message parts allows tooling to reason about them. CAT tools can then assist translators, linters can catch mistakes, etc.

The counter to that is that tools that want to reason about it can leverage the function registry to learn that the element function with argument "name" of value "a" is a link and needs a closing element function with name "a" and role "close". That approach, which treats markup elements as second-class citizens of MF2, feels disconnected from the reality of GUI localization, where markup is the top-level consideration.

Designing a system for GUI localization which ignores the most fundamental GUI primitive, based on an argument of "then any tooling can handle it", brings me back to the very early conversations about MF2 which you were a part of: where do we want to place our chip on the spectrum between valuing compatibility with legacy tools and designing for the needs of today and tomorrow, in cases where those two do not align?

Regarding the paradigm of returning a list of parts that are formattable, could that be forked into a separate discussion?

Sure, if you think it's still debatable, please file one. It's interesting to me that when I wrote the UI Paradigm Shift Explainer, the main feedback I got was that there's no value in writing the "Structured vs String Output of Message Formatting" section because all stakeholders agree. Maybe my hunch was right that this is not the case.

And in the case of a datetime, as others have discussed before, we might want to have overlapping parts -- the "time part" could itself contain an "hour", "minute", "second" part.

If we retain the paradigm of string as the main output, then your mental model makes sense. I advocate for shifting away from that mental model. If I return [String, Date, String], then a UI toolkit can compose that into a proper UI widget structure, or a textual output can toString the date. In such a case there is no place for "overlapping" parts such as "time" and "hour" until you stringify the date, and you only stringify it after you have decided you want to display it as a string (rather than as, say, a DateTime UI widget). In that case, this date can have a formatToParts that further allows for markup annotation and styling of the parts (like "minutes in bold"). But from the MF2.0 perspective, I think the core is to recognize that the output of Last message at { $time } is not a string plus annotations, but [String, Time] that can be stringified.
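The [String, Time] idea can be sketched in a few lines of Python. The pattern representation and the `resolve` / `stringify` helpers are assumptions for illustration, and the fixed `%H:%M` format stands in for a real locale-aware formatter.

```python
from datetime import time


def resolve(pattern, args):
    """Resolve a pattern into parts; argument values stay un-stringified."""
    parts = []
    for kind, value in pattern:
        parts.append(value if kind == "text" else args[value])
    return parts


def stringify(parts):
    """Late stringification, done only by a consumer that wants plain text."""
    out = []
    for part in parts:
        if isinstance(part, time):
            out.append(part.strftime("%H:%M"))  # fixed format, not locale-aware
        else:
            out.append(str(part))
    return "".join(out)


pattern = [("text", "Last message at "), ("var", "time")]
parts = resolve(pattern, {"time": time(19, 30)})
print(parts)
print(stringify(parts))
```

A UI toolkit could take `parts` and mount a time widget for the second entry, while a console application would call `stringify` instead.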

markusicu commented 2 years ago

I don't want to unravel the committee resolution where the details of formatToParts() are implementation-defined, and I don't think we need to. The output could be a string, or a classic formatToParts() with formatted placeholders associated with overlapping (or a tree of) spans, or a list of literal strings and not-yet-stringified placeholders.

Also, @eemeli has made good points against generic "element" or "html" functions distinguished by literal values. Specific functions should work fine. Some functions should be generic, like "link" or "bold"/"emphasis", while some might want to be specific to certain markup, such as HTML.

I think the discussion is converging on an expressive but relatively uniform placeholder syntax subsuming markup functionality.

mihnita commented 2 years ago

I don't see "link" or "bold" or "emphasis" as functions at all. They are more like literal values you need to pass to a function. Same as {(20220527T1942) :datetime}

You can do {(link) :html src=url_to_go} to produce HTML, or {(link) :markdown src=url_to_go} or even {(link) :universal src=url_to_go}

:html would produce <a src=url_to_go> in text mode, or would produce nodes when formatToParts is used (implementation specific)

And :universal might be smart enough to produce context-dependent results: an AttributedString on iOS, a Spanned string on Android, and plain text in a console app.

Because we have ways to produce cross-platform applications, and there is no reason to have different resource bundles for each platform.

So html / tts / markdown / ansi_escapes are functions, not "bold" and "italic" and "link"
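A minimal sketch of this view, assuming a toy part representation: the registry is keyed by "html" and "markdown", while "b" is only a literal handed to whichever function the placeholder names. The tuple layout, `REGISTRY`, and the markdown mark table are hypothetical.

```python
def html_fn(literal, position, options):
    """Formatter registered under 'html': the literal names the element."""
    if position == "close":
        return f"</{literal}>"
    attrs = "".join(f' {key}="{value}"' for key, value in options.items())
    return f"<{literal}{attrs}>"


MARKDOWN_MARKS = {"b": "**", "i": "_"}


def markdown_fn(literal, position, options):
    """Formatter registered under 'markdown': same literals, different output."""
    return MARKDOWN_MARKS[literal]


# Only the markup families are registered, not "bold" or "italic".
REGISTRY = {"html": html_fn, "markdown": markdown_fn}


def render(parts):
    out = []
    for part in parts:
        if isinstance(part, str):
            out.append(part)
        else:
            literal, function_name, position, options = part
            out.append(REGISTRY[function_name](literal, position, options))
    return "".join(out)


def bold_message(function_name):
    return [
        "Expires ",
        ("b", function_name, "open", {}),
        "soon",
        ("b", function_name, "close", {}),
    ]


print(render(bold_message("html")))
print(render(bold_message("markdown")))
```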

aphillips commented 2 years ago

You can do {(link) :html src=url_to_go} to produce HTML, or {(link) :markdown src=url_to_go} or even {(link) :universal src=url_to_go}

Do we think we can fully describe "placeable" literals with our own markup syntax? Plenty of templating languages provide their own link "tag" (in order to approximate the same behavior as @mihnita is calling out above). Even HTML isn't so simple (so many in-line tags: a, b, em, strong, bdi, span......). If we supply our own syntax and then only for a part of the "markup space", are we really solving the problem?

eemeli commented 2 years ago

So html / tts / markdown / ansi_escapes are functions, not "bold" and "italic" and "link"

I don't see how we could make this assumption. It's certainly a possible choice that an implementation or a user of an implementation may make, but this is not and should not be a requirement of MF2. It should continue to be valid for an implementation or a custom function registry to define explicit bold, italic and link formatters.

mihnita commented 2 years ago

It is not an assumption, it is a proposal. I propose we do it this way.

Let me explain how I see it, because it really isn't about syntax, it is about data model + functionality.

If it is too verbose for a comment I can move it to a doc. But let's try...

Assumptions / naming / notes

Please ignore the syntax for a bit, look at the concepts
I will call "the engine" the library or component that offers the functionality to parse MF2 syntax and also the "rendering part" (which is formatTo-String / Parts / Dom / Spanned / AttributedString / etc)
Devs using the engine don't change / recompile the engine. If I am a JS user of Intl.MessageFormat I don't change and recompile the implementation of that, and I can't force my users to use the Firefox / Chrome version I build. Same for ICU, or for future OS public APIs, in case they are developed (I hope so)
The formatTo* is platform / engine / implementation dependent. One would not expect the Android implementation of MF2 to produce an AttributedString, or the other way around
But adding support for new (custom) functions (namespaces) must be supported, without changing the engine
The signatures of those functions would be similar across engines, but not identical.
Also, the implementation would probably be different, and platform specific (the iOS one written in Swift, the Android one in Kotlin, etc)

Similarity with other concepts that I think have more agreement: formatting literals.

We now agreed (I hope) that ...{(foo)}... means that the output is "foo", but that text is read-only for translators. And I hope we kind of agree on this kind of functionality:

...{(12345679.987) :number}... => the function registered to handle "number" gets "12345679.987" (and locale, and options, and a few other things), parses it, and then formats it as a number. That means it would respect the locale and will use the proper digits (which might even be a user preference), grouping, etc.

...{(20220623T1930) :datetime skeleton=yMMMdjm}... => the function registered to handle "datetime" gets "20220623T1930" (and locale, and options, and a few other things), parses it, and then formats it as a date (respecting the skeleton option, user prefs, etc).

These are for convenience. One should not hard-code numbers / dates in localizable strings, because we usually translate into "French", or "Arabic", but there are tens of countries using those languages, with different formatting preferences. And it is a convenience, because without it one would need to make this a proper placeholder and pass the fixed value as an argument.

These functions are in the function formatting registry, and one can add custom functions doing the same thing. The registry would describe what the input string literal (to be parsed) would look like, options, etc.

Finally, getting to markdown / GUI elements / tts / etc

I am looking at it similarly to the examples above, the "string literal functions". Ignore the syntax. And it is a proposal, not an assumption. I will describe what the developer does (usage), and what a function does (advanced user), what the engine does (implementation). Because the dev writing a custom function might be different than the one using it (one implements a phone number format, and tens / hundreds others will use it)

...{(b) :html}bold{(/b)}... => bold, obvious

User:

messageText = `Expires on {(b) :html}{$exp :datetime skeleton=yMMMd}{(/b)}...`
mf = MessageFormat.parse(messageText)
textToUse = mf.formatToString( { "exp" : expirationDate } )
nodeToUse = mf.formatToDom( { "exp" : expirationDate } )

There is a formatting function registered under the "html" name that might produce html text (...bold...) when formatting to string, DOM when formatToDom (available in browsers, probably), Spanned on Android, etc.

The standard registry contains an entry for "html", describing all the literals accepted (a, b, br, ...), the options they take, etc. TBD if the implementations should store the formatting functions and the markup functions in the same table. I don't care right now; it is an implementation detail.

Now, I want to produce an output where a date is rendered in a different style, and when I click on it it opens a date picker widget. So it is a GUI element. This is a custom function.

User:

// once, register custom function(s)
const DatePicker = goog.require('goog.ui.DatePicker') //  a Closure thing
MessageFormat.registerCustomMarkdownFunction("com.google.widgets", DatePicker.builder)

// when needed
messageText = `Last review date: {(datepicker/) :com.google.widgets initialValue=$dueDate}...`
mf = MessageFormat.parse(messageText)
nodeToUse = mf.formatToDom( { "dueDate" : userDueDate } )

The engine:

function format(locale, parts, argumentMap) {
    let result = ""
    for (const part of parts) {
        // iterate over the parts, render each one as text, and append it to the output
        if (part.isPlaceholder) { // or "is markdown", if we decide there is value in separating the concepts
            const theFunction = markdownFunctions.get(part.functionName)
            result += theFunction.format(locale, part.literal, part.options)
        } else {
            result += part.text // plain text is appended as-is (field name assumed)
        }
    }
    return result
}

OK, the code should be more complicated:

for each part the engine would look at the options
see if any option.value is an argument (see initialValue=$dueDate in the date picker example)
get the argumentMap.get(option.value.name), and replace it in the option map passed to the function. But I didn't do it in the above code for readability (same as error handling, etc)

Note that the engine code does not need to change in order to handle "html", "tts", custom stuff.

And does not need to parse the name or the literal in the placeholder.
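The engine loop, including the option-resolution step left out above, might look like the following Python sketch. `VariableRef`, `Placeholder`, and the `widgets` function are hypothetical stand-ins; the point is only that the engine looks up a function by name and never inspects the literal.

```python
from dataclasses import dataclass, field


@dataclass
class VariableRef:
    name: str  # refers to an entry in the argument map, e.g. $dueDate


@dataclass
class Placeholder:
    function_name: str
    literal: str
    options: dict = field(default_factory=dict)


def resolve_options(options, argument_map):
    """Replace option values that are variable references with the actual arguments."""
    return {
        key: argument_map[value.name] if isinstance(value, VariableRef) else value
        for key, value in options.items()
    }


def format_to_string(locale, parts, argument_map, registry):
    out = []
    for part in parts:
        if isinstance(part, Placeholder):
            fn = registry[part.function_name]  # dispatch by name only
            options = resolve_options(part.options, argument_map)
            out.append(fn(locale, part.literal, options))
        else:
            out.append(part)  # plain text passes through
    return "".join(out)


def widgets(locale, literal, options):
    """Stand-in custom function: a text rendering of a widget placeholder."""
    return f"[{literal} initialValue={options['initialValue']}]"


registry = {"com.google.widgets": widgets}
parts = [
    "Last review date: ",
    Placeholder("com.google.widgets", "datepicker", {"initialValue": VariableRef("dueDate")}),
]
print(format_to_string("en", parts, {"dueDate": "2022-06-23"}, registry))
```

Registering another entry in `registry` adds a new markup family without touching `format_to_string`.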

When we formatToX then the function(s), including the custom ones, might implement a formatToX

For example (engine code producing a DOM):

function formatToDom(locale, parts, argumentMap) {
    const result = new Node()
    let currentParent = result
    for (const part of parts) {
        if (part.isPlaceholder) { // same comment as before
            const theFunction = markdownFunctions.get(part.functionName)
            const node = theFunction.formatToDom(currentParent, locale, part.literal, part.options)
            if (node.isOpen) {
                currentParent.appendChild(node)
                node.parent = currentParent
                currentParent = node
            } else if (node.isClose) {
                currentParent = currentParent.parent // step back out of the element being closed
            } else if (node.isStandalone) {
                currentParent.appendChild(node)
            }
        }
    }
    return result
}

So the engine produces a tree without knowing anything about what each function does, and how it does it.
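A runnable Python sketch of the same tree-building idea follows. The `Node` class, the tuple form of placeholders, and the `outline` helper are assumptions for illustration; a real binding would create platform nodes instead.

```python
class Node:
    def __init__(self, name, position="standalone", text=None):
        self.name = name
        self.position = position  # "open", "close", or "standalone"
        self.text = text
        self.children = []
        self.parent = None

    def append_child(self, child):
        child.parent = self
        self.children.append(child)


def html_fn(literal, position):
    """Stand-in for a registered function's formatToDom."""
    return Node(literal, position)


def format_to_tree(parts, registry):
    root = Node("#root")
    current = root
    for part in parts:
        if isinstance(part, str):
            current.append_child(Node("#text", text=part))
            continue
        function_name, literal, position = part
        node = registry[function_name](literal, position)
        if node.position == "open":
            current.append_child(node)
            current = node  # following parts become children
        elif node.position == "close":
            if current is root:
                raise ValueError(f"close without matching open: {literal}")
            current = current.parent  # step back out
        else:
            current.append_child(node)
    return root


def outline(node):
    """Compact view of the tree for inspection."""
    if node.name == "#text":
        return node.text
    return (node.name, [outline(child) for child in node.children])


parts = [
    "Expires on ",
    ("html", "b", "open"),
    "June 23",
    ("html", "b", "close"),
    ("html", "br", "standalone"),
]
print(outline(format_to_tree(parts, {"html": html_fn})))
```

The engine only needs the open / close / standalone distinction to nest nodes correctly; everything else is left to the registered function.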

The Google registry would contain an entry for com.google.widgets, and would document that it can take / produce datepicker, timepicker, numberpicker, and what the options are.

mihnita commented 2 years ago

Some implications (if we do things as I proposed above):

The engine should understand the concepts of open / close / standalone. That is needed both for something like formatToDom, and (as I explained in the EM proposal long ago) for l10n tools to do better validation.

This means that the syntax I used ( {(/b)} to mean close) is not OK. The engines should not be in the business of parsing the literal in order to do the job. So, treat that part of the syntax as hand-waving / TBD

The engine knows about formatToX, and it is platform / implementation specific. The functions also know about formatToX

But the engine does not have to parse anything.

It shows why I proposed that "html" and "tts" are functions, and "bold", "link" are not.

Also, in order to support building the various markdown outputs, the engine does not need to parse anything at rendering time (when format is invoked), for example to split a name like "html.b" or "html_b", because these concepts are already split by the parse function.

So we can see that these work very-very much the same as formatting functions that take a literal, except with the additional concepts of open / close (the other functions are standalone). At least in my mind.

I would be very-very interested in how a proposal like ...{+html.a}... or ...{+html_a}... would work in the engine.

Who splits the html and a parts? When? Who handles the dispatching to html vs tts vs custom namespace (function in my proposal)? Who splits the + / - that indicate the open / close / standalone, and when? What do I need to put in a registry for a custom function? How does a user of custom namespaces / functions register them at runtime? Do they have to register html_a, html_b, html_br, and so on, hundreds of them? Or do I just register "html"?

I am also fine if we say "the markdown elements are the same as placeholders, with some differences":

open / close / standalone concept, and the engine is aware of these concepts
slightly different syntax
slightly different naming. We say "namespace" instead of "function".

But we need to clarify the questions above about what the engine knows and what it does. It is not a syntax problem, it is a data model problem (is "html.a" parsed to one data concept, or 2, etc)
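One possible answer to the "who splits, and when" question is sketched below: the parser splits the sigil, the namespace, and the name once, so that formatting only dispatches on already-separated fields. The `Markup` record, the `+` / `-` sigils, and the dot separator are assumptions taken from the examples in this thread, not settled syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Markup:
    namespace: str  # e.g. "html"; the only thing that needs registering
    name: str       # e.g. "a"; passed through to the namespace function
    position: str   # "open", "close", or "standalone"


SIGILS = {"+": "open", "-": "close"}


def parse_markup(token):
    """Parse placeholder content such as '+html.a' once, at parse time."""
    sigil = token[:1]
    position = SIGILS.get(sigil, "standalone")
    body = token[1:] if sigil in SIGILS else token
    namespace, _, name = body.partition(".")
    if not namespace or not name:
        raise ValueError(f"expected namespace.name, got {token!r}")
    return Markup(namespace, name, position)


def dispatch(markup, registry):
    """Format time: look up by namespace, with no string splitting."""
    return registry[markup.namespace](markup.name, markup.position)


print(parse_markup("+html.a"))
print(parse_markup("-html.a"))
print(parse_markup("html.br"))
```

With this data model "html.a" is two concepts, so registering "html" once is enough to cover every element name.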

mihnita commented 2 years ago

Apologies for the super long comments, but please try to parse them. I hope it clarifies where I am coming from, and I tried to keep it technical and un-opinionated. And tried to explain why I propose certain things and not others. Questions welcomed, if I was not clear enough.

There are very likely other ways to do it.

But I think all of them would need to separate what the engine does, what the function implementers do, what the user of those functions do, how custom functions are handled, what we add in the registry.

We need to look at the full picture, not just syntax.

aphillips commented 2 years ago

@mihnita I think your long comment makes a lot of sense. It's consistent with how I'm thinking about it. The pattern strings could be validated syntactically while allowing developers (or specific frameworks/implementations) to provide their own formatters. JS, for example, could provide an HTML binding out of the box while e.g. python or e.g. stock Java might not.

The downside of these proposals is: it is kind of weird and dirty that I have to learn both my local markup syntax and its MFv2 custom binding. If I'm writing HTML, it's more natural to write a pattern like:

Your order will arrive by <b>{$deliveryTime :time skeleton=jm}</b>

than it is to write the curly-bracketed stuff. And it's potentially more portable to just say "this string contains markup of type X", which the translation process can handle in its own way and which the runtime formatTo process handles in its own way. Worst case scenario, the markup is just Unicode code points with no special meaning or protection...

mihnita commented 2 years ago

I am 100% sympathetic with the desire to use raw HTML syntax and so on. I know that a lot of people will also complain.

That's why Elango and I wanted to go with a data model from very early on. That would allow various syntaxes to be built on top of it.

The "raw html" syntax you describe would result in the same data model as "...{(b) :html}..." when parsed.

The main question is: when we talk MF2 syntax, should we make HTML a "first class citizen"? Because raw tag syntax is HTML syntax, and implies HTML. To the detriment of other markdowns? And if yes, how do we represent those other markdowns?

For example SSML has sub: https://cloud.google.com/text-to-speech/docs/ssml#sub Which conflicts with the html sub.

So, we allow for <ssml:sub>?

A bit worse: if I want the <ssml:sub> syntax to make it to the MF2 level (when MF2 parses the string), then I would have to escape it in the original format (as &lt;), which completely defeats the purpose.

In a "MF2 == data-model" world this would not have been about syntax. ECMAScript can have very HTML-like syntax (which makes total sense), and would produce an MF data model where a tag results in a "placeholder" or "markdown" object.

I don't know how to break this conundrum.

Should we say "sorry, html is so ubiquitous that we will make it first class citizen"? But we should still support other markdowns.

"html" would be just the "default function for markdown if you don't specify something else". Similar to ...{$expiration}... resolves to :datetime or ...{$count}... resolves to :number at runtime, when I have access to the types of the expirations / count.

And "html" would be in the "standard registry" from the get go (same as date, number, plural, and all things we inherit from MF1)

Everything else I proposed stands (about what the engine does, functions registered, etc) The only differences (that I can see) are:

syntax

"html" is grand-fathered in the standard registry as a markdown function

"html" is the default markdown function if there is no function specified (we can call them "namespaces", I don't mind)

Friction point: how do I store these in XML storage files? Java, .NET (.resx), and Android (just off the top of my head) all store strings in XML-based files. It would be nice to have a way to use this syntax in those formats without forcing escaping. Maybe XML namespaces?...

Note: I am really split about this.

It is very similar to my reaction to the C++ standard resisting, for many-many-many years, any support for Unicode, because the C++ standard should be self-sufficient and not depend on any other standards. So wchar_t, but we don't force a size on it. Then char8_t, char16_t, char32_t, so we agree on a size, but it does not have to be Unicode. If they are Unicode, then there are some macros defined. Then finally "ok, ok, char*_t types ARE Unicode!" And for 15+ years I was yelling (internally) "Unicode is the standard for text, get over it, and embrace it"

You guys tell me if we should do the same for HTML and MF2...

Mihai

mihnita commented 2 years ago

Do we think we can fully describe "placeable" literals with our own markup syntax

I think so...

If we supply our own syntax and then only for a part of the "markup space", are we really solving the problem?

I think that for subsets of html one can have various options, especially if "html" is grandfathered in the standard registry.

They can register a custom function with a different name ("simpleHtml")

Or they can implement (with fallbacks) the existing HTML tags. For example, drop the images completely. Ignore the tag :-) Render links as (Text (http://the.site.com/article)). And so on. Remove or ignore event handlers like onClick to sanitize the result. And it is on them to document that this implementation only supports a subset. It is an implementation detail.

For example, on Android, TextView supports plain text and spans, which are a subset of HTML. And WebView (another widget on Android) supports full HTML (including styles, JavaScript, etc). In the same application I can use both (happens a lot). So I would register a custom function for TextView, and maybe even a "safeHtml" for WebView (or use "html", but ignore dangerous parts and document that they are ignored). Not an MF2 decision, but an implementer (and user) decision.
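A subset implementation of this kind could be sketched as below. The tag allow-list, the `safe_html` name, and the rule of dropping any option starting with "on" are assumptions for illustration; this is not a complete HTML sanitizer (it does not, for instance, inspect URL schemes).

```python
from html import escape

SAFE_TAGS = {"a", "b", "i", "em", "strong"}


def safe_html(literal, position, options):
    """Custom function supporting only a documented subset of HTML."""
    if literal not in SAFE_TAGS:
        return ""  # unsupported element: drop the markup, keep surrounding text
    if position == "close":
        return f"</{literal}>"
    attrs = "".join(
        f' {key}="{escape(str(value))}"'
        for key, value in sorted(options.items())
        if not key.lower().startswith("on")  # ignore event handlers such as onClick
    )
    return f"<{literal}{attrs}>"


print(safe_html("a", "open", {"href": "https://example.com/", "onClick": "steal()"}))
print(repr(safe_html("img", "standalone", {"src": "x.png"})))
```

Because the function is registered under its own name, messages can opt into the restricted behaviour without any change to the engine.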

aphillips commented 2 years ago

I am not saying that we should have a special level of support specifically for HTML baked directly into MFv2.

I am saying what you are: there are a bunch of markup languages and differing types of templating goo in the world and developers need to use these markup systems in their strings.

Echoing your thoughts on other threads, we're not concerned here with how markup is protected from the translation process or in various serialization forms. We're only concerned right now with the in-memory representation and how MFv2's processor interacts with said representation. And a minimalist interpretation would be: MFv2 doesn't understand or process your templating language. The only part of your templating language that requires escaping would be those bits that interfere with MFv2.

Even if we provide {(b) :html x=y}...{(/b)} support, developers will still write HTML (or whatever) in their strings because writing our special syntax is a PITA. Registering a custom function that understands SSML or mustache or markdown... or HTML and renders those back appropriately then becomes a convenience. The markup on this thread seems more applicable to what the translation process needs, not what the formatter needs?

mihnita commented 2 years ago

And a minimalist interpretation would be: MFv2 doesn't understand or process your templating language. The only part of your templating language that requires escaping would be those bits that interfere with MFv2.

Ah! I think I got you now!

Sure, I thought it goes without saying. Everything that does not interfere with the MF2 "in memory syntax" would be "pass-through". It is plain text, since MF2 does not assign any special meaning to <, &, and the rest.

There is no "smartness" associated with it through functions.

There might still be protection from translation. In the Localization Concepts doc I've shared long ago I describe sub-filters (slide 31).

Same for "real markdown", which would be pass through.

But one would need to escape conflicting characters for mustache (because {{ conflicts with the MF2 syntax).

Option two is double-filtering going through a mustache filter first, converting user in "Hello {{user}}" to a placeholder. That would have been easier if MF2 was designed in terms of data-model, not syntax.

For example the Okapi framework supports sub-filters, and they work on the data model, not on text.

So you can start with properties:

msg = Hello %s

This results in a TextUnit with the id "msg" and plain text content "Hello {user}". Then you can pass it through an html sub-filter, which will convert the opening and closing tags to open-close placeholders (real POJOs). Then pass it through a printf sub-filter, which will convert %s to a standalone placeholder (again, a real POJO).

That is why I argued that the data model should be specified, so that devs can apply transforms on it.
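The sub-filter chain can be illustrated with a small Python sketch that operates on a list of parts rather than on text. This is not the Okapi API; `Placeholder`, `apply_filter`, the regular expressions, and the sample string "Hello <b>%s</b>" are all assumptions for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Placeholder:
    kind: str    # "open", "close", or "standalone"
    source: str  # original text the placeholder stands for


TAG = re.compile(r"</?[A-Za-z][^>]*>")
PRINTF = re.compile(r"%[sd]")


def apply_filter(parts, pattern, classify):
    """Split every plain-text part on pattern matches; keep existing placeholders."""
    out = []
    for part in parts:
        if not isinstance(part, str):
            out.append(part)
            continue
        pos = 0
        for match in pattern.finditer(part):
            if match.start() > pos:
                out.append(part[pos:match.start()])
            out.append(Placeholder(classify(match.group()), match.group()))
            pos = match.end()
        if pos < len(part):
            out.append(part[pos:])
    return out


def html_kind(tag):
    if tag.startswith("</"):
        return "close"
    if tag.endswith("/>"):
        return "standalone"
    return "open"


parts = ["Hello <b>%s</b>"]
parts = apply_filter(parts, TAG, html_kind)                  # html sub-filter
parts = apply_filter(parts, PRINTF, lambda _: "standalone")  # printf sub-filter
print(parts)
```

Each filter only sees what the previous one left as plain text, which is what makes the transforms composable on the data model.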

Mihai

mihnita commented 2 years ago

The markup on this thread seems more applicable to what the translation process needs, not what the formatter needs?

Yes and no.

From the way I understand it with Fluent you can do this:

Original HTML

text1 text3 text4.

And in Fluent you put the equivalent of:

text1 text3 text4.

The "base DOM" (I don't know what they call it) is parsed from the original HTML. A "translated DOM" is parsed from the Fluent file. And the two DOMs are merged at runtime, picking and choosing (keeping class and onClick from the base, overriding the alt). It is a bit more complicated, to account for tags added / deleted by translation.

So this goes to runtime, beyond translation.

Also, what they do is a pretty common thing, not new.

It is the standard processing model with XLIFF: you extract only the localizable items to XLIFF, and merge the translated result back into the original file.

Standardized in OAXAL since about 2009: https://www.oasis-open.org/committees/document.php?document_id=35735

Also described (in a simplified form) in Localization Concepts.

The only difference that I can see is that the Fluent solution does the merging part at runtime, client-side, while the OAXAL / XLIFF workflow would generate localized HTML files (or other formats), reflecting the base HTML.

At least in theory the DOM generated from the French HTML (in the XLIFF flow) and the one generated at runtime by merging DOMs (in the Fluent flow) would be identical.

Sorry, I (again) went "off the rails" with a long comment :-) I should stop it, it's Friday.

mihnita commented 2 years ago

Although I argued (and still do) that all of this can be done without the markup at all. It can be done with "regular placeholders", only adding the open-close concepts to placeholders (which the EM proposal did).

But that's a different issue ;-)

Unfortunately we are arguing about how to do this, and about syntax, when we don't even know if we need to do it to begin with, or if we should just have placeholders.

"Oh, look, a squirrel!" :-)

eemeli commented 2 years ago

Having read all of the preceding, to me perhaps the most relevant question to ask in this particular thread is related to this:

The engine knows about formatToX, and it is platform / implementation specific. The functions also know about formatToX

But the engine does not have to parse anything.

It shows why I proposed that "html" and "tts" are functions, and "bold", "link" are not.

How would or should this view be reflected in the MessageFormat 2 specification? I think I understand how and why an implementation would make such a choice, but I do not see how any of the language in the core MF2 spec would speak on this.

I would be very-very interested in how a proposal like ...{+html.a}... or ...{+html_a}... would work in the engine.

To me this is quite simple, if we start by presuming that a function/placeholder exists that is able to resolve to some representation of an HTML element. In syntax, this might look like {+html tag=a}. With this given, {+html.a} can be considered as sugar for the exact same thing. One benefit of having a separate html.a function compared to a generic html is that this allows for its specific attributes to be considered, rather than the generic attributes of all HTML elements. It would also make it easier to permit only a subset of HTML elements, rather than all of them.
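The "sugar" reading can be sketched as a small desugaring step followed by per-element validation. The `ALLOWED` table, the dot separator, and the helper names are assumptions for illustration only.

```python
# Hypothetical per-element attribute allow-list; elements not listed are not permitted.
ALLOWED = {"a": {"href", "title"}, "b": set()}


def desugar(function_name, options):
    """Rewrite ('html.a', opts) to ('html', {'tag': 'a', **opts}); leave 'html' as-is."""
    namespace, dot, tag = function_name.partition(".")
    if not dot:
        return function_name, dict(options)
    return namespace, {"tag": tag, **options}


def validate(function_name, options):
    """Check the element and its attributes against the allow-list."""
    _, resolved = desugar(function_name, options)
    tag = resolved.get("tag")
    if tag not in ALLOWED:
        return [f"element not permitted: {tag}"]
    return [
        f"unknown attribute for {tag}: {key}"
        for key in sorted(resolved)
        if key != "tag" and key not in ALLOWED[tag]
    ]


print(desugar("html.a", {"href": "x"}))
print(validate("html.img", {}))
print(validate("html.b", {"href": "x"}))
```

Both spellings end up as the same function name and options, so the choice between them affects authoring convenience and validation granularity rather than the data model.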

eemeli commented 1 year ago

I believe that this was resolved by #368, which added this: https://github.com/unicode-org/message-format-wg/blob/7c080bfba867ec16abfc705e4ef5ed299a362e2e/spec/registry.dtd#L13

mihnita commented 1 year ago

I don't think that open/close/standalone are attributes of the function, they are attributes of the placeholder.

And this conflicts with the syntax:

<!ATTLIST formatSignature position (open|close|standalone) "standalone">

What I mean by that: we already have an indicator in the syntax that a placeholder is open/close/standalone. For example in ...{b +html}bold{b -html}... the - and + tell us already the placeholder is an open / close one.

While the xml means that we can do ...{b +html position=close}..., which is weird / incorrect.

Should we add some clarification saying that position is a function registry-only concept, represented as + and -, and will never exist as an option in the "bag of options"?
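If such a clarification were adopted, an implementation could enforce it roughly as follows. This is a sketch under that assumption; `resolve_position` and the error message are hypothetical.

```python
SIGILS = {"+": "open", "-": "close"}


def resolve_position(sigil, options):
    """Derive the position from the syntax sigil only; reject it in the options bag."""
    if "position" in options:
        raise ValueError("position is set by the + / - sigil, not by an option")
    return SIGILS.get(sigil, "standalone")


print(resolve_position("+", {}))
print(resolve_position("-", {}))
print(resolve_position("", {}))
```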

eemeli commented 1 year ago

I don't think that open/close/standalone are attributes of the function, they are attributes of the placeholder.

I recognise that this is a current topic of conversation that's relevant, but it should be raised as a separate issue. The previous discussion was effectively concluded a year ago, and it played a part in introducing the open/close concepts into the syntax, formatting, and registry.

stasm commented 1 year ago

There's the more recent #424 which is specifically about clarifying whether open/close/standalone are properties of functions or expressions. I think we can close this issue and continue in #424. I'm also interested in revisiting some of the markup-related discussions and I'm very close (today/tomorrow) to finishing the first draft of an exploration doc on the topic. I'll post updates in #424 as well.
