unicode-org / message-format-wg

Developing a standard for localizable message strings

CLDR semantic datetime skeleton spec is nearly ready and MF2 should use it #866

Open sffc opened 3 weeks ago

sffc commented 3 weeks ago

The default registry still lists "field options" as being valid configuration for MF2.

https://github.com/unicode-org/message-format-wg/blob/main/exploration/default-registry-and-mf1-compatibility.md#field-options

As an implementer, I am troubled by this requirement creeping into MF2. It is well established that "field options" are filled with footguns and edge cases. Old-style datetime skeletons not only require that implementations like ICU4X ship larger code and data, they often do not encourage i18n best practices.

I have been working on a specification called "semantic skeleta" to solve these issues. We have been discussing it almost every week in the CLDR Design WG, which meets on Mondays at 10am (unfortunately the same time slot as the MF WG meeting). Now that the spec draft is nearly complete, I wanted to share it here.

https://docs.google.com/document/d/1dmMk_XODm3DGe84GmMVw7O6a7oIs5yojDXm_25CcMbw/edit#heading=h.d9pp2vm43mob

This is not the first time I've raised this issue, but previously, semantic skeleta were only a design doc. I am posting this issue now to raise awareness about the upcoming technology preview in UTS 35, which I think MF2 should embrace.

aphillips commented 2 weeks ago

@sffc Thanks for the update.

Please note that the document you linked is a design document. There is specification text in the registry.md document that also uses these various options.

The existing skeleton support in ICU did not make the cut for LDML45 and isn't supported, AFAIK, in MF2 except via "option bags". I look forward to seeing more of semantic skeletons. The exact way that we integrate such support in MF2 is at a critical juncture. If possible, it would be good to avoid creating sets of options that are deprecated shortly afterwards, but required for conformance. At the same time, we might be just a little ahead of being able to offer the new stuff. I look forward to a discussion here.

sffc commented 2 weeks ago

Supporting "options bags of individual field lengths" puts the same requirements on implementations such as ICU4X. In particular, it is highly customizable, meaning that implementations need to ship the DateTimePatternGenerator and all the code and data required for it. Semantic skeleta, on the other hand, are a strict subset designed to represent classical skeletons that "actually make sense", a small enough set that implementations can pre-compute the patterns ahead of time. So, requiring "options bags of individual field lengths" directly harms implementations (currently ICU4X but likely more in the future), and by extension clients of those implementations.
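To make the size argument above concrete, here is a rough TypeScript sketch that counts how many distinct configurations each approach admits. The option values are the ones `Intl.DateTimeFormat` accepts for four date fields; the field set names and the `long`/`short` lengths passed to `countSemanticSkeleta` are illustrative assumptions, not the list from the spec.

```typescript
// Values accepted by Intl.DateTimeFormat for four date-related field options.
const FIELD_OPTION_VALUES: Record<string, string[]> = {
  weekday: ["long", "short", "narrow"],
  year: ["numeric", "2-digit"],
  month: ["numeric", "2-digit", "long", "short", "narrow"],
  day: ["numeric", "2-digit"],
};

// In an options bag, each field is either absent or set to one of its values,
// so the number of possible bags is the product of (values + 1) per field.
function countOptionBags(fields: Record<string, string[]>): number {
  let count = 1;
  for (const name of Object.keys(fields)) {
    count *= fields[name].length + 1;
  }
  return count;
}

// With semantic skeleta, a request is one well-defined field set plus one length.
function countSemanticSkeleta(fieldSets: string[], lengths: string[]): number {
  return fieldSets.length * lengths.length;
}

// Hypothetical inputs: three field sets and three lengths (assumed for illustration).
console.log(countOptionBags(FIELD_OPTION_VALUES));
console.log(countSemanticSkeleta(["YMD", "YM", "MD"], ["long", "medium", "short"]));
```

The first count grows multiplicatively with every field added, while the second grows only with the number of field sets the spec chooses to define, which is what makes ahead-of-time pattern computation plausible.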

(I haven't brought this to ICU4X-TC for a formal recommendation but personally I think this rises to the level of "a concern that must be resolved during tech preview")

eemeli commented 2 weeks ago

@sffc Could you share a couple of examples of what a semantic skeleton value could look like when used as an MF2 option value? It's not immediately obvious to me from the linked doc what they look like.

sffc commented 2 weeks ago

The spec defines the schema, not a specific interface, but for MessageFormat 2.0, an interface could look something like

{$someDate :datetime fieldSet=[year, month, day] length=medium}

ICU4X is planning to use all-caps identifiers for the field sets, which MF2 could also choose to adopt (if that happened, we'd probably put them into the semantic skeleton spec)

{$someDate :datetime fieldSet=YMD length=medium}

Please note with semantic skeleta that not all field sets are well-defined. If you request a field set [year, hour], that is considered a syntax error.
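As a minimal sketch of that rejection rule, an implementation could canonicalize the requested fields and look them up in a fixed list. The function name `parseFieldSet`, the bracketed string format, and the three entries in the list below are assumptions for illustration; the authoritative list of well-defined field sets is the one in the spec.

```typescript
// Canonical field order used to normalize a request before lookup (assumed).
const FIELD_ORDER: ReadonlyArray<string> = ["year", "month", "day", "weekday", "hour"];

// Hypothetical, partial list of well-defined field sets, keyed in canonical order.
const WELL_DEFINED_FIELD_SETS: ReadonlyArray<string> = [
  "year,month,day",
  "year,month",
  "month,day",
];

// Parses a value such as "[year, month, day]" and throws a SyntaxError
// if the field set is unknown or not in the well-defined list.
function parseFieldSet(optionValue: string): string[] {
  const fields = optionValue
    .replace(/^\[|\]$/g, "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  for (const name of fields) {
    if (FIELD_ORDER.indexOf(name) === -1) {
      throw new SyntaxError("unknown field: " + name);
    }
  }

  // Sort into canonical order so "[day, month]" and "[month, day]" are equivalent.
  fields.sort((a, b) => FIELD_ORDER.indexOf(a) - FIELD_ORDER.indexOf(b));

  const key = fields.join(",");
  if (WELL_DEFINED_FIELD_SETS.indexOf(key) === -1) {
    throw new SyntaxError("field set is not well-defined: [" + fields.join(", ") + "]");
  }
  return fields;
}

console.log(parseFieldSet("[year, month, day]")); // accepted
try {
  parseFieldSet("[year, hour]");
} catch (e) {
  console.log((e as Error).message); // rejected
}
```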

aphillips commented 2 weeks ago

> Please note with semantic skeleta that not all field sets are well-defined. If you request a field set [year, hour], that is considered a syntax error.

Is "well-defined" a conformance term (the way we use valid and well-formed in say BCP47)?

I thought at one point there were enumerated names for the well-defined field sets, such as YearMonth etc. with the idea being that only useful ones would be defined.

sffc commented 2 weeks ago

An implementation should reject something like fieldSet=[year, hour] length=medium in order to be conformant, if that's what you're asking. That's a good call-out that I'll make sure gets into the semantic skeleta spec.

Yes, the spec lists out the field sets that are well-defined.

mihnita commented 2 weeks ago

Note that rejecting the options bag and taking in semantic skeletons means that the ICU4C and ICU4J up-to-date implementations of MF2 will have to wait until the semantic skeletons are also implemented in ICU4C and ICU4J.

I am not saying we should / should not do it.

Just saying that we would probably have to move all the option-bag behavior we have now to a namespace (draft) so that people have something to test with.

Any feedback we get in that space would not be as relevant.

And many might wait for adoption until the next release of ICU.


Also, not supporting option bags means that MF2 would not align with the current ECMAScript style for DateFormat.

sffc commented 2 weeks ago

> Note that rejecting the options bag and taking in semantic skeletons means that the ICU4C and ICU4J up-to-date implementations of MF2 will have to wait until the semantic skeletons are also implemented in ICU4C and ICU4J.

I want to emphasize that semantic skeleta are designed to be implemented on top of a library that implements classical skeleta. In ICU4X, there are about 100 lines of code that sit between the semantic skeleton API and the classical skeleton API.

eemeli commented 2 weeks ago

Thus far, the option sets for :number and :datetime have been kept as subsets of the options available in the JS Intl formatters. Departing from that approach is something that ought to be discussed also in TC39 TG2. Is there any intent of proposing semantic skeletons for adoption in Intl.DateTimeFormat?

sffc commented 2 weeks ago

Semantic skeleta are designed to be implemented on top of classical skeleta, which includes ECMAScript-style options bags. In other words, semantic skeleta are a subset of Intl.DateTimeFormat with a facelift.
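A rough TypeScript sketch of that layering might look like the following. The `SemanticSkeleton` type, the `toOptionsBag` helper, and in particular the length-to-month-width table are assumptions made for illustration; the actual mapping is whatever the spec defines.

```typescript
type Length = "long" | "medium" | "short";

// Hypothetical shape of a semantic skeleton limited to three date field sets.
interface SemanticSkeleton {
  fieldSet: "YMD" | "YM" | "MD";
  length: Length;
}

// Assumed mapping from semantic length to the month width of the options bag.
const MONTH_STYLE: Record<Length, "long" | "short" | "numeric"> = {
  long: "long",
  medium: "short",
  short: "numeric",
};

// Expands a semantic skeleton into an ECMAScript-style options bag.
function toOptionsBag(skeleton: SemanticSkeleton): Intl.DateTimeFormatOptions {
  const options: Intl.DateTimeFormatOptions = {};
  if (skeleton.fieldSet.indexOf("Y") !== -1) {
    options.year = "numeric";
  }
  if (skeleton.fieldSet.indexOf("M") !== -1) {
    options.month = MONTH_STYLE[skeleton.length];
  }
  if (skeleton.fieldSet.indexOf("D") !== -1) {
    options.day = "numeric";
  }
  return options;
}

// The resulting bag can be handed to the existing formatter unchanged.
const bag = toOptionsBag({ fieldSet: "YMD", length: "medium" });
const formatter = new Intl.DateTimeFormat("en", bag);
console.log(bag, formatter.format(new Date(2024, 7, 19)));
```

Because the caller can only name a field set and a length, every bag this helper produces stays inside the subset the formatter already supports, and combinations such as year plus hour cannot be expressed at all.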

aphillips commented 4 days ago

I am tagging this as "Future" because it will not meet the cutoff for LDML46. It will still be considered prior to exiting Tech Preview, which is expected in the 2024 calendar year.

sffc commented 2 days ago

The semantic skeleton spec technical preview was just approved for CLDR 46. https://github.com/unicode-org/cldr/pull/4031

Please note the section defining how to map from a semantic skeleton to a classical skeleton.