unicode-org / message-format-wg

Developing a standard for localizable message strings

Clarifications needed on the XLIFF mapping #169

Open eemeli opened 3 years ago

eemeli commented 3 years ago

While implementing the transformations between the data model and XLIFF, it's become obvious that we need to specify the text of our third deliverable more precisely:

A specification for a one-to-one mapping between the data model and XLIFF.

A couple of specific questions:

1. Must the data model support all XLIFF features?

According to this, it's pretty clear that all data model features need to be reflected in XLIFF, but is the inverse true? In other words, we at least have to ensure that an MF2 → XLIFF → MF2 round trip is non-lossy, but how about XLIFF → MF2 → XLIFF? Does the data model need to support all core XLIFF features, or is it okay to drop some during the XLIFF → MF2 conversion?

One core XLIFF feature that is currently not supported by either proposed model is the split of content into <segment> and <ignorable> parts. Should segmented input be retained as such, or is it ok to always re-segment messages into one segment?
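To make the segmentation question concrete, here is a minimal sketch of what "always re-segment into one segment" would mean for an XLIFF → MF2 → XLIFF round trip. It uses only Python's standard `xml.etree.ElementTree`; the unit content, the helper names (`q`, `flatten_source`, `resegment`), and the choice to treat a message as a plain string are assumptions made for illustration, not part of either proposed data model. Inline markup inside `<source>` is reduced to its text here, which a real mapping could not do.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:2.0"
ET.register_namespace("", XLIFF_NS)  # serialize without a generated prefix


def q(tag: str) -> str:
    """Qualify a tag name with the XLIFF 2.0 namespace."""
    return f"{{{XLIFF_NS}}}{tag}"


# Hypothetical input: two segments separated by an ignorable space.
SEGMENTED_UNIT = f"""\
<unit xmlns="{XLIFF_NS}" id="greeting">
  <segment><source>Hello there.</source></segment>
  <ignorable><source> </source></ignorable>
  <segment><source>How are you?</source></segment>
</unit>"""


def flatten_source(unit_xml: str) -> str:
    """XLIFF -> message text: join all segment and ignorable sources in order."""
    unit = ET.fromstring(unit_xml)
    parts = []
    for child in unit:
        if child.tag in (q("segment"), q("ignorable")):
            source = child.find(q("source"))
            if source is not None:
                # itertext() keeps text only; inline elements are dropped.
                parts.append("".join(source.itertext()))
    return "".join(parts)


def resegment(unit_id: str, text: str) -> str:
    """Message text -> XLIFF: emit the whole message as a single segment."""
    unit = ET.Element(q("unit"), {"id": unit_id})
    source = ET.SubElement(ET.SubElement(unit, q("segment")), q("source"))
    source.text = text
    return ET.tostring(unit, encoding="unicode")
```

Calling `resegment("greeting", flatten_source(SEGMENTED_UNIT))` keeps the full text but returns a unit with one `<segment>` and no `<ignorable>`, so the original segment boundaries cannot be recovered from the message alone.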

2. Must the mapping be canonical?

The other deliverables refer to "the canonical data model" and "the canonical syntax", but that word isn't used for the XLIFF mapping. This might seem like it's just semantics, but it matters for the edge cases. A "canonical" mapping needs to be always followed, but it can be hard to implement.

One specific place where this matters is the algorithm for merging two separate source and target messages into a single <group> when both of them have selectors, but the lists of selectors are different (e.g. source depends on the variables foo and bar, while target depends only on bar). This is problematic because the MF2 data model has the two languages' messages completely separate from each other, while XLIFF enforces a structure where the selectors are shared between the languages, and the value of each case is translated separately. It can certainly be done, but it's... hairy. Not to mention needing to also be able to reverse this merge.

If the mapping were "canonical", all such edge cases would have to be completely covered (possibly by forbidding certain structures from being used). It would be much easier if we could agree that at least parts of the mapping aren't canonical, so that work on these algorithms can continue separately from the rest of the spec work.
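As an illustration of the selector mismatch described above, the following is one possible sketch of the merge, not an agreed algorithm. It assumes a simplified message shape (a list of selector names plus an ordered list of `(keys, pattern)` variants), a `"*"` catch-all key, and first-match variant selection; the names `pad_keys`, `lookup`, and `merge` are hypothetical. The reverse operation is not attempted.

```python
CATCH_ALL = "*"

# Hypothetical messages: the source selects on foo and bar, the target only on bar.
SOURCE = {
    "selectors": ["foo", "bar"],
    "variants": [
        (("one", "one"), "src one/one"),
        (("one", "*"), "src one/other"),
        (("*", "*"), "src other/other"),
    ],
}
TARGET = {
    "selectors": ["bar"],
    "variants": [
        (("one",), "tgt one"),
        (("*",), "tgt other"),
    ],
}


def pad_keys(message, selectors):
    """Re-express variant keys over a wider selector list, padding with '*'."""
    index = {name: i for i, name in enumerate(message["selectors"])}
    return [
        (tuple(keys[index[s]] if s in index else CATCH_ALL for s in selectors), pattern)
        for keys, pattern in message["variants"]
    ]


def lookup(padded, case):
    """Return the first variant whose keys match the case (first-match assumption)."""
    for keys, pattern in padded:
        if all(k == CATCH_ALL or k == c for k, c in zip(keys, case)):
            return pattern
    raise LookupError(f"no variant covers case {case!r}")


def merge(source, target):
    """Build a shared selector list and one (case, source, target) row per case."""
    selectors = list(source["selectors"])
    selectors += [s for s in target["selectors"] if s not in selectors]
    src = pad_keys(source, selectors)
    tgt = pad_keys(target, selectors)
    cases = []
    for keys, _ in src + tgt:
        if keys not in cases:
            cases.append(keys)
    # Heuristic ordering: cases with fewer catch-all keys come first.
    cases.sort(key=lambda keys: keys.count(CATCH_ALL))
    return selectors, [(case, lookup(src, case), lookup(tgt, case)) for case in cases]
```

With these inputs, `merge(SOURCE, TARGET)` yields the shared selectors `["foo", "bar"]` and four cases, one more than the source had and two more than the target. Even this small example leaves open how to order cases when two partial matches tie, and how to tell on the way back that the target never depended on `foo`.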

romulocintra commented 2 years ago

We decided in #206 that:

  1. Must the data model support all XLIFF features? No.
  2. Must the mapping be canonical? Postponed until we have more context.

aphillips commented 1 year ago

I think we have implicit answers to the questions originally proposed by @eemeli. I do think that we have a task of defining the XLIFF mapping (self-imposed in our goals).

eemeli commented 1 year ago

I don't think this can really be a blocker. As discussed last April and then noted in our Deliverables:

Note that this deliverable is "at risk" and not expected to be part of the 2023 fall release.

aphillips commented 9 months ago

Let's decide what to do with XLIFF mapping in an upcoming call.

aphillips commented 8 months ago

We discussed this in the 2024-01-08 call. We will work on XLIFF again in the preview period.