Open eemeli opened 3 years ago
We decided on #206 that
I think we have implicit answers to the questions originally proposed by @eemeli. I do think that we have a task of defining the XLIFF mapping (self-imposed in our goals).
I don't think this can really be a blocker. As discussed last April and then noted in our Deliverables:
Note that this deliverable is "at risk" and not expected to be part of the 2023 fall release.
Let's decide what to do with XLIFF mapping in an upcoming call.
We discussed this in the 2024-01-08 call. We will work on XLIFF again in the preview period.
In today's discussion (2024-09-10), there was still interest in an XLIFF mapping as a potential deliverable. It won't be in LDML46, but might make the official release.
While working on implementing the transformations between the data model and XLIFF, it's become obvious that we need to specify the text of our third deliverable a bit more precisely:
A couple of specific questions:
1. Must the data model support all XLIFF features?
It's pretty obvious that according to this, all data model features need to be reflected in XLIFF, but is the inverse true? In other words, we have to at least make sure that an MF2 → XLIFF → MF2 workflow is non-lossy, but how about XLIFF → MF2 → XLIFF? Does the data model need to support all core XLIFF features, or is it okay to drop some during the XLIFF → MF2 conversion?
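To make the asymmetry concrete, here is a minimal sketch using toy stand-ins rather than the real XLIFF or MF2 data models. The `Part` class and both converter functions are hypothetical, and segmentation into `<segment>` and `<ignorable>` parts is used only as one example of an XLIFF feature the message side does not carry.

```python
from dataclasses import dataclass
from typing import List


# Toy stand-in for one piece of an XLIFF unit; not a real XLIFF model.
@dataclass(frozen=True)
class Part:
    text: str
    ignorable: bool = False  # True models <ignorable>, False models <segment>


def xliff_to_mf2(parts: List[Part]) -> str:
    """Flatten a segmented unit into one message string.

    Segment boundaries are dropped here, which is the lossy step.
    """
    return "".join(p.text for p in parts)


def mf2_to_xliff(message: str) -> List[Part]:
    """Always emit a single segment for the whole message."""
    return [Part(message)]


# MF2 -> XLIFF -> MF2 round-trips without loss in this toy model.
message = "Hello. Goodbye."
assert xliff_to_mf2(mf2_to_xliff(message)) == message

# XLIFF -> MF2 -> XLIFF does not: the original segmentation is gone.
unit = [Part("Hello."), Part(" ", ignorable=True), Part("Goodbye.")]
roundtripped = mf2_to_xliff(xliff_to_mf2(unit))
assert roundtripped != unit
assert roundtripped == [Part("Hello. Goodbye.")]
```

Whether the second kind of loss is acceptable is exactly the open question; the sketch only shows that the two directions are not symmetric.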
One core XLIFF feature that is currently not supported by either proposed model is the split of content into `<segment>` and `<ignorable>` parts. Should segmented input be retained as such, or is it ok to always re-segment messages into one segment?
2. Must the mapping be canonical?
The other deliverables refer to "the canonical data model" and "the canonical syntax", but that word isn't used for the XLIFF mapping. This might seem like it's just semantics, but it matters for the edge cases. A "canonical" mapping must always be followed, but it can be hard to implement.
One specific place where this matters is the algorithm for merging two separate `source` and `target` messages into a single `<group>` when both of them have selectors, but the lists of selectors are different (e.g. `source` depends on the variables `foo` and `bar`, while `target` depends only on `bar`). This is problematic because the MF2 data model has the two languages' messages completely separate from each other, while XLIFF enforces a structure where the selectors are shared between the languages, and the value of each case is translated separately. It can certainly be done, but it's... hairy. Not to mention needing to also be able to reverse this merge.
If the mapping were "canonical", all such edge cases must be completely covered (possibly by forbidding certain structures from being used). It would be much easier if we could agree that at least parts of the mapping aren't canonical, to allow for progress on these algorithms to continue separately from the rest of the spec work.
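For illustration only, one way the selector merge could work is sketched below. It is not the algorithm from any actual implementation: the `Message` class, `pick`, and `merge` are hypothetical, variants are assumed to be listed most-specific first, and every message is assumed to have a `*` catch-all variant. The shared selector list is the ordered union of both sides, and each combined case looks up the best-matching variant from each language.

```python
from itertools import product
from typing import Dict, List, Tuple

Key = Tuple[str, ...]


class Message:
    """Hypothetical simplified message: selector names plus keyed variants."""

    def __init__(self, selectors: List[str], variants: Dict[Key, str]):
        self.selectors = selectors
        self.variants = variants  # insertion order is treated as priority


def pick(msg: Message, shared: List[str], key: Key) -> str:
    """Return the first variant of msg that matches the shared-selector key."""
    own = tuple(key[shared.index(s)] for s in msg.selectors)
    for vkey, value in msg.variants.items():
        if all(v == k or v == "*" for v, k in zip(vkey, own)):
            return value
    raise ValueError("message has no catch-all variant")


def merge(source: Message, target: Message):
    # Shared selectors: source order first, then any target-only selectors.
    shared = list(source.selectors)
    shared += [s for s in target.selectors if s not in shared]

    # For each shared selector, collect every key either side uses for it.
    options = []
    for sel in shared:
        keys: List[str] = []
        for msg in (source, target):
            if sel in msg.selectors:
                i = msg.selectors.index(sel)
                for vkey in msg.variants:
                    if vkey[i] not in keys:
                        keys.append(vkey[i])
        # Keep the catch-all last so specific keys come first.
        options.append([k for k in keys if k != "*"] + ["*"])

    # Every combination of keys becomes one case with a source/target pair.
    cases = {
        key: (pick(source, shared, key), pick(target, shared, key))
        for key in product(*options)
    }
    return shared, cases


source = Message(["foo", "bar"], {("one", "*"): "S one", ("*", "*"): "S other"})
target = Message(["bar"], {("one",): "T one", ("*",): "T other"})
shared, cases = merge(source, target)
```

Even in this small example, two source variants and two target variants expand into four combined cases, and nothing in the output records which cases were synthesized. That is the part that makes reversing the merge hard, and the sketch does not attempt it.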