originalData schema wrong and prose not done

DavidFatDavidF commented 3 years ago

1. Review the originalData definition in the schema; data seems to be missing. The originalData pattern seems wrong.


"originalData": {
  "type": "object",
  "patternProperties": { "^.*$": { "type": "string" } },
  "additionalProperties": false
},



2. develop originalData and data prose based on schema pattern consensus

3. Reference the solution from unit prose
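
To illustrate the concern in item 1: the pattern `^.*$` matches essentially any property name, including the empty string, so it puts no NMTOKEN-like constraint on the keys, and `"additionalProperties": false` has almost nothing left to reject. A minimal Python sketch, assuming that Python's `re` module behaves like the schema's regex dialect for this simple pattern; the sample keys are hypothetical:

```python
import re

# Pattern from the current originalData schema (the patternProperties key).
CURRENT_KEY_PATTERN = re.compile(r"^.*$")

# Hypothetical keys: only the first one looks like an NMTOKEN id.
sample_keys = ["d1", "", "has space", "a/b?c"]

# patternProperties uses search semantics (unanchored unless the pattern
# carries its own anchors), so re.search mirrors that here.
accepted = [key for key in sample_keys if CURRENT_KEY_PATTERN.search(key)]

print(accepted)
```

Every sample key is accepted, which is why the pattern needs tightening before the prose can describe the keys as ids.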

DavidFatDavidF commented 3 years ago

@genivia-inc before I can fix the prose, the schema for originalData needs to be fixed. I am afraid that we will have to discuss this first in the May meeting before we can agree on a solution. Nevertheless, here is some guidance on what the originalData schema needs to do. originalData is a unit-level container for non-translatable strings that are significant for the original format or its roundtrip. It is a plain wrapper for data elements that have a required NMTOKEN id, so that the original data can be referenced from inline objects such as ph and sc.

genivia-inc commented 3 years ago

Sure. A resolution to this issue can be discussed at the next meeting or anytime here with the TC members.

For the record, here is my email from Oct 12 2016 to the xliff-omos list:

1. The UML diagram appears to lack the originalData table that was included earlier and that is also in the XLIFF XML schema. Is this correct? I wonder how this affects XML to/from JSON conversion tools, because the data content has to be translated in addition to the structural content (XML to/from JSON). This means a semantic translation, not just a syntactic translation back and forth between JSON and XML. It would be simpler to keep originalData and startRef/dataRef. Just my 2c on this one. [...] Below is an initial JSON schema of JLIFF followed by an example. This assumes originalData is still a table. It is almost complete, except for adjustments related to the above, possible improvements, and perhaps rules such as oneOf/anyOf etc. to describe and restrict the "element" content that combines all attributes of ec, em, pc, ph, sc, and sm.

Sorry for spamming you with this large email. The github repo is not yet up as David had indicated. [...]
[...]
   "originalData": {
     "type": "array",
     "items": {
       "type": "object",
       "properties": {
         "id": { "type": "string" },
         "data": { "type" : "string" }
       },
       "required": [ "id", "data" ]
     }
   },
[...]
An example, similar to the JSON wiki example:
{
  "id": "fl",
  "version": "1.0",
  "unit": [
    {
      "id": "u1",
      "originalData": [
        { "id": "d1", "data": "[C1/]" },
        { "id": "d2", "data": "[C2]" },
        { "id": "d3", "data": "[/C2]" }
      ],
      "subunit": [
        {
          "segment": true,
          "state": "translated",
          "canResegment": false,
          "source": [
            { "id": "c1", "kind": "ph", "dataRef": "d1" },
            { "text": "aaa" },
            { "id": "c2", "kind": "pc", "startRef": "d2", "dataRef": "d3", "text": "text" }
          ],
          "target": [
            { "id": "c1", "kind": "ph", "dataRef": "d1" },
            { "text": "AAA" },
            { "id": "c2", "kind": "pc", "startRef": "d2", "dataRef": "d3", "text": "TEXT" }
          ]
        },
        {
          "segment": false,
          "source": [
            { "text": ".  " }
          ]
        }
      ]
    }
  ]
}
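
As a rough illustration of how the references in the example above resolve, here is a Python sketch that rebuilds the original-format string for a source. The helper `render_source` is hypothetical. It assumes that `startRef` data goes before an element's text and `dataRef` data after it, which is inferred from the example and not taken from the specification:

```python
# Trimmed copy of the unit from the example above (source only).
unit = {
    "id": "u1",
    "originalData": [
        {"id": "d1", "data": "[C1/]"},
        {"id": "d2", "data": "[C2]"},
        {"id": "d3", "data": "[/C2]"},
    ],
    "subunit": [
        {
            "segment": True,
            "source": [
                {"id": "c1", "kind": "ph", "dataRef": "d1"},
                {"text": "aaa"},
                {"id": "c2", "kind": "pc", "startRef": "d2",
                 "dataRef": "d3", "text": "text"},
            ],
        }
    ],
}


def render_source(unit):
    """Rebuild the original-format string for each subunit's source.

    Hypothetical helper: placing startRef data before the text and
    dataRef data after it is an assumption drawn from the example.
    """
    # Index the array form by id so references can be looked up directly.
    table = {entry["id"]: entry["data"] for entry in unit["originalData"]}
    rendered = []
    for subunit in unit["subunit"]:
        parts = []
        for element in subunit["source"]:
            if "startRef" in element:
                parts.append(table[element["startRef"]])  # KeyError if dangling
            parts.append(element.get("text", ""))
            if "dataRef" in element:
                parts.append(table[element["dataRef"]])
        rendered.append("".join(parts))
    return rendered


print(render_source(unit))
```

A dangling reference raises `KeyError`, which is the kind of check a converter would need whichever originalData form is chosen.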

This must have been changed to the current key-string table form of object (patternProperties) by committee consensus and committed to this repo once this repo was set up.
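
For comparison, the two forms carry the same information and can be converted mechanically. A small Python sketch with hypothetical helper names, assuming ids are unique within a unit:

```python
def array_to_object(original_data):
    """Convert [{"id": ..., "data": ...}, ...] to {id: data, ...}."""
    table = {}
    for entry in original_data:
        if entry["id"] in table:
            # The object form cannot hold two entries with the same key.
            raise ValueError(f"duplicate originalData id: {entry['id']}")
        table[entry["id"]] = entry["data"]
    return table


def object_to_array(table):
    """Convert {id: data, ...} back to the array-of-objects form."""
    return [{"id": key, "data": value} for key, value in table.items()]


array_form = [
    {"id": "d1", "data": "[C1/]"},
    {"id": "d2", "data": "[C2]"},
    {"id": "d3", "data": "[/C2]"},
]
object_form = array_to_object(array_form)
print(object_form)
```

One practical difference is that the object form rules out duplicate ids by construction, while the array form needs an explicit check like the one in the sketch.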

genivia-inc commented 3 years ago

The proposed resolution is the following, please comment:

    "originalData": {
      "description": "A collection of key-value pairs with NMTOKEN id addressible from inline and string values",
      "type": "object",
      "patternProperties": { "^[-._:A-Za-z0-9]+$": { "type": "string" } },
      "additionalProperties": false
    },
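
To make the effect of the proposed pattern concrete, here is a Python sketch of the check it implies. The function name and sample input are hypothetical, and the character class is an ASCII-only approximation of NMTOKEN taken directly from the proposal:

```python
import re

# Character class from the proposed patternProperties key. fullmatch is
# used instead of ^...$ because Python's $ also matches just before a
# trailing newline, which would let a key such as "d1\n" slip through.
PROPOSED_KEY = re.compile(r"[-._:A-Za-z0-9]+")


def invalid_original_data(original_data):
    """Return (key, reason) pairs that would fail the proposed schema."""
    problems = []
    for key, value in original_data.items():
        if not PROPOSED_KEY.fullmatch(key):
            # additionalProperties: false rejects keys that miss the pattern.
            problems.append((key, "key does not match pattern"))
        elif not isinstance(value, str):
            problems.append((key, "value is not a string"))
    return problems


# Hypothetical input mixing valid and invalid entries.
candidate = {"d1": "[C1/]", "d 2": "[C2]", "": "[/C2]", "d3": 3}
print(invalid_original_data(candidate))
```

Unlike `^.*$`, this pattern rejects the empty key and keys containing spaces, so `"additionalProperties": false` now has an effect.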

DavidFatDavidF commented 3 years ago

dF to implement the agreed solution in prose by June review

genivia-inc commented 3 years ago

The proposed schema change is committed.

https://github.com/oasis-tcs/xliff-omos-jliff/commit/d5af3e7dbafb2a3784e9df78244bc7d00ce63ce2#diff-3c525cc019d712b6a8511e16483ef69ef6676993fc0e4ba3d9cb6ac35e90c829

DavidFatDavidF commented 3 years ago

Looking at this in the schema:

    "originalData": {
      "description": "A collection of key-value pairs with NMTOKEN id addressible from inline and string values",
      "type": "object",
      "patternProperties": { "^[-._:A-Za-z0-9]+$": { "type": "string" } },
      "additionalProperties": false
    },

I think this is still wrong. IMHO it should be "type": "array", with the indicated pattern applied to each of the "items", and "minItems": 1.

originalData should be a unit-level collection/array of individually addressable original data strings.

DavidFatDavidF commented 3 years ago

The vocabulary object solution was intended. A corresponding originalDataDir vocabulary object was created to address directionality. @genivia-inc to implement originalDataDir in the schema.

genivia-inc commented 3 years ago

The JLIFF 2.0 and 2.1 schemas are updated with the addition of an originalDataDir object.

DavidFatDavidF commented 3 years ago

Implemented in prose, to be built and committed today.

DavidFatDavidF commented 3 years ago

close with #47

oasis-tcs / xliff-omos-jliff

originalData schema wrong and prose not done #42