opengeospatial / ogcapi-connected-systems


Encoding of DataArray inline values in JSON #20

Open alexrobin opened 1 year ago

alexrobin commented 1 year ago

When DataArray values are provided inline, we would greatly benefit from a compact syntax using only nested arrays, with no nested objects.

Example:

{
  "type": "DataArray",
  "label": "Measurement Table",
  "elementType": {
    "name": "measurement",
    "type": "DataRecord",
    "fields": [
      {
        "name": "time",
        "type": "Time",
        "definition": "http://www.opengis.net/def/property/OGC/0/SamplingTime",
        "referenceFrame": "http://www.opengis.net/def/trs/BIPM/0/UTC",
        "label": "Sampling Time",
        "uom": { "href": "http://www.opengis.net/def/uom/ISO-8601/0/Gregorian" }
      },
      {
        "name": "temp",
        "type": "Quantity",
        "definition": "http://mmisw.org/ont/cf/parameter/air_temperature",
        "label": "Air Temperature",
        "uom": { "code": "Cel" }
      },
      {
        "name": "press",
        "type": "Quantity",
        "definition": "http://mmisw.org/ont/cf/parameter/air_pressure_at_mean_sea_level",
        "label": "Air Pressure",
        "uom": { "code": "mbar" }
      }
    ]
  },
  "values": [
    ["2009-02-10T10:42:56Z",25.4,1020],
    ["2009-02-10T10:43:06Z",25.3,1021],
    ["2009-02-10T10:44:16Z",25.3,1020],
    ["2009-02-10T10:44:17Z",25.2,1020]
  ]
}

Each array element is a record in this case, but it gets encoded inline as a nested array of 3 values.

But this is different from what the JSON encoding rules for datastreams currently say... In a datastream, array elements would be encoded as objects, like this:

[
  { "time": "2009-02-10T10:42:56Z", "temp": 25.4, "press": 1020 },
  { "time": "2009-02-10T10:43:06Z", "temp": 25.3, "press": 1021 },
  { "time": "2009-02-10T10:44:16Z", "temp": 25.3, "press": 1020 },
  { "time": "2009-02-10T10:44:17Z", "temp": 25.2, "press": 1020 }
]

It's probably ok to define a compact array syntax, but only in the case where the DataArray element does not itself contain any complex nested content.

hylkevds commented 4 months ago

I was about to open a topic about the DataRecord bit, and it ties into this.

How is the value of a DataRecord supposed to be encoded? The examples (1, 2) don't show this. The second one has values, but inline for each field separately, not grouped for the entire DataRecord.

My impression (that may very well be wrong) is that the DataRecord is intended for name/value pairs, so I would expect the value of a record to be an Object:

 { "time": "2009-02-10T10:42:56Z", "temp": 25.4, "press": 1020 }

But if that is the case, then:

  1. Why is the fields field of the DataRecord an Array and not an Object?
  2. The example for the DataArray is incorrect, since each item in the array should be an object, like in your Datastream example.
  3. We would need a different type for the actual array encoding, like a Vector. Or maybe loosen the Vector by making definition and referenceFrame optional?

On the other hand, if the DataRecord is not meant for name/value pairs, then we'd need a new element for that...

That the fields field is an array seems to be an XML->JSON conversion anti-pattern. Making it an Object makes it much easier for a client to navigate to the definition of a field.
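
For illustration, a hypothetical object-keyed variant of the example above (not current SWE Common JSON, and with field content abbreviated) could look like:

{
  "type": "DataRecord",
  "fields": {
    "time": {
      "type": "Time",
      "definition": "http://www.opengis.net/def/property/OGC/0/SamplingTime",
      "uom": { "href": "http://www.opengis.net/def/uom/ISO-8601/0/Gregorian" }
    },
    "temp": {
      "type": "Quantity",
      "definition": "http://mmisw.org/ont/cf/parameter/air_temperature",
      "uom": { "code": "Cel" }
    }
  }
}

A client could then navigate straight to record.fields.temp.uom instead of scanning the array for the entry with name == "temp".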

alexrobin commented 4 months ago

@hylkevds This issue was actually about allowing for a different (more compact) encoding for values provided inline in a DataArray. That's why the DataArray example doesn't use objects, but it would not necessarily apply to datastream encodings.

fields is an array in DataRecord because we need to guarantee the order of fields for certain encodings (e.g. CSV, binary), and JSON does not mandate that the order of object members be maintained through serialization/deserialization (some libraries preserve it, some don't, depending on how they store the map). I agree that using an object would look better (I actually did that first, before realizing it would not work for the above reason). An alternative would be a sequence number in the field description, but that's not super clean either...
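
Just to sketch what I mean (the order member is hypothetical), keeping fields as an object would then require something like:

{
  "type": "DataRecord",
  "fields": {
    "time": { "order": 0, "type": "Time", "uom": { "href": "http://www.opengis.net/def/uom/ISO-8601/0/Gregorian" } },
    "temp": { "order": 1, "type": "Quantity", "uom": { "code": "Cel" } },
    "press": { "order": 2, "type": "Quantity", "uom": { "code": "mbar" } }
  }
}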

hylkevds commented 4 months ago

That's quite specific behaviour: changing the encoding of a DataRecord depending on whether it is inside a DataArray or a Datastream... Currently it feels like DataRecord doesn't know what it wants to be: a compact, fixed-order encoding, or an easy-to-read, name/value encoding. I think it would clarify and simplify things if it made up its mind :)

CSV has headers, so a fixed order is not needed there. Which binary encoding is header-less and thus has issues here? One alternative would be to specify that for header-less encodings the fields must be sorted alphabetically. That fixes the order without adding a sequence number.

Or one could turn it around: since the fields are in a fixed order anyway, one could specify the compact encoding as the way to encode DataRecords, and do away with the names in the values entirely?
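
To illustrate (sketch only, not current syntax), a single DataRecord value would then always be encoded compactly as:

["2009-02-10T10:42:56Z", 25.4, 1020]

with the field names living only in the schema, never in the values.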

alexrobin commented 4 months ago

Yes, I think you're right. The small optimization for inline DataArray values is not worth the extra complexity. We can just use the same encoding as the one specified for datastreams.

I called it CSV, but it's really DSV without headers. Since we are working on a minor release (2.1) of SWE Common, the goal is to keep compatibility with version 2.0, which has DSV, flat binary, and even XML encodings where field values must be ordered according to the schema.

Perhaps there is a way to make it work if we move the ordering to the encoding section. I'll think about it.
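
Off the top of my head, something like this (the fieldOrder member is purely hypothetical; tokenSeparator and blockSeparator are the existing TextEncoding parameters):

{
  "type": "TextEncoding",
  "tokenSeparator": ",",
  "blockSeparator": "\n",
  "fieldOrder": ["time", "temp", "press"]
}

The schema itself could then use an object for fields, and only header-less encodings would need to declare an order.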

alexrobin commented 4 months ago

@hylkevds We talked about the JSON encoding of DataRecord fields during our last telecon and decided to keep the array to maintain the ordering for now.

We won't introduce this new encoding for DataArray inline values either. I will update the doc and examples accordingly and close this issue when it's done.

hylkevds commented 4 months ago

The current version works; breaking changes are best kept for the next major release :)

I just noticed the vectorAsArrays field in https://github.com/opengeospatial/ogcapi-connected-systems/blob/master/swecommon/schemas/json/encodings.json. Does that do the same for vectors? I've not found documentation for that field. And is the default true or false?

alexrobin commented 4 months ago

@hylkevds The vectorAsArrays field was indeed the same idea, and a leftover from testing. I will remove it from the JSON schema. Thanks for spotting it.

hylkevds commented 4 months ago

So a Vector is encoded the same as a DataRecord? The Vector examples don't have a value block in them.

Is there currently a component type that is encoded as a fixed-length array of values with different definitions? DataArray and Matrix have this encoding, but all values have the same definition and the lengths are not fixed.

Not really a big deal, but it would indeed be nice to have.

alexrobin commented 4 months ago

Yes, a Vector is encoded the same as a DataRecord. The only difference is that it has a reference frame and cannot have nested composite structures (i.e. a Vector is always a vector of scalars).
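
To make the missing value block concrete, here is a minimal sketch of a Vector with an inline value (the coordinate names and referenceFrame URI are just examples, and I'm assuming a value member analogous to the DataRecord encoding):

{
  "type": "Vector",
  "referenceFrame": "http://www.opengis.net/def/crs/EPSG/0/4979",
  "coordinates": [
    { "name": "lat", "type": "Quantity", "uom": { "code": "deg" } },
    { "name": "lon", "type": "Quantity", "uom": { "code": "deg" } },
    { "name": "h", "type": "Quantity", "uom": { "code": "m" } }
  ],
  "value": { "lat": 45.6, "lon": 1.9, "h": 152.3 }
}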

We don't have a component that is encoded as an array of values where each value has a different definition. I think we can plan to add it in the next version.

hylkevds commented 2 months ago

I like the idea of adding a flag to the Vector and DataRecord classes to signal to the client that, if the encoding supports both forms, the results are in the compact form by default. Encodings that only support one of the two forms can overrule this flag.

Should the compact form be the default, or the verbose form? And should the default be the same for Vector and DataRecord?
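
Either way, as a sketch of the flag (the valuesCompact name is just a placeholder):

{
  "type": "DataRecord",
  "valuesCompact": true,
  "fields": [
    { "name": "temp", "type": "Quantity", "definition": "http://mmisw.org/ont/cf/parameter/air_temperature", "uom": { "code": "Cel" } }
  ]
}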

Complementary to that, an API could specify a way for a client to indicate that the results should be transformed. For instance, the STA option $resultFormat=DataArray could specify that all DataArrays and Vectors are transformed to compact form. But that's something for an API to specify, and it might prove overly complex to implement depending on the architecture.
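
For reference, if I remember the STA spec correctly, that request/response pair looks roughly like this (the spec spells the option dataArray):

GET /v1.1/Datastreams(1)/Observations?$resultFormat=dataArray

{
  "value": [
    {
      "components": ["phenomenonTime", "result"],
      "dataArray@iot.count": 2,
      "dataArray": [
        ["2009-02-10T10:42:56Z", 25.4],
        ["2009-02-10T10:43:06Z", 25.3]
      ]
    }
  ]
}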