iris-edu / mseed3-evaluation

A repository for technical evaluation and implementation of potential next generation miniSEED formats

repeated keys in extras #17

Open crotwell opened 7 years ago

crotwell commented 7 years ago

Regardless of what format (json, protobuf, chunks, cbor,...) ends up being used, do we allow repeated keys?

To phrase as json, do we have the top level extras as an array, where each item is an object that knows its type? Or do we have extras as a map, ie an object, where each item is a value for a key that must be unique at that level?

The direct mapping from mseed2 is not clear, as some things correspond to fixed header items and hence must be unique, but others correspond to blockettes, which (I think) may be repeated. For example, it might be legal in mseed2 to have more than one event detection blockette 200 in a record, but you can't have more than one "TimeCorrection=" because this is "the" correction in the fixed header and so must be unique.

If we choose array, we probably need to say duplicates are disallowed for some standard headers.

If we choose object, we might need to make the value for some keys be an array.

I lean towards an object with unique keys (and value as arrays in cases where we think it might be needed).

I am not sure about the other formats, but JSON allows either an object or an array as the top-level structure.
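To make the two shapes concrete, here is a minimal Python sketch (the header names and values are purely illustrative, not from any draft) showing that both top-level forms parse as JSON, and how a repeated header fits the object style as an array value:

```python
import json

# Object style: keys are unique; a repeated header becomes an array value.
object_style = json.loads("""
{
  "TimeCorrection": 0.5,
  "EventDetection": [
    {"onset": "2017-07-05T22:35:00Z"},
    {"onset": "2017-07-05T22:36:10Z"}
  ]
}
""")

# Array style: each item is an object that carries its own key,
# so the same key may simply appear twice.
array_style = json.loads("""
[
  {"key": "TimeCorrection", "value": 0.5},
  {"key": "EventDetection", "value": {"onset": "2017-07-05T22:35:00Z"}},
  {"key": "EventDetection", "value": {"onset": "2017-07-05T22:36:10Z"}}
]
""")

# Both parse; repeats are nested under one key in the object style
# and appear as duplicate items in the array style.
print(len(object_style["EventDetection"]))                                # 2
print(sum(1 for item in array_style if item["key"] == "EventDetection"))  # 2
```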

andres-h commented 7 years ago

On Wednesday 2017-07-05 22:35, Philip Crotwell wrote:

If we choose array, we probably need to say duplicates are disallowed for some standard headers.

I vote for that.

If we choose object, we might need to make the value for some keys be an array.

I can think of cases where the order of headers/chunks/whatever is important, e.g.,

CHUNK1 CHUNK2 CHUNK1

is not the same as

CHUNK1 CHUNK1 CHUNK2

Alternatively, try to express blockette 201 followed by blockette 200 in Chad's proposal. You'd have EventSignalAmplitude[2], but EventMEDPickAlgorithm[1], which is kind of silly.

crotwell commented 7 years ago

@andres-h Can you give a specific example of order mattering, to help clarify the issue?

chad-earthscope commented 7 years ago

Regardless of what format (json, protobuf, chunks, cbor,...) ends up being used, do we allow repeated keys?

Yes, that's needed for event detection headers, maybe timing exceptions.

If we choose array, we probably need to say duplicates are disallowed for some standard headers.

👍

chad-earthscope commented 7 years ago

I can think of cases where the order of headers/chunks/whatever is important

@andres-h, care to share those cases?

Alternatively, try to express blockette 201 followed by blockette 200 in Chad's proposal. You'd have EventSignalAmplitude[2], but EventMEDPickAlgorithm[1], which is kind of silly.

At the risk of stating the obvious, I do not think anyone is suggesting to continue with what was in 20170622 draft in terms of extra headers.

andres-h commented 7 years ago

On 07/06/2017 12:55 AM, Chad Trabant wrote:

@andres-h https://github.com/andres-h, care to share those cases?

I was thinking about cases when, for example, timing exception (whatever that means -- maybe "GPS in/out of lock"?) occurs in the middle of data:

WFDATA TIMING_EXCEPTION WFDATA

But I guess that is not a sensible use case. The timing exception should include a time value to point to specific sample.

At the risk of stating the obvious, I do not think anyone is suggesting to continue with what was in 20170622 draft in terms of extra headers.

👍

crotwell commented 7 years ago

Making the question more concrete...

Top level object style yields something like: { "TQ": 98, "QI": "D" }

Top level array yields something like this: [ { "key": "TQ", "value": 98 }, { "key": "QI", "value": "D" } ]

In other words, the "key" has to live inside a sub-object instead of being a natural key with a value. I can see some advantage to the array in that it is easy to append, but it also gives an unnatural structure and wastes a dummy key to hold the "key".

A repeated key in the object style is a little more complicated, basically uses an array as the value, so something like: { "event": [ { ...event1 stuff...}, { ...event2 stuff...} ] }
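A writer for the object style could promote a scalar to an array on the first repeat. A hypothetical helper in Python (the function name and header keys are illustrative, not from any proposal):

```python
def add_extra(extras, key, value):
    """Add a header value, promoting the value to a list when the key repeats."""
    if key not in extras:
        extras[key] = value
    elif isinstance(extras[key], list):
        extras[key].append(value)
    else:
        extras[key] = [extras[key], value]
    return extras

extras = {}
add_extra(extras, "TQ", 98)
add_extra(extras, "event", {"type": "detection1"})
add_extra(extras, "event", {"type": "detection2"})
# extras == {"TQ": 98, "event": [{"type": "detection1"}, {"type": "detection2"}]}
```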

It sounds like @andres-h and @chad-iris prefer array style, but are you sure that is what you meant? The top level object feels cleaner to me.

chad-earthscope commented 7 years ago

Top level array yields something like this: [ { "key": "TQ", "value": 98 }, { "key": "QI", "value": "D" } ]

Ugh.

It sounds like @andres-h and @chad-iris prefer array style, but are you sure that is what you meant? The top level object feels cleaner to me.

Given those examples, top level as an object looks much better. I also think top level object is the most common JSON, but that may not matter.

andres-h commented 7 years ago

On 07/06/2017 06:40 PM, Chad Trabant wrote:

Top level array yields something like this:
[
{ "key": "TQ", "value": 98 },
{ "key": "QI", "value": "D" }
]

Ugh.

It sounds like @andres-h <https://github.com/andres-h> and
@chad-iris <https://github.com/chad-iris> prefer array style, but
are you sure that is what you meant? The top level object feels
cleaner to me.

Not cleaner, but the only way with JSON. AFAIK, top level array is not allowed in JSON at all.

I prefer array style, but not JSON...

Given those examples, top level as an object looks much better.

This dilemma is not applicable to chunks.

crotwell commented 7 years ago

Per RFC 4627: A JSON text is a serialized object or array. JSON-text = object / array

But the revision RFC 7159 relaxed this to be any JSON value; I don't think we want to actually go there!

A JSON text is a serialized value. Note that certain previous specifications of JSON constrained a JSON text to be an object or an array.
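For what it's worth, common parsers already follow the relaxed RFC 7159 rule; Python's `json` module, for example, accepts an object, an array, or even a bare value at the top level:

```python
import json

# Object and array top levels are valid under both RFC 4627 and RFC 7159.
assert json.loads('{"TQ": 98}') == {"TQ": 98}
assert json.loads('[{"key": "TQ", "value": 98}]') == [{"key": "TQ", "value": 98}]

# A bare value is valid under RFC 7159 but was not under RFC 4627.
assert json.loads('42') == 42
```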

andres-h commented 7 years ago

On 07/06/2017 06:15 PM, Philip Crotwell wrote:

A repeated key in the object style is a little more complicated, basically uses an array as the value, so something like: { "event": [ { ...event1 stuff...}, { ...event2 stuff...} ] }

One problem with JSON is that it is difficult to add things in the processing chain and to know the length of JSON data. You need an internal representation of the data and each time you add a new blockette:

  1. Add the blockette to the internal representation.

  2. Generate JSON.

  3. Check the size of JSON data.

  4. if the size is too large:

    1. Remove blockette from the internal representation.
    2. Generate JSON.
    3. Finalize record.
    4. Initialize new internal representation and add the blockette.

With chunks or blocks, you just check if size(chunks_so_far) + size(new_chunk) > size_limit
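The two bookkeeping styles described above can be sketched as follows (Python, purely illustrative; the `size_limit` value and header shapes are assumptions, not part of any format draft):

```python
import json

size_limit = 64  # hypothetical byte budget for the extras field

# JSON style: must re-serialize to learn the new size after each addition.
def try_add_json(headers, key, value):
    candidate = dict(headers, **{key: value})
    encoded = json.dumps(candidate, separators=(",", ":")).encode()
    if len(encoded) > size_limit:
        return headers, False  # caller finalizes this record, starts a new one
    return candidate, True

# Chunk style: sizes are additive, so no re-serialization is needed.
def try_add_chunk(chunks, new_chunk):
    if sum(len(c) for c in chunks) + len(new_chunk) > size_limit:
        return chunks, False
    return chunks + [new_chunk], True
```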

crotwell commented 7 years ago

I don't understand what you mean by "if the size is too large", there is not a limit unless you overflow the UInt16? And we are presuming extras are small, so what is too large? And of course even in JSON you know the size of the new item, and the size of the existing items. New size is just existing + new and maybe + 1 for an extra comma.

Am I misunderstanding something?
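The additivity point can be checked directly with a compact encoder (a sketch assuming compact separators, so that serialized sizes compose exactly):

```python
import json

def size(obj):
    """Serialized byte length with compact separators (no spaces)."""
    return len(json.dumps(obj, separators=(",", ":")))

existing = {"TQ": 98}
new_item = {"QI": "D"}
combined = {**existing, **new_item}

# The new pair contributes its own serialized length minus its two braces,
# plus one separating comma -- so the new size is computable without
# re-serializing the whole object.
assert size(combined) == size(existing) + (size(new_item) - 2) + 1
```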