iris-edu / mseed3-evaluation

A repository for technical evaluation and implementation of potential next generation miniSEED formats

Legacy Data Encodings #6

Closed krischer closed 7 years ago

krischer commented 7 years ago

Discussion branched off #2. Concerns DRAFT20170622.

@krischer

Why keep the legacy codes?

@andres-h

It would be nice to be able to include MS2 data without modification. We should IMO get rid of the byte order bit, though, which is ambiguous and has caused so much pain in MS2. Use a fixed byte order in the header and separate encoding types for the big-endian and little-endian variants of each data encoding.
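A minimal sketch of the "byte order as part of the encoding type" idea, assuming plain fixed-width samples. The numeric codes and the names `ENCODINGS` and `decode_samples` are hypothetical and do not come from the draft; they only illustrate that a decoder needs no separate byte order flag once each code implies one.

```python
import struct

# Hypothetical encoding codes (placeholders, not taken from any draft).
# Each code fixes both the sample type and the byte order.
ENCODINGS = {
    100: ("<", "i"),  # 32-bit integer, little-endian
    101: (">", "i"),  # 32-bit integer, big-endian
    102: ("<", "f"),  # 32-bit IEEE float, little-endian
    103: (">", "f"),  # 32-bit IEEE float, big-endian
}


def decode_samples(code, payload):
    """Decode a payload using only the encoding code, with no byte order flag."""
    order, sample_type = ENCODINGS[code]
    size = struct.calcsize(order + sample_type)
    if len(payload) % size != 0:
        raise ValueError("payload length is not a multiple of the sample size")
    count = len(payload) // size
    return list(struct.unpack(f"{order}{count}{sample_type}", payload))
```

For example, `decode_samples(101, struct.pack(">3i", 1, -2, 3))` returns the three integers without the reader having to guess or detect the byte order.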

@chad-iris

What @andres-h said, for forward compatibility without re-encoding. At the DMC we have converted almost all of the data in those legacy encodings to Steim# as a step towards getting rid of them, but providing the path forward is still needed.

Perhaps the wording can be stronger, instead of "not recommended", it could be "deprecated, do not use for new data".

krischer commented 7 years ago

The headers have to be rewritten in any case, and I think all the encodings are lossless, so there is no harm in converting the data to any of the fully supported encodings. This would make it quite a bit simpler to write libraries that fully support the new format, which I think would be worth it. Also, only the mseed2->mseed3 converter would need to be aware of the legacy encodings. I see little reason to keep the old and crufty encodings around, but then I don't work in a data center, so there might be other arguments I'm not aware of.

Why is converting the data to a new encoding a problem, given that the headers need to be repacked in any case?

andres-h commented 7 years ago

At GEOFON we try to modify old data as little as possible. Actually I'm trying to enforce an overlay filesystem where a new layer is added each year and older layers become read-only. Archived MS2 data will remain MS2 forever.

However, we will implement on-the-fly MS2-to-MS3 conversion and the converter could convert everything to Steim2 (or hopefully something better in the future). The conversion would not take much CPU power I hope.
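A rough sketch of what one step of such an on-the-fly conversion could look like. To keep it short, it targets plain 32-bit integers instead of Steim2, and it assumes the source samples are simple 16-bit integers in a known byte order. The function name `widen_int16_to_int32` and the choice of little-endian output are assumptions made for illustration.

```python
import struct


def widen_int16_to_int32(payload, big_endian):
    """Re-encode 16-bit integer samples as little-endian 32-bit integers.

    Every 16-bit value fits in 32 bits, so the conversion is lossless.
    """
    if len(payload) % 2 != 0:
        raise ValueError("payload length is not a multiple of 2 bytes")
    count = len(payload) // 2
    source_order = ">" if big_endian else "<"
    samples = struct.unpack(f"{source_order}{count}h", payload)
    # The output byte order is fixed here (an assumption), so readers
    # of the converted data never need to look at a byte order flag.
    return struct.pack(f"<{count}i", *samples)
```

The archived MS2 payload stays untouched on disk; only the bytes handed to the client are re-encoded.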

So dropping everything besides Steim1/2, ints and floats would be OK with me.

On the other hand, the format should be extensible enough to allow adding a completely new encoding. We don't know what other communities need. And again, there is IMO no fundamental difference between an encoding, a blockette and a chunk; e.g., instead of a new encoding, one could add a new blockette or chunk.

crotwell commented 7 years ago

As long as there is a library available that easily does the conversion, I guess I am ok with dropping (or at least deprecating with extreme prejudice!). But I would NOT reuse the numbers.
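One way a library could enforce "do not reuse the numbers" is sketched below. The class `EncodingRegistry` and all numeric codes are placeholders for illustration, not values from the draft: retired codes stay reserved, so they can be neither reassigned to a new encoding nor silently decoded.

```python
class EncodingRegistry:
    """Maps numeric encoding codes to names and refuses to reuse retired codes."""

    def __init__(self, retired_codes):
        self._retired = frozenset(retired_codes)
        self._names = {}

    def register(self, code, name):
        if code in self._retired:
            raise ValueError(f"code {code} is retired and must not be reused")
        if code in self._names:
            raise ValueError(f"code {code} is already assigned to {self._names[code]}")
        self._names[code] = name

    def lookup(self, code):
        if code in self._retired:
            raise LookupError(f"code {code} is a retired legacy encoding; convert the data first")
        return self._names[code]


# Placeholder codes for illustration only.
registry = EncodingRegistry(retired_codes={90, 91})
registry.register(10, "Steim1")
registry.register(11, "Steim2")
```

With this in place, `registry.register(90, "SomethingNew")` raises `ValueError`, and a reader that meets code 91 gets a clear error pointing at the conversion path instead of misinterpreting the data.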

chad-earthscope commented 7 years ago

Legacy encodings have been removed, and the legacy values are documented as not to be used in the future, in DRAFT 20170708.