iris-edu / mseed3-evaluation

A repository for technical evaluation and implementation of potential next generation miniSEED formats

Byteorder Considerations #9

Closed krischer closed 7 years ago

krischer commented 7 years ago

Discussion branched off #2. Concerns DRAFT20170622.

@andres-h

We should IMO get rid of the byteorder bit, though, which is ambiguous and has caused so much pain in MS2. Use a fixed byteorder in the header and different encoding types for the big-endian and little-endian variants of data encodings.

@chad-iris

Good idea @andres-h. Some people may be concerned that the fixed order is not ideal given that architectures (embedded, etc.) vary, but the clarity is probably worth it.

Got a preferred byte order for the values in a fixed order?

@andres-h

Normally I would prefer big-endian, because that is the canonical "network byte order". However, standard varint is AFAIK little-endian, so if we use varints, we should consider using little-endian everywhere.
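To illustrate the varint point, here is a minimal sketch of the common unsigned varint scheme (LEB128, as used by Protocol Buffers). It is only an illustration of why that scheme counts as little-endian, not a proposal for the draft; the function names `encode_varint` and `decode_varint` are made up for this example.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128-style varint."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        group = n & 0x7F  # the lowest 7 bits are emitted first (little-endian group order)
        n >>= 7
        if n:
            out.append(group | 0x80)  # high bit set: more bytes follow
        else:
            out.append(group)  # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a single unsigned varint from the start of data."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift  # later bytes carry more significant bits
        shift += 7
        if not byte & 0x80:
            return result
    raise ValueError("truncated varint")


print(encode_varint(300).hex())  # ac02
```

The least significant 7-bit group is written first, which is why mixing varints with big-endian fixed-width fields would put two conventions in one header.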

@crotwell

I am a little confused about your byte-order discussion? Do you mean that we pick a single byte order for the header and eliminate bit 0 from field 3 (flags)? Or are you talking about separating header byte order from data byte order? I do like the idea of the encoding including the byte order where it might not be the same as the header, so 3 is big endian 32 bit integer and 43 or something else is little endian 32 bit integer. Dealing with the bit flags separately is a pain in the rumpus. And byte order might not make sense for new compression types that might be added later, for example ASCII or an encoding that itself includes byte order information.

If we pick one order, then I tend to like big endian.

@andres-h

I am a little confused about your byte-order discussion? Do you mean that we pick a single byte order for the header and eliminate bit 0 from field 3 (flags)?

Yes.

I do like the idea of the encoding including the byte order where it might not be the same as the header, so 3 is big endian 32 bit integer and 43 or something else is little endian 32 bit integer.

Exactly.
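A rough sketch of what "byte order is part of the encoding value" could look like on the reader side. The codes 3 and 43 are taken from the comment above, where 43 is explicitly a placeholder; the `ENCODINGS` table and `decode_samples` are hypothetical names, not anything defined in the draft.

```python
import struct

# Hypothetical table: encoding code -> (struct byte-order prefix, struct type character).
ENCODINGS = {
    3: (">", "i"),   # 32-bit integer, big-endian
    43: ("<", "i"),  # 32-bit integer, little-endian (placeholder code from the discussion)
}


def decode_samples(encoding: int, payload: bytes) -> tuple:
    """Decode a payload of fixed-width samples; no separate byte-order flag is consulted."""
    order, kind = ENCODINGS[encoding]
    size = struct.calcsize(order + kind)
    if len(payload) % size:
        raise ValueError("payload length is not a multiple of the sample size")
    count = len(payload) // size
    return struct.unpack(f"{order}{count}{kind}", payload)


print(decode_samples(3, bytes.fromhex("00000001ffffffff")))   # (1, -1)
print(decode_samples(43, bytes.fromhex("01000000ffffffff")))  # (1, -1)
```

The reader only needs the encoding value to know how to interpret the payload, and encodings for which byte order is meaningless (text, for example) simply would not have two variants.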

krischer commented 7 years ago

:+1: From me on fixing the header byte-order and including the byte-order in the encoding spec! Really good idea and it simplifies a lot of things. I guess I'd vote for little endian as x86 (and ARM by default) use it but no strong preference either way.

crotwell commented 7 years ago

+1 on fixing the header byte order!

Which byte order we pick is not as big a deal as the fact that it is fixed. Big endian was the default for SEED back when Sun was widely used, I think, but x86 does seem to have won the processor battle. I don't really care.

Question: if we are forcing the header to be one byte order, is it also worth forcing the data to be the same? Not having to recompress is an advantage, but it would be simple to just have it all be one. Is most current data in big or little endian? Not sure how I feel, so just tossing this out as a question.

chad-earthscope commented 7 years ago

+1 on fixing the header byte order!

Which byte order we pick is not as big a deal as the fact that it is fixed. Big endian was the default for SEED back when Sun was widely used, I think, but x86 does seem to have won the processor battle. I don't really care.

Great, we are all on board with this.

Regarding which one: the arguments for big-endian are a) precedent: it is stated in the SEED manual as its standard word order, and it's the current default for the miniSEED 2 writers I know of; b) Java does big-endian by default, for what that's worth.

The argument for little-endian is simple: it won the war. Hardware support is ubiquitous, even for embedded processors. Hardware actually using big-endian natively is becoming rare indeed.

Maybe @djeastonca has thoughts from an equipment manufacturer perspective.

I'm leaning towards little-endian given that the largest impact will be on readers of the data, because data is read much more than it's written, and currently the vast majority of those readers (data center and end user) have little-endian hardware. I have lingering concerns with departing from the SEED 2 standard, but realistically it's probably OK.

Question: if we are forcing the header to be one byte order, is it also worth forcing the data to be the same? Not having to recompress is an advantage, but it would be simple to just have it all be one. Is most current data in big or little endian? Not sure how I feel, so just tossing this out as a question.

Forcing the same endianness for the data payload makes sense for simple encodings like integers or floats. But for more complex encodings it may be a barrier. Steim[123] difference compression is only defined for big-endian, so that'd be the immediate problem, forcing us to either come up with a definition for little-endian Steim encodings or choose big-endian for the header/footer. There may be future compression we wish to adopt that has a required byte order.

I'd prefer to leave this defined by the encoding value.

If we do this we could add a statement that, when possible, the encoded data should be in the same byte order as the header.

chad-earthscope commented 7 years ago

In draft 20170708 the byte order of binary values in header blocks is fixed to little-endian and the encodings have been annotated with a byte order. Encodings can be added as needed for other byte order representations.
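For readers wondering what a fixed little-endian header means in practice, here is a small sketch using Python's `struct` module. The field layout (year, day of year, sample rate, encoding) is purely hypothetical and is not the layout from draft 20170708; the point is only that an explicit `<` prefix gives the same bytes on any host, with no byte-order flag to inspect.

```python
import struct

# Hypothetical header layout, NOT the draft's: uint16 year, uint16 day-of-year,
# float64 sample rate, uint8 encoding. "<" forces little-endian with no padding.
HEADER = struct.Struct("<HHdB")


def pack_header(year: int, day: int, rate: float, encoding: int) -> bytes:
    """Serialize the hypothetical header fields in little-endian order."""
    return HEADER.pack(year, day, rate, encoding)


def unpack_header(raw: bytes) -> tuple:
    """Parse the hypothetical header from the start of raw."""
    return HEADER.unpack(raw[:HEADER.size])


raw = pack_header(2017, 189, 100.0, 43)
print(HEADER.size)         # 13
print(raw[:2].hex())       # e107 (2017 == 0x07E1, least significant byte first)
print(unpack_header(raw))  # (2017, 189, 100.0, 43)
```

Because the byte order is spelled out in the format string, the same code runs unchanged on big-endian and little-endian machines.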