zarr-developers / zarr-specs

Zarr core protocol for storage and retrieval of N-dimensional typed arrays
https://zarr-specs.readthedocs.io/
Creative Commons Attribution 4.0 International
Floating point fill values' endianness #279

Open clbarnes opened 11 months ago

clbarnes commented 11 months ago

Following on from https://github.com/zarr-developers/zarr-specs/pull/236

IEEE 754 doesn't specify an endianness for float representations. Does this mean that the hex string representation of a float dataset's fill value depends on the endianness of the codecs? If so, it would be much more convenient to state that it always uses one particular endianness.

jbms commented 11 months ago

No, the hex string always has the sign bit as the most significant bit (i.e. first) and does not depend on endianness. Perhaps you can create a PR to clarify.
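As an illustration of the point above (a minimal sketch, not taken from the spec or any Zarr implementation), the stored bytes of a float32 differ between byte orders, but the bit pattern read as an unsigned integer, with the sign bit most significant, is the same either way. The variable names are only for this example.

```python
import struct

value = -1.5

# The stored bytes depend on the byte order chosen for serialization...
big_bytes = struct.pack(">f", value)
little_bytes = struct.pack("<f", value)

# ...but reading those bytes back as an unsigned integer in the matching
# byte order yields the same bit pattern (sign bit is the top bit).
bits_from_big = struct.unpack(">I", big_bytes)[0]
bits_from_little = struct.unpack("<I", little_bytes)[0]

hex_from_big = f"0x{bits_from_big:08x}"
hex_from_little = f"0x{bits_from_little:08x}"

print(big_bytes.hex(), little_bytes.hex())  # byte sequences are reversed
print(hex_from_big, hex_from_little)        # hex strings are identical
```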

clbarnes commented 11 months ago

Is that an implementation detail of the C function referenced in the spec?

jbms commented 11 months ago

No, and actually the warning about strtod was in relation to the NaN syntax nan(1234), which I previously proposed but which was rejected.

strtod accepts the "0xYYYYYYYY[.ZZZZZZ]" hex floating point syntax, which has a different meaning. Unfortunately, strtod does not guarantee that every distinct NaN value has a corresponding string representation, so we can't rely on the strtod spec.
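To make the difference in meaning concrete, here is a small sketch that uses Python's float.fromhex as a stand-in for strtod (an assumption for illustration: it accepts a similar C-style hex floating point syntax, but it is not strtod itself). The same string gives a large finite number when parsed as a hex float, and a NaN when treated as a float32 bit pattern.

```python
import math
import struct

text = "0x7fc00000"

# Parsed as a hex floating point literal: the digits are the numeric value.
as_hex_float = float.fromhex(text)

# Parsed as a bit pattern: the digits are the raw float32 bits.
raw = int(text, 16).to_bytes(4, "big")
as_bit_pattern = struct.unpack(">f", raw)[0]

print(as_hex_float)                # a finite number
print(math.isnan(as_bit_pattern))  # the bit pattern encodes a NaN
```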

I intended to convey what I said in https://github.com/zarr-developers/zarr-specs/issues/279#issuecomment-1789148537 with the language "specifying the byte representation of the floating point number as an unsigned integer", where I was assuming the usual endian-agnostic representation of the floating point number as a sequence of bits, where the first (most significant) bit is the sign bit, followed by the exponent bits, followed by the mantissa bits. The NaN example also serves to clarify. Perhaps there is a better way to state it, though.
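The bit layout described above can be sketched as follows. The helper split_float_bits is hypothetical (it is not part of any Zarr library); it assumes the string holds the full bit pattern, with the sign bit first, followed by the exponent bits and then the mantissa bits.

```python
def split_float_bits(hex_string, exponent_bits, mantissa_bits):
    """Split a "0x..." bit-pattern string into (sign, exponent, mantissa)."""
    bits = int(hex_string, 16)
    # Lowest bits hold the mantissa, then the exponent, then the sign.
    mantissa = bits & ((1 << mantissa_bits) - 1)
    exponent = (bits >> mantissa_bits) & ((1 << exponent_bits) - 1)
    sign = bits >> (mantissa_bits + exponent_bits)
    return sign, exponent, mantissa


# float32 has 8 exponent bits and 23 mantissa bits; this pattern is -1.5.
f32_fields = split_float_bits("0xbfc00000", 8, 23)

# float64 has 11 exponent bits and 52 mantissa bits; this pattern is 1.0.
f64_fields = split_float_bits("0x3ff0000000000000", 11, 52)

print(f32_fields, f64_fields)
```

No byte order appears anywhere in this decoding, which is the sense in which the representation is endian-agnostic.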

clbarnes commented 11 months ago

the usual endian-agnostic representation of the floating point number

This norm is what I was struggling to find details of; my searches only turned up ambiguity, e.g. https://stackoverflow.com/questions/2945174/floating-point-endianness

clbarnes commented 11 months ago

Writing the PR using this language

where the first (most significant) bit is the sign bit, followed by the exponent bits, followed by the mantissa bits

and had another question - different languages may default to different NaN values when using their respective NaN-creation routines. Are we taking a "NaN" fill to mean that any NaN value is valid, or are we specifying a specific NaN as implied by the example in the "0x..." point? If the former, implementations probably shouldn't ever write "NaN" (opting for the byte string instead) because they don't necessarily know the intention of other readers/writers. The alternative is to disallow specific NaNs entirely.

jbms commented 11 months ago

"NaN" means the specific value as defined in the specification:

"NaN", denoting thenot-a-number (NaN) value where the sign bit is 0 (positive), the most significant bit (MSB) of the mantissa is 1, and all other bits of the mantissa are zero.

(There is a missed space.)

jbms commented 11 months ago

Note that an IEEE 754 NaN value is indicated by any sign bit, all 1 exponent bits, and any non-zero mantissa. By specifying the sign and mantissa we fully specify the value.
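A short sketch of both statements, using integer bit patterns only: building the specific "NaN" value quoted above (sign 0, mantissa MSB 1, remaining mantissa bits 0) and testing whether an arbitrary pattern is some NaN. The function names are hypothetical and chosen for this example.

```python
def canonical_nan_bits(exponent_bits, mantissa_bits):
    """Bit pattern with sign 0, all-ones exponent, and only the mantissa MSB set."""
    exponent = ((1 << exponent_bits) - 1) << mantissa_bits
    mantissa_msb = 1 << (mantissa_bits - 1)
    return exponent | mantissa_msb


def is_nan_bits(bits, exponent_bits, mantissa_bits):
    """True for any sign bit, an all-ones exponent, and a non-zero mantissa."""
    exponent = (bits >> mantissa_bits) & ((1 << exponent_bits) - 1)
    mantissa = bits & ((1 << mantissa_bits) - 1)
    return exponent == (1 << exponent_bits) - 1 and mantissa != 0


nan16 = f"0x{canonical_nan_bits(5, 10):04x}"    # float16
nan32 = f"0x{canonical_nan_bits(8, 23):08x}"    # float32
nan64 = f"0x{canonical_nan_bits(11, 52):016x}"  # float64
print(nan16, nan32, nan64)

# A different NaN (negative sign, extra payload bit) is still a NaN,
# but it is a different value from the canonical one.
print(is_nan_bits(0xFFC00001, 8, 23))
# An all-ones exponent with a zero mantissa is infinity, not NaN.
print(is_nan_bits(0x7F800000, 8, 23))
```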