Open clbarnes opened 11 months ago
No, the hex string always has the sign bit as the most significant bit (i.e. first) and does not depend on endianness. Perhaps you can create a PR to clarify.
Is that an implementation detail of the C function referenced in the spec?
No, and actually the warning about strtod was in relation to the NaN syntax nan(1234) that I previously proposed but that was rejected.
strtod accepts the "0xYYYYYYYY[.ZZZZZZ]" hex floating point syntax, which has a different meaning. Unfortunately, strtod does not guarantee that every distinct NaN value has a corresponding string representation, so we can't rely on the strtod spec.
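To illustrate the "different meaning" point, here is a minimal Python sketch. It assumes Python's float.fromhex as a stand-in for the strtod hex floating point syntax, and uses 7fc00000 purely as an example digit string. The same digits are read once as a numeric value and once as the 32 bits of a float32:

```python
import struct

# strtod-style reading: the hex digits are a numeric value (here an integer-valued float).
# float.fromhex is used as a stand-in for C's strtod hex syntax (an assumption for illustration).
as_number = float.fromhex("0x7fc00000")

# Bit-pattern reading: the same digits are the 32 bits of a float32,
# most significant byte first.
(as_bits,) = struct.unpack(">f", bytes.fromhex("7fc00000"))

print(as_number)  # 2143289344.0
print(as_bits)    # nan
```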
I intended to convey what I said in https://github.com/zarr-developers/zarr-specs/issues/279#issuecomment-1789148537 with the language "specifying the byte representation of the floating point number as an unsigned integer", where I was assuming the usual endian-agnostic representation of the floating point number as a sequence of bits, where the first (most significant) bit is the sign bit, followed by the exponent bits, followed by the mantissa bits. The NaN example also serves to clarify. Perhaps there is a better way to state it, though.
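A rough Python sketch of that reading, in case it helps the PR. The helper names float_to_hex and hex_to_float are hypothetical, and only float32 and float64 are handled. The float is reinterpreted as an unsigned integer and printed most significant digit first, so the string does not depend on how the bytes are laid out in memory:

```python
import struct

# (float code, unsigned int code) per item size; float32 and float64 only.
_FORMATS = {4: ("f", "I"), 8: ("d", "Q")}

def float_to_hex(value, nbytes=8):
    """Hypothetical helper: float -> "0x..." string of its bit pattern."""
    float_code, uint_code = _FORMATS[nbytes]
    # Pack and unpack with the same byte order, so the byte order cancels out
    # and only the integer value of the bit pattern remains.
    (bits,) = struct.unpack(">" + uint_code, struct.pack(">" + float_code, value))
    return "0x" + format(bits, "0{}x".format(nbytes * 2))

def hex_to_float(text, nbytes=8):
    """Hypothetical helper: "0x..." bit-pattern string -> float."""
    float_code, uint_code = _FORMATS[nbytes]
    bits = int(text, 16)
    (value,) = struct.unpack(">" + float_code, struct.pack(">" + uint_code, bits))
    return value

# The in-memory bytes differ by endianness, but the hex string is the same.
little = struct.pack("<f", 1.0).hex()  # "0000803f"
big = struct.pack(">f", 1.0).hex()     # "3f800000"
print(little, big, float_to_hex(1.0, 4))  # 0000803f 3f800000 0x3f800000
```

The leading hex digit carries the sign bit, e.g. float_to_hex(-2.0) gives "0xc000000000000000".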
the usual endian-agnostic representation of the floating point number
This norm is what I was struggling to find details of; I only came up with ambiguity, e.g. https://stackoverflow.com/questions/2945174/floating-point-endianness
Writing the PR using this language
where the first (most significant) bit is the sign bit, followed by the exponent bits, followed by the mantissa bits
and had another question - different languages may default to different NaN values when using their respective NaN-creation routines. Are we taking a "NaN" fill to mean that any NaN value is valid, or are we specifying a specific NaN as implied by the example in the "0x..." point? If the former, implementations probably shouldn't ever write "NaN" (opting for the byte string instead) because they don't necessarily know the intention of other readers/writers. The alternative is to disallow specific NaNs entirely.
"NaN" means the specific value as defined in the specification:
"NaN", denoting thenot-a-number (NaN) value where the sign bit is 0 (positive), the most significant bit (MSB) of the mantissa is 1, and all other bits of the mantissa are zero.
(There is a missed space.)
Note that an IEEE 754 NaN value is indicated by any sign bit, all 1 exponent bits, and any non-zero mantissa. By specifying the sign and mantissa we fully specify the value.
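As a sketch of the two statements above (the function names are hypothetical, and the exponent and mantissa widths are passed in explicitly rather than taken from any library), the following Python works directly on the bit pattern as an unsigned integer:

```python
def is_nan_bits(bits, exponent_bits, mantissa_bits):
    """True if the bit pattern is any NaN: all-ones exponent, non-zero mantissa.

    The sign bit is ignored, since a NaN may have either sign.
    """
    exponent_mask = (1 << exponent_bits) - 1
    exponent = (bits >> mantissa_bits) & exponent_mask
    mantissa = bits & ((1 << mantissa_bits) - 1)
    return exponent == exponent_mask and mantissa != 0

def canonical_nan_bits(exponent_bits, mantissa_bits):
    """Bit pattern with sign 0, all-ones exponent, and only the mantissa MSB set."""
    exponent = ((1 << exponent_bits) - 1) << mantissa_bits
    mantissa_msb = 1 << (mantissa_bits - 1)
    return exponent | mantissa_msb

print(hex(canonical_nan_bits(8, 23)))   # float32: 0x7fc00000
print(hex(canonical_nan_bits(11, 52)))  # float64: 0x7ff8000000000000
```

Many bit patterns satisfy is_nan_bits, but only one per data type matches the sign and mantissa fixed in the "NaN" definition.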
Following on from https://github.com/zarr-developers/zarr-specs/pull/236
IEEE 754 doesn't specify an endianness for float representations - does this mean that the hex string representation of the fill value of a float dataset is dependent on the endianness of the codecs? If so, it would be much more convenient to just say that it's always of a particular endianness.