michaeljclark / vf128

vf128 variable length floating-point
proposal to left-justify the mantissa #4

Open michaeljclark opened 2 years ago

michaeljclark commented 2 years ago

proposal to left-justify the mantissa, making vf128 more consistent with IEEE 754 floating-point. this is a significant change that will more closely align the vf128 encoding with the IEEE 754 encoding.

mantissa justification background

the vf128 variable-length floating-point format presently uses a right-justified mantissa and explicit leading one. the format was created as an evolution of the ASN.1 Real format with the intention to create a more succinct representation that more closely maps to the IEEE 754 floating-point format. the ASN.1 Real format was modelled first, so at least initially, it seemed natural to adopt its right-justified mantissa with explicit leading one convention. the primary differences from the ASN.1 Real format are the addition of the float7 header byte with an external bit to compact the ASN.1 Real encoding for several inlined values (+/-0.0, +/-0.5, +/-1.0, +/-2.0, +/-NaN, +/-Infinity, ...) and support for out-of-line exponent and mantissa values.

compact normals

the root of this issue is the representation of fixed point values using a succinct encoding with implicit exponent.

values in the range -0.99999.. to +0.99999.. are normal values encoded with an out-of-line mantissa and zero-length exponent and they allow a one byte saving for an implied exponent. it made sense to use this encoding for succinct representation of fixed-point values in the range -0.99999.. to +0.99999.. because the one-byte saving results in many more single-precision values that are encodable in four bytes, and double-precision values that are encodable in eight bytes.

the text currently reads:

### normal values with unary exponent

Normal values in the range -0.99999.. to +0.99999.. with a binary exponent
from e-1 to e-8 inclusive are encoded with zero in the exponent field, and the
exponent is encoded as a unary prefix of trailing zeros in the mantissa field.

the current code reads:

https://github.com/michaeljclark/vf128/blob/429d24b1dbfd86651aca1347fde8ceeb00453b6f/src/vf128.cc#L1322-L1328

mantissa realignment

this is a relatively complex issue so here is some background on the current encoding and how it came about.

normal values encoded with an out-of-line mantissa and zero-length exponent

reconstructing a left-justified mantissa with implicit leading one from a right-justified mantissa with an explicit leading one requires a count-leading-zeros operation to realign the point from the right of the least significant bit to the right of the explicit leading one.
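the realignment described above can be sketched as follows. this is a minimal illustration, not the code in vf128.cc: the names `clz64` and `realign_right_justified` are hypothetical, and the 64-bit fraction word is an assumption chosen for the example.

```cpp
#include <cassert>
#include <cstdint>

// hypothetical helper (not from vf128.cc): portable count of leading
// zeros in a non-zero 64-bit value.
static int clz64(uint64_t v)
{
    assert(v != 0);
    int n = 0;
    while ((v & (uint64_t(1) << 63)) == 0) { v <<= 1; n++; }
    return n;
}

// realign a right-justified mantissa with explicit leading one into a
// left-justified 64-bit fraction whose leading one is implicit.
static uint64_t realign_right_justified(uint64_t m)
{
    assert(m != 0);             // the explicit leading one must be present
    int lz = clz64(m);
    // move the leading one into bit 63, then shift once more to drop it.
    // two separate shifts avoid an undefined 64-bit shift when lz == 63.
    return (m << lz) << 1;
}
```

for the 12-bit value 0xC00 (binary 1100 0000 0000) the leading one sits 52 bits below the top of the word, so the result is a fraction with only its top bit set. any zeros that preceded the leading one in a fixed-point fraction are not recoverable from the result, which is the information loss discussed in the next section.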

trailing unary coded suffix work-around for right-justified normal values

this realignment is done to all out-of-line mantissa values to keep the code simple and consistent. the problem is that the realignment of the fraction based on the explicit leading one loses information about leading zeros that would otherwise be present in a fixed point fraction. for this reason, a trailing unary coded suffix was added as a work-around to recover the leading zeros count for these quasi fixed-point fractions, or more specifically normal values in the range -0.99999.. to +0.99999..

left-justified mantissa with implied leading zero or one

after analysis, it becomes evident that a mantissa with a left-justified point and implied leading zero or one is a more natural representation for IEEE 754 floating-point values using byte quantisation. when the mantissa retains its left justification, it retains the leading zero count, which is otherwise lost; that loss is what necessitated the explicit leading one used for realignment.

with a left-justified mantissa, the position of the point remains the same in its encoded form so it is no longer necessary to append a unary coded suffix to remember the leading zeros lost during realignment. note, however, that with a left-justified mantissa it makes sense to use an implied leading zero for the special case of succinct coding of fixed-point fractions with a zero-length exponent, i.e. values in the range -0.99999.. to +0.99999.., similarly to what is done for subnormals.

this requires a special case to adjust fixed point values back to the implied leading one needed by IEEE 754 floating-point, but it simplifies the case for all other mantissa values as it is no longer necessary to count leading zeros during decoding if the exponent is present. the shift offset is based purely on the width in bytes of the mantissa.
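as a rough sketch of why the shift depends only on the byte width, the following loads a left-justified mantissa into the top of a 64-bit fraction word. the name `load_left_justified` is hypothetical, and the byte order (most significant byte first) is an assumption made for the example, not something taken from the vf128 specification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// hypothetical helper (not from vf128.cc): place an n-byte left-justified
// mantissa at the top of a 64-bit fraction word.
// assumption: bytes are stored most significant byte first.
static uint64_t load_left_justified(const uint8_t *bytes, size_t n)
{
    assert(n >= 1 && n <= 8);
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++) {
        v = (v << 8) | bytes[i];
    }
    // the shift depends only on the byte width; no count leading zeros
    // is needed because the point never moved during encoding.
    return v << (64 - 8 * n);
}
```

a two-byte mantissa 0xAB 0xC0 lands at 0xABC0000000000000 regardless of how many leading zero bits the fraction contains.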

this example gives an overview of the location of the point for a 12-bit fraction starting with a one:

12-bit fraction with right-justified mantissa and explicit leading one
    [byte 1] [byte 2]
    ____1NNN NNNNNNNN.

12-bit fraction with left-justified mantissa and implicit leading zero
    [byte 1] [byte 2]
[0].NNNNNNNN NNNN____

12-bit fraction with left-justified mantissa and implicit leading one
    [byte 1] [byte 2]
[1].NNNNNNNN NNN_____

note that the renormalization of the mantissa and exponent for fractions whose first digit is not a one is not shown. the case where a fixed-point compressed normal is reformatted to IEEE 754 normal form with implicit leading one requires a count-leading-zeros operation and an adjustment to the exponent.
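the renormalization that the diagrams omit could look roughly like this. it is a sketch under stated assumptions: the struct `normal64` and the function `renormalize_fixed_point` are hypothetical names, the fraction is held left-justified in a 64-bit word with an implied leading zero, and the exponent is returned unbiased.

```cpp
#include <cassert>
#include <cstdint>

// hypothetical result type: unbiased binary exponent plus a left-justified
// fraction with implicit leading one.
struct normal64 {
    int exponent;
    uint64_t fraction;
};

// convert a left-justified fixed-point fraction with implied leading zero
// (value = 0.f) to normal form with implicit leading one (1.f * 2^exponent).
static normal64 renormalize_fixed_point(uint64_t f)
{
    assert(f != 0);             // zero has no leading one to normalize on
    int lz = 0;
    while ((f & (uint64_t(1) << 63)) == 0) { f <<= 1; lz++; }
    // the leading one is now in bit 63, so the value is 1.xxx * 2^-(lz+1).
    // shifting once more drops the leading one, making it implicit.
    return { -(lz + 1), f << 1 };
}
```

for example 0.1875 is binary 0.0011, which has two leading zeros; it renormalizes to 1.1 (binary) times 2^-3.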

proposed convention

if we had reasoned about the encoding of fixed point normals at the outset, we would have started with a left-justified mantissa. the result of this analysis is the proposal to left-justify all out-of-line mantissa values.

it would be possible to only change normal values encoded with an out-of-line mantissa and zero-length exponent to use a left-justified fixed point fraction with implied leading zero, and remove the unary coded suffix special case, as that is the use case that prompted this analysis. but changing only the zero-length exponent encoding introduces more complexity overall because some parts of the format would have a left-justified fraction point with implied leading zero, and other parts would have a right-justified fraction point with an explicit leading one. it is simpler if the justification scheme is consistent.

ultimately a left-justified mantissa with implied leading digit leads to saving one bit of information. this increases the set of single-precision values that can be encoded in four bytes. the exponent calculation code also becomes simpler because there is no special case to append the unary coded suffix to recover alignment for fixed-point values.

this is a relatively intrusive change because it requires changing the exponent calculation and shifts for all encodings that use an out-of-line mantissa. on the whole, though, it seems like a worthwhile change: it makes the format a lot closer to the IEEE 754 floating-point format, requires fewer adjustments when unpacking the mantissa, and potentially makes it easier to implement in hardware, which is something that would be unlikely for the ASN.1 Real format.

conclusion

this issue serves as a notice of intent to change the format. it is a significant change but the format is not yet v1.0 so it is okay.

michaeljclark commented 2 years ago

to paraphrase a long story with lots of nuances: the implied exponent encoding encodes the mantissa a little like a denormal, only the exponent is zero. the problem is that this scheme only works if the mantissa is left justified.