katsusan / gowiki


<CS>floating point #14

Open katsusan opened 3 years ago

katsusan commented 3 years ago

A floating-point number is stored as three fields, from left (most significant bit) to right: the sign bit, the exponent field, and the significand (also called the mantissa).

In the IEEE 754 binary formats, the bits are apportioned as follows:

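| Type | Sign | Exponent | Significand field | Total bits | Exponent bias | Bits of precision | Number of decimal digits |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Half | 1 | 5 | 10 | 16 | 15 | 11 | ~3.3 |
| Single★ | 1 | 8 | 23 | 32 | 127 | 24 | ~7.2 |
| Double★ | 1 | 11 | 52 | 64 | 1023 | 53 | ~15.9 |
| x86-extended precision | 1 | 15 | 64 | 80 | 16383 | 64 | ~19.2 |
| Quad | 1 | 15 | 112 | 128 | 16383 | 113 | ~34.0 |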

Take the following figure as an example:

image

Following the convention in image, we get:

image

Therefore $\text{value} = (-1)^{0} \times 2^{124-127} \times (1 + 2^{-2} \times 1) = \frac{1}{8} \times 1.25 = 0.15625$.
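The calculation above can be checked in Go. This is a minimal sketch that assumes the single-precision layout from the table (1 sign bit, 8 exponent bits, 23 fraction bits, bias 127) and handles normal numbers only. `decode32` and `value32` are hypothetical helper names; `math.Float32bits` and `math.Ldexp` are from the standard library.

```go
package main

import (
	"fmt"
	"math"
)

// decode32 splits a float32 into its sign, biased exponent, and fraction fields.
func decode32(f float32) (sign, exponent, fraction uint32) {
	bits := math.Float32bits(f)
	sign = bits >> 31              // top bit
	exponent = (bits >> 23) & 0xFF // next 8 bits, still biased
	fraction = bits & 0x7FFFFF     // low 23 bits
	return
}

// value32 rebuilds a normal number from its fields:
// (-1)^sign * 2^(exponent-127) * (1 + fraction/2^23).
// Zero, subnormals, infinities, and NaN (exponent 0 or 255) are not handled.
func value32(sign, exponent, fraction uint32) float64 {
	significand := 1 + float64(fraction)/(1<<23)
	v := math.Ldexp(significand, int(exponent)-127)
	if sign == 1 {
		v = -v
	}
	return v
}

func main() {
	s, e, m := decode32(0.15625)
	fmt.Printf("sign=%d exponent=%d fraction=%023b\n", s, e, m)
	fmt.Println(value32(s, e, m))
}
```

For 0.15625 this prints `sign=0 exponent=124 fraction=01000000000000000000000` followed by `0.15625`, matching the hand calculation: only the 2^-2 fraction bit is set, so the significand is 1.25 and the scale is 2^(124-127) = 1/8.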

A double-precision floating-point value is computed in a similar way:

image

=>

image

or

image
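The double-precision case can be sketched the same way in Go, assuming the binary64 layout from the table (1 sign bit, 11 exponent bits, 52 fraction bits, bias 1023) and normal numbers only. `decode64` and `value64` are hypothetical helper names; `math.Float64bits` and `math.Ldexp` are from the standard library.

```go
package main

import (
	"fmt"
	"math"
)

// decode64 splits a float64 into its sign, biased exponent, and fraction fields.
func decode64(f float64) (sign, exponent, fraction uint64) {
	bits := math.Float64bits(f)
	sign = bits >> 63               // top bit
	exponent = (bits >> 52) & 0x7FF // next 11 bits, still biased
	fraction = bits & (1<<52 - 1)   // low 52 bits
	return
}

// value64 rebuilds a normal number from its fields:
// (-1)^sign * 2^(exponent-1023) * (1 + fraction/2^52).
// Zero, subnormals, infinities, and NaN (exponent 0 or 2047) are not handled.
func value64(sign, exponent, fraction uint64) float64 {
	significand := 1 + float64(fraction)/(1<<52)
	v := math.Ldexp(significand, int(exponent)-1023)
	if sign == 1 {
		v = -v
	}
	return v
}

func main() {
	s, e, m := decode64(0.15625)
	fmt.Printf("sign=%d exponent=%d fraction=%052b\n", s, e, m)
	fmt.Println(value64(s, e, m))
}
```

With the same input 0.15625, the biased exponent becomes 1020 (that is, 1020 - 1023 = -3), and the fraction again has only its 2^-2 bit set.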