Possible performance improvements to half float conversion

XMConvertHalfToFloat and XMConvertFloatToHalf both use a large number of integer ops when F16C intrinsics aren't available. It may be faster to do the conversion with floating-point operations. XMConvertHalfToFloat also normalizes denormals with a while loop, which is likely to be the slowest part.

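Float-to-half conversion can use a trick: for positive numbers, (f + max(f, 2^-14)) produces a float whose exponent is at a fixed bias from the half-float exponent, where 2^-14 is the smallest normal half. For normal values the sum is 2f, which only bumps the exponent by one; for smaller values, adding 2^-14 pins the exponent and shifts the denormal mantissa into place. This handles denormals and zero and needs only 2 ops. (Bit-exactness is sensitive to how the dropped mantissa bits are handled in the denormal case, though.)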
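A minimal sketch of the float-to-half idea, to make the fixed bias concrete. The function name is hypothetical and not part of DirectXMath. It assumes a finite input in the range 0 <= f <= 65504 (no sign, Inf, NaN, or overflow handling) and simply truncates the low mantissa bits after the add. It uses 2^-14 (the smallest normal half) as the floor in the max, because that is the value for which the exponent offset works out to a constant 113 for both normal and denormal results.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not the DirectXMath implementation.
// Assumes 0 <= f <= 65504; the mantissa is truncated, not rounded to nearest.
inline uint16_t FloatToHalfPositiveSketch(float f)
{
    const float minNormalHalf = 6.103515625e-05f; // 2^-14

    // Normal half range: f + f doubles the value (exponent + 1).
    // Below 2^-14: f + 2^-14 lands in [2^-14, 2^-13), so the float exponent
    // is pinned and the top mantissa bits hold the half denormal mantissa.
    const float biased = f + std::max(f, minNormalHalf);

    uint32_t bits;
    std::memcpy(&bits, &biased, sizeof(bits));

    // Float exponent field is now (half exponent field + 113) in both cases.
    // Remove that offset, then drop the 13 extra mantissa bits.
    return static_cast<uint16_t>((bits - (113u << 23)) >> 13);
}
```

The float add itself rounds before the truncating shift, so inputs that are not exactly representable as a half denormal can come out one unit different from a strictly truncating or strictly round-to-nearest conversion. That is the bit-exactness caveat mentioned above.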

Half-to-float can handle denormals (and zero) by converting the mantissa to float and multiplying it by 2^-24, which should be faster than the loop.
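A rough sketch of the loop-free half-to-float path, for illustration only. The function name is hypothetical and this is not the DirectXMath code. The exponent-zero case covers both zero and denormals with one int-to-float conversion and one multiply by 2^-24; the other cases use the usual exponent rebias.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper, not the DirectXMath implementation.
inline float HalfToFloatSketch(uint16_t h)
{
    const uint32_t sign     = (h & 0x8000u) << 16;
    const uint32_t exponent = (h >> 10) & 0x1Fu;
    const uint32_t mantissa = h & 0x03FFu;

    uint32_t bits;
    if (exponent == 0)
    {
        // Zero or denormal: value is mantissa * 2^-24, exact in float.
        const float magnitude =
            static_cast<float>(mantissa) * 5.9604644775390625e-08f; // 2^-24
        std::memcpy(&bits, &magnitude, sizeof(bits));
        bits |= sign;
    }
    else
    {
        // Inf/NaN keep an all-ones exponent; normals are rebiased by 127 - 15.
        const uint32_t floatExponent = (exponent == 31u) ? 255u : exponent + 112u;
        bits = sign | (floatExponent << 23) | (mantissa << 13);
    }

    float result;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
}
```

The branch on the exponent could be replaced with a select in a SIMD version; it is written as an if here only to keep the sketch readable.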

microsoft / DirectXMath

Possible performance improvements to half float conversion #76