sjoerdmeijer opened this issue 3 months ago
I see several floating-point conversion routines in FBGEMM, for example this float32 to float16 helper function:

https://github.com/pytorch/FBGEMM/blob/3070f88d0dce506f2cba7f2019ea8dfc491e5c3b/include/fbgemm/Types.h#L77

Most modern AArch64 CPUs (Armv8.2-a and later), and I believe x86 too, have native FP16 support, with separate instructions for up- and down-conversion. I believe that whole function can be replaced with just one FCVT instruction. The different rounding modes should be supported too.
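For illustration, a minimal sketch of what the native down-conversion can look like, assuming an AArch64 compiler with the IEEE `__fp16` format, or x86 with F16C; `float_to_half_native` is an illustrative name, not an FBGEMM function:

```cpp
#include <cstdint>
#include <cstring>

// Minimal sketch: float32 -> float16 (as raw bits) via native hardware
// conversion instead of a manual bit-manipulation routine.
#if defined(__aarch64__) && defined(__ARM_FP16_FORMAT_IEEE)
// On AArch64 this cast compiles to a single FCVT Hd, Sn instruction
// (round-to-nearest-even under the default FPCR rounding mode).
static inline std::uint16_t float_to_half_native(float f) {
  __fp16 h = static_cast<__fp16>(f);
  std::uint16_t bits;
  std::memcpy(&bits, &h, sizeof(bits));
  return bits;
}
#elif defined(__F16C__)
#include <immintrin.h>
// On x86 with F16C this maps to VCVTPS2PH with an explicit
// round-to-nearest-even rounding mode.
static inline std::uint16_t float_to_half_native(float f) {
  return _cvtss_sh(f, _MM_FROUND_TO_NEAREST_INT);
}
#endif
```

Regarding rounding modes: on x86 the rounding mode is an immediate operand of VCVTPS2PH, and on AArch64 FCVT follows the FPCR rounding mode, so round-to-nearest and the other IEEE modes are available.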
I think the cpu_float2half_rn function is a reference implementation that intentionally implements the algorithm manually. We currently rely on the compiler to emit the optimized CPU float conversion (see lines 222 and 232) when the compiler has the fp16 data type extension and the CPU supports native fp16 conversion.
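As a rough sketch of that arrangement (the macro and helper names here are hypothetical, not the actual guards used in Types.h), the compile-time dispatch might look like:

```cpp
#include <cstdint>

std::uint16_t cpu_float2half_rn(float f);    // reference implementation (manual algorithm)
std::uint16_t float_to_half_native(float f); // native path, as sketched above

// Hypothetical compile-time dispatch; HAS_NATIVE_FP16 stands in for whatever
// feature guards Types.h actually uses around lines 222 and 232.
#if defined(HAS_NATIVE_FP16)
static inline std::uint16_t float_to_half(float f) {
  return float_to_half_native(f);  // single native conversion instruction
}
#else
static inline std::uint16_t float_to_half(float f) {
  return cpu_float2half_rn(f);     // portable bit-manipulation fallback
}
#endif
```

This keeps the manual routine as both documentation of the algorithm and a fallback for targets without native fp16 conversion.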