pytorch / FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

float conversion emulation routines #2985

Open sjoerdmeijer opened 3 months ago

sjoerdmeijer commented 3 months ago

I see several floating-point conversion routines, for example this float32 to float16 helper function:

https://github.com/pytorch/FBGEMM/blob/3070f88d0dce506f2cba7f2019ea8dfc491e5c3b/include/fbgemm/Types.h#L77

But most modern AArch64 CPUs (Armv8.2-A and up), and I believe x86 too, have native FP16 support, with separate instructions for up- and down-conversions. I believe that whole function can be replaced with a single FCVT instruction. The different rounding modes should also be supported.
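
To illustrate what "one instruction" looks like in practice, here is a minimal sketch (not FBGEMM code; the helper name is made up), assuming a toolchain that exposes the native fp16 support referred to above:

```cpp
// Minimal sketch: native float32 -> float16 conversion on AArch64 and x86.
#include <cstdint>
#include <cstring>

#if defined(__aarch64__)
// On AArch64 a float -> __fp16 cast lowers to a single FCVT Hd, Sn.
inline std::uint16_t float_to_half_native(float x) {
  __fp16 h = static_cast<__fp16>(x);
  std::uint16_t bits;
  std::memcpy(&bits, &h, sizeof(bits));
  return bits;
}
#elif defined(__F16C__)
// On x86 with F16C, _cvtss_sh maps to VCVTPS2PH; the immediate selects
// the rounding mode (round-to-nearest-even here).
#include <immintrin.h>
inline std::uint16_t float_to_half_native(float x) {
  return _cvtss_sh(x, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
}
#endif
```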

excelle08 commented 2 months ago

I think the cpu_float2half_rn function is a reference implementation that intentionally implements the algorithm manually. Currently we rely on the compiler to do the optimized CPU float conversion (see lines 222 and 232) if the compiler has an fp16 data type extension and the CPU supports native fp16 conversion.
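
For readers following along, here is a condensed sketch of that split (the guards and names are illustrative, not FBGEMM's actual code in Types.h; the native path reuses the float_to_half_native helper from the sketch above): take the compiler's native fp16 conversion when it is available, otherwise fall back to a manual round-to-nearest-even emulation in the spirit of cpu_float2half_rn.

```cpp
// Illustrative dispatch between native and emulated float32 -> float16.
#include <cstdint>
#include <cstring>

inline std::uint16_t float2half_rn(float x) {
#if defined(__aarch64__) || defined(__F16C__)
  // Native path: a single FCVT (AArch64) or VCVTPS2PH (x86 F16C).
  return float_to_half_native(x);
#else
  // Reference path: manual IEEE-754 binary32 -> binary16 conversion with
  // round-to-nearest-even.
  std::uint32_t f;
  std::memcpy(&f, &x, sizeof(f));
  const std::uint32_t sign = (f >> 16) & 0x8000u;
  const std::uint32_t exp  = (f >> 23) & 0xFFu;
  std::uint32_t mant = f & 0x7FFFFFu;

  if (exp == 0xFFu)                        // Inf or NaN
    return static_cast<std::uint16_t>(sign | 0x7C00u | (mant ? 0x200u : 0u));

  const int e = static_cast<int>(exp) - 127 + 15;  // rebias the exponent
  if (e >= 0x1F)                           // too large: overflow to Inf
    return static_cast<std::uint16_t>(sign | 0x7C00u);

  if (e <= 0) {                            // half-precision subnormal or zero
    if (e < -10) return static_cast<std::uint16_t>(sign);
    mant |= 0x800000u;                     // restore the implicit leading 1
    const int shift = 14 - e;
    std::uint32_t h16 = mant >> shift;
    const std::uint32_t rem  = mant & ((1u << shift) - 1u);
    const std::uint32_t half = 1u << (shift - 1);
    if (rem > half || (rem == half && (h16 & 1u))) ++h16;  // ties to even
    return static_cast<std::uint16_t>(sign | h16);
  }

  std::uint16_t h16 = static_cast<std::uint16_t>(
      sign | (static_cast<std::uint32_t>(e) << 10) | (mant >> 13));
  const std::uint32_t rem = mant & 0x1FFFu;
  // Round to nearest even; a carry may bump the exponent, which is correct.
  if (rem > 0x1000u || (rem == 0x1000u && (h16 & 1u))) ++h16;
  return h16;
#endif
}
```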