Summary
The request is to support the fp16 data type in the jit_uni_reorder kernel on aarch64 HW.
Problem statement
Currently, only the fp32 and bf16 floating-point data types are supported in the optimized reorder implementation on aarch64 HW. An attempt to reorder memory with the fp16 data type falls back to the reference implementation, which might be several times slower than the jitted code.
Several frameworks use fp16 as the default execution type on ARM HW. This creates demand for a highly optimized fp16 reorder to speed up model compilation/preparation time (mostly by optimizing the reorder of Conv/Matmul weights to a blocked format) and inference time (most models are mixed precision and require multiple fp32<->fp16 and fp16<->u8/i8 conversions).
Preferred solution
Extend the jit_uni_reorder kernel with the fp16 data type to support fp32<->fp16 and fp16<->u8/i8 conversions.
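To make the requested conversions concrete, here is a minimal scalar sketch of the element-wise behaviour such a kernel would need to reproduce. It is not oneDNN code and not the JIT kernel itself. All function names are hypothetical, and the rounding and saturation policy (round to nearest even, saturate integers, map NaN to 0) is an assumption made for illustration, not a statement of what the library does.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// fp16 bit pattern -> fp32. Exact: every fp16 value is representable in fp32.
inline float half_to_float(std::uint16_t h) {
    const float sign = (h & 0x8000u) ? -1.0f : 1.0f;
    const int exp = (h >> 10) & 0x1F;
    const int man = h & 0x3FF;
    if (exp == 31) // Inf or NaN
        return man ? std::numeric_limits<float>::quiet_NaN()
                   : sign * std::numeric_limits<float>::infinity();
    if (exp == 0) // zero or subnormal: man * 2^-24
        return sign * std::ldexp(static_cast<float>(man), -24);
    return sign * std::ldexp(static_cast<float>(man + 1024), exp - 25);
}

// fp32 -> fp16 bit pattern, round to nearest even (assumed policy).
inline std::uint16_t float_to_half(float f) {
    std::uint32_t x;
    std::memcpy(&x, &f, sizeof x);
    const std::uint32_t sign = (x >> 16) & 0x8000u;
    const std::uint32_t mag = x & 0x7FFFFFFFu;
    if (mag > 0x7F800000u) // NaN -> quiet NaN
        return static_cast<std::uint16_t>(sign | 0x7E00u);
    if (mag >= 0x47800000u) // Inf or |f| >= 2^16 -> Inf
        return static_cast<std::uint16_t>(sign | 0x7C00u);
    if (mag < 0x33000000u) // |f| < 2^-25 rounds to zero
        return static_cast<std::uint16_t>(sign);

    const std::uint32_t exp = mag >> 23;
    std::uint32_t man = mag & 0x007FFFFFu;
    std::uint32_t shift = 13;
    std::uint32_t base = 0;
    if (exp >= 113) { // result is a normal fp16 number
        base = (exp - 112) << 10;
    } else { // result is an fp16 subnormal: restore the implicit bit
        man |= 0x00800000u;
        shift = 126 - exp;
    }
    std::uint32_t h = base + (man >> shift);
    const std::uint32_t rem = man & ((1u << shift) - 1u);
    const std::uint32_t half = 1u << (shift - 1);
    // A carry out of the mantissa correctly bumps the exponent (up to Inf).
    if (rem > half || (rem == half && (h & 1u))) ++h;
    return static_cast<std::uint16_t>(sign | h);
}

// fp32 -> integer with saturation; NaN -> 0 is an assumption of this sketch.
// Rounding follows the current FP rounding mode (nearest even by default).
template <typename T>
T saturate_round(float v) {
    if (std::isnan(v)) return static_cast<T>(0);
    const float lo = static_cast<float>(std::numeric_limits<T>::lowest());
    const float hi = static_cast<float>(std::numeric_limits<T>::max());
    v = std::min(std::max(v, lo), hi);
    return static_cast<T>(std::nearbyint(v));
}

inline std::uint8_t f16_to_u8(std::uint16_t h) {
    return saturate_round<std::uint8_t>(half_to_float(h));
}
inline std::int8_t f16_to_s8(std::uint16_t h) {
    return saturate_round<std::int8_t>(half_to_float(h));
}
inline std::uint16_t u8_to_f16(std::uint8_t v) {
    return float_to_half(static_cast<float>(v));
}
inline std::uint16_t s8_to_f16(std::int8_t v) {
    return float_to_half(static_cast<float>(v));
}
```

For example, under these assumptions `float_to_half(1.0f)` yields the bit pattern `0x3C00`, and `f16_to_u8(float_to_half(300.0f))` saturates to 255. A jitted aarch64 implementation would presumably use the hardware conversion instructions instead of bit manipulation; the scalar form is only meant as a reference for the expected results.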