oneapi-src / oneDNN

oneAPI Deep Neural Network Library (oneDNN)
https://uxlfoundation.org
Apache License 2.0

[ARM] Support FP16 post-ops fusion into ACL kernels #2067

Open dmitry-gorokhov opened 3 months ago

dmitry-gorokhov commented 3 months ago

Summary

The current ACL integration prohibits fusing even the first post-op into the ACL kernel when the dst data type is FP16. The request is to enable this behavior conditionally.

Problem statement

The oneDNN post-ops fusion mechanism provides a significant performance boost by skipping intermediate memory-movement overhead. However, within the ACL integration this behavior is disabled for FP16 execution due to oneDNN's requirements on the precision of post-op computations (which would otherwise be performed in FP16). Fusing a single post-op into an FP16 primitive then leads to multiple FP16<->FP32 data type conversions and expensive memory-access overhead. As a result, executing the corresponding operations separately (via separate oneDNN primitive calls) performs better than the fused version.
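For context, this is roughly how a post-op is fused into a primitive through the public oneDNN API (a minimal sketch; the memory descriptors and engine setup are illustrative, and it is this fusion path that the ACL integration currently rejects for FP16 dst):

```cpp
#include "dnnl.hpp"

int main() {
    using namespace dnnl;
    engine eng(engine::kind::cpu, 0);

    // Illustrative FP16 matmul: src [4x8] * weights [8x16] -> dst [4x16].
    memory::desc src_md({4, 8}, memory::data_type::f16, memory::format_tag::ab);
    memory::desc wei_md({8, 16}, memory::data_type::f16, memory::format_tag::ab);
    memory::desc dst_md({4, 16}, memory::data_type::f16, memory::format_tag::ab);

    // Attach a ReLU post-op; with fusion it runs inside the matmul kernel
    // instead of as a separate primitive over dst memory.
    post_ops po;
    po.append_eltwise(algorithm::eltwise_relu, 0.f, 0.f);
    primitive_attr attr;
    attr.set_post_ops(po);

    auto pd = matmul::primitive_desc(eng, src_md, wei_md, dst_md, attr);
    (void)pd;
    return 0;
}
```

Without fusion, the same ReLU requires a second primitive call and an extra pass over the FP16 dst tensor, which is the memory-access overhead described above.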

Preferred solution

Inside OpenVINO we simply relaxed the condition to allow FP16 post-op fusion (with FP16 internal compute) in the ACL integration. However, that solution may not be suitable for all oneDNN users due to accuracy restrictions. The proposal is therefore to adopt the dnnl::accumulation_mode attribute as a trigger for the post-op computational precision. The desired balance between accuracy and performance can then be chosen at the oneDNN user level.
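Under this proposal, opting into FP16 post-op compute would look something like the following (a sketch only: the attribute and enum exist in oneDNN 3.x, but the ACL integration honoring them for FP16 post-op fusion is exactly what this issue requests, not current behavior):

```cpp
#include "dnnl.hpp"

int main() {
    using namespace dnnl;

    primitive_attr attr;

    // Default: strict accumulation, so ACL keeps fusion disabled for FP16 dst.
    // Relaxing the accumulation mode would signal that reduced-precision
    // (FP16) post-op computation is acceptable to this user, letting the
    // ACL integration fuse the post-op without FP16<->FP32 round trips.
    attr.set_accumulation_mode(accumulation_mode::relaxed);

    // attr would then be passed to the primitive descriptor as usual,
    // alongside any post_ops the user attaches.
    return 0;
}
```

Users who need bit-exact FP32 post-op accumulation would leave the default `accumulation_mode::strict`, preserving today's behavior.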

theComputeKid commented 2 months ago

It makes sense to me, do you have any patches demonstrating the scale of changes needed to adopt the attribute?

vpirogov commented 2 months ago

Related discussion in #1689.