nnstreamer / nntrainer

NNtrainer is a Software Framework for Training Neural Network Models on Devices.
Apache License 2.0

[ hgemm ] hgemm noTrans with kernel 8x16 #2541

Closed: skykongkong8 closed this 6 months ago

skykongkong8 commented 6 months ago

This commit proposes an 8x16 kernel for half-precision GEMM (HGEMM). Note that this is not a 100% optimized version of HGEMM, but it still improves on the previous version. The following is the unit-test output for HGEMM with f16-f32 partial accumulation: accuracy is fine and latency is better.
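To make "8x16 kernel with f16-f32 partial accumulation" concrete, here is a minimal, portable C++ sketch of such a micro-kernel. It is not nntrainer's implementation. It computes one 8x16 tile of C by accumulating a short chunk of K in a narrow type, then flushing that chunk's partial sums into a wide output. So that it compiles anywhere, the template parameter `PartialT` stands in for fp16 and `OutT` stands in for fp32. The function name `kernel_8x16`, the row-major layout with leading dimensions, and the chunk length `partial_k` are all hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical portable sketch of an 8x16 GEMM micro-kernel with partial
// accumulation (not nntrainer's implementation).
// Computes one tile: C[8x16] += A[8xK] * B[Kx16], all row-major.
//   InT      : element type of A and B (fp16 in a real HGEMM)
//   PartialT : narrow accumulator used within a K-chunk (stands in for fp16)
//   OutT     : wide accumulator/output type (stands in for fp32)
constexpr std::size_t kMr = 8;   // rows of the C tile
constexpr std::size_t kNr = 16;  // columns of the C tile

template <typename InT, typename PartialT, typename OutT>
void kernel_8x16(std::size_t K, const InT *A, std::size_t lda, const InT *B,
                 std::size_t ldb, OutT *C, std::size_t ldc,
                 std::size_t partial_k) {
  assert(partial_k > 0 && "partial_k must be positive");
  for (std::size_t k0 = 0; k0 < K; k0 += partial_k) {
    const std::size_t k_end = (k0 + partial_k < K) ? k0 + partial_k : K;

    // 8x16 narrow accumulators, zero-initialised for every K-chunk. A real
    // kernel would keep these in SIMD registers rather than in an array.
    std::array<std::array<PartialT, kNr>, kMr> acc{};

    for (std::size_t k = k0; k < k_end; ++k) {
      for (std::size_t i = 0; i < kMr; ++i) {
        const PartialT a = static_cast<PartialT>(A[i * lda + k]);
        for (std::size_t j = 0; j < kNr; ++j) {
          // Rank-1 update of the tile: column k of A times row k of B.
          acc[i][j] += a * static_cast<PartialT>(B[k * ldb + j]);
        }
      }
    }

    // Flush this chunk's partial sums into the wide output, so rounding
    // error in the narrow type only builds up over partial_k steps.
    for (std::size_t i = 0; i < kMr; ++i) {
      for (std::size_t j = 0; j < kNr; ++j) {
        C[i * ldc + j] += static_cast<OutT>(acc[i][j]);
      }
    }
  }
}
```

Keeping most of the inner-loop arithmetic in the narrow type while periodically moving it into the wide type is one plausible way to get the "fine accuracy with better latency" trade-off described above. The chunk length controls how much low-precision error can accumulate before each flush.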

Mean latency (TC = 20):

| GEMM dimension | fp32 (cblas) | prev | 8x8 | 8x16 |
|---|---|---|---|---|
| 4096 square | 2087 ms | 7172 ms | ... | 1964 ms |
| 2048 square | 260 ms | 413 ms | ... | 250 ms |
| 1024 square | 34 ms | 52 ms | ... | 30 ms |
| 768 square | 13 ms | 18 ms | ... | 11 ms |
| 256x1440x256 | 2869 µs | 3807 µs | ... | 2544 µs |
| 256x256x1440 | 2929 µs | 3950 µs | ... | 2467 µs |
| 8x1440x8 | 5 µs | 5 µs | ... | 10 µs |
| 8x8x1440 | 5 µs | 4 µs | ... | 8 µs |
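To read the table as ratios, the sketch below computes the speedup of the 8x16 kernel over the previous implementation and over fp32 cblas. It also computes achieved throughput using the conventional 2·M·N·K floating-point operations per GEMM. That FLOP count, and reading "mcrs" as microseconds, are assumptions not stated in the PR. The latencies are copied from the table and converted to microseconds. The 8x8 column is omitted because its values are elided. The names `Row`, `kRows`, `speedup`, `gflops` and `print_summary` are illustrative only.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Mean latencies from the table, converted to microseconds (assumes "mcrs"
// means microseconds). The elided 8x8 column is omitted.
struct Row {
  const char *shape;
  double d0, d1, d2;     // the three GEMM dimensions (order irrelevant here)
  double cblas_fp32_us;  // fp32 (cblas)
  double prev_us;        // previous HGEMM
  double hgemm_8x16_us;  // 8x16 kernel from this PR
};

const std::vector<Row> kRows = {
    {"4096 square", 4096, 4096, 4096, 2087000, 7172000, 1964000},
    {"2048 square", 2048, 2048, 2048, 260000, 413000, 250000},
    {"1024 square", 1024, 1024, 1024, 34000, 52000, 30000},
    {"768 square", 768, 768, 768, 13000, 18000, 11000},
    {"256x1440x256", 256, 1440, 256, 2869, 3807, 2544},
    {"256x256x1440", 256, 256, 1440, 2929, 3950, 2467},
    {"8x1440x8", 8, 1440, 8, 5, 5, 10},
    {"8x8x1440", 8, 8, 1440, 5, 4, 8},
};

// Ratio > 1 means the new kernel is faster than the baseline.
double speedup(double baseline_us, double new_us) {
  assert(new_us > 0.0);
  return baseline_us / new_us;
}

// 2*M*N*K FLOPs (one multiply and one add per inner-product term);
// FLOPs / (us * 1e-6 s) / 1e9 = FLOPs / (us * 1e3) GFLOP/s.
double gflops(const Row &r, double us) {
  return 2.0 * r.d0 * r.d1 * r.d2 / (us * 1e3);
}

void print_summary() {
  for (const Row &r : kRows) {
    std::printf("%-14s vs prev %5.2fx  vs cblas %5.2fx  %8.2f GFLOP/s\n",
                r.shape, speedup(r.prev_us, r.hgemm_8x16_us),
                speedup(r.cblas_fp32_us, r.hgemm_8x16_us),
                gflops(r, r.hgemm_8x16_us));
  }
}
```

For the 4096 square case this gives about 3.65x over the previous implementation, about 1.06x over fp32 cblas, and roughly 70 GFLOP/s. For the two shapes with an 8x8 output tile, the ratio against prev is 0.5, which means the 8x16 kernel is slower there.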
taos-ci commented 6 months ago

TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2541. Please follow the 1commit/1PR (one commit per PR) policy to get comments from reviewers quickly. Your PR must pass all verification processes of cibot before the review process by reviewers starts. If you are a new member of this project, please read the manuals in the documentation folder and the wiki page. To monitor the progress status of your PR in more detail, visit http://ci.nnstreamer.ai/.