nnstreamer / nntrainer

NNtrainer is a software framework for training neural network models on devices.
Apache License 2.0

[ hnrm2 ] Use precision-enhanced hnrm2 #2556

Closed. skykongkong8 closed this 2 months ago

skykongkong8 commented 2 months ago

Along with #2555, I inspected every custom half-precision calculation function; hnrm2 is the only one left that still needs the f16-f32 precision supplement.

Proposed Changes

Difference w.r.t. fp32-cblas

| dim | f16 (prev) | f16f32 (now) |
| --- | --- | --- |
| 768 | 282.233 | 0.232574 |

Latency

Mean value (TC = 100). Since these results are in nanoseconds, the difference is almost trivial.

| dim | f16 | f16f32 | fp32-cblas |
| --- | --- | --- | --- |
| 768 | 508 ns | 579 ns | 4242 ns |

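Result: much better accuracy (the difference w.r.t. fp32-cblas at dim 768 drops from 282.233 to 0.232574) with almost no latency deterioration (508 ns to 579 ns, still far below the 4242 ns of fp32-cblas).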
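
To illustrate why accumulating in fp32 can matter this much, here is a minimal sketch that simulates both accumulation strategies. It is not the code from this PR: it is plain Python that emulates fp16 and fp32 rounding with the standard `struct` module, and the function names (`to_half`, `to_single`, `nrm2_f16_accumulate`, `nrm2_f32_accumulate`) are hypothetical. It also assumes that the precision-enhanced version reads fp16 inputs, accumulates the sum of squares in fp32, and casts the final result back to fp16.

```python
import math
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 (fp16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_single(x: float) -> float:
    """Round x to the nearest IEEE 754 binary32 (fp32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def nrm2_f16_accumulate(values):
    """Euclidean norm with every intermediate result rounded to fp16."""
    acc = 0.0
    for v in values:
        h = to_half(v)
        # Both the square and the running sum are kept in fp16.
        acc = to_half(acc + to_half(h * h))
    return to_half(math.sqrt(acc))


def nrm2_f32_accumulate(values):
    """Euclidean norm with fp16 inputs but an fp32 accumulator (assumed scheme)."""
    acc = 0.0
    for v in values:
        h = to_single(to_half(v))  # widening fp16 to fp32 is exact
        acc = to_single(acc + to_single(h * h))
    # Cast the fp32 result back to fp16 at the very end.
    return to_half(to_single(math.sqrt(acc)))


if __name__ == "__main__":
    ones = [1.0] * 4096  # exact norm is sqrt(4096) = 64
    print("f16 accumulate:", nrm2_f16_accumulate(ones))
    print("f32 accumulate:", nrm2_f32_accumulate(ones))
```

In this toy input the fp16 accumulator stops growing once it reaches 2048, because 2048 + 1 is not representable in fp16 and rounds back to 2048, while the fp32 accumulator reaches 4096 exactly. Whether the same effect explains the numbers reported above depends on the actual test data, which this page does not show.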

Self evaluation:

  1. Build test: [X]Passed [ ]Failed [ ]Skipped
  2. Run test: [X]Passed [ ]Failed [ ]Skipped

taos-ci commented 2 months ago

:memo: TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2556. Please follow the 1commit/1PR (one commit per PR) policy to get comments quickly from reviewers. Your PR must pass all verification processes of cibot before reviewers start the review process. If you are a new member of this project, please read the manuals in the documentation folder and the wiki page. To monitor the progress of your PR in more detail, visit http://ci.nnstreamer.ai/.