Closed: skykongkong8 closed this issue 2 months ago.
cibot: Thank you for posting issue #2668. The person in charge will reply soon.
It seems introducing this feature in the current NNTrainer is not appropriate, and almost every participant in the project is aware of this. The issue can be raised again when the proper moment to introduce such functionality comes.
It seems the latest OpenBLAS supports bfloat16 GEMM. I guess upgrading the OpenBLAS version used here (0.3.18 -> 0.3.24) would simply bring those functions into NNTrainer.
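For reference, a minimal sketch of what calling that path could look like. Assumptions: an OpenBLAS build with BUILD_BFLOAT16=1 (which, as far as I know, is off by default in stock packages); cblas_sbgemm is OpenBLAS's bfloat16 GEMM entry point, taking bfloat16 inputs and accumulating into fp32. This is illustration, not NNTrainer code:

```cpp
#include <cblas.h>   // declares cblas_sbgemm when built with bfloat16 support
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// OpenBLAS's bfloat16 is just an unsigned 16-bit integer holding the top
// 16 bits of an fp32 value; truncation is the simplest conversion.
static uint16_t to_bf16(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return static_cast<uint16_t>(bits >> 16);
}

int main() {
  const int M = 100, N = 100, K = 100;  // the small (100,100) x (100,100) case
  std::vector<uint16_t> A(M * K, to_bf16(1.0f));
  std::vector<uint16_t> B(K * N, to_bf16(0.5f));
  std::vector<float> C(M * N, 0.0f);    // accumulation stays in fp32

  // C = 1.0 * A * B + 0.0 * C, row-major, no transposes
  cblas_sbgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, M, N, K, 1.0f,
               A.data(), K, B.data(), N, 0.0f, C.data(), N);

  std::printf("C[0] = %f (expected 50.0)\n", C[0]);
  return 0;
}
```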
A few points to consider on this route:

1. hardware compatibility: OpenBLAS provides a bfloat16 type, but it is defined as uint16_t.
2. accuracy: results require a tolerance of epsilon=1.0 even with small GEMM examples like a (100,100) x (100,100) problem.
3. latency measurement
4. note: Bfloat16 is more robust to inf / NaN, which can be useful for mixed-precision training and fp16/fp32 accumulation; see the sketch after this list.
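To make points 2 and 4 concrete: bfloat16 keeps fp32's 8 exponent bits but only 7 explicit mantissa bits, so a single truncating conversion can lose roughly 2^-7 (~0.8%) of relative precision, which makes an epsilon=1.0 tolerance on a K=100 inner product of O(1) values unsurprising; meanwhile the fp32-sized exponent means values that overflow fp16 (max ~65504) stay finite. A self-contained sketch (the conversion helpers below are hand-rolled for illustration, not an NNTrainer or OpenBLAS API):

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

static uint16_t fp32_to_bf16(float f) {  // keep the top 16 bits (truncation)
  uint32_t u;
  std::memcpy(&u, &f, sizeof(u));
  return static_cast<uint16_t>(u >> 16);
}

static float bf16_to_fp32(uint16_t h) {  // zero-fill the dropped low bits
  uint32_t u = static_cast<uint32_t>(h) << 16;
  float f;
  std::memcpy(&f, &u, sizeof(f));
  return f;
}

int main() {
  // Precision: up to ~0.8% worst-case relative error from truncating
  // 16 of the 23 fp32 mantissa bits.
  float x = 3.14159f;
  float rt = bf16_to_fp32(fp32_to_bf16(x));
  std::printf("pi: %f -> %f (rel err %.4f%%)\n", x, rt, 100.0f * (x - rt) / x);

  // Range: 1e10 overflows fp16 (max ~65504) but stays finite in bfloat16,
  // since bfloat16 shares fp32's exponent range.
  float big = 1.0e10f;
  std::printf("1e10 round trip: %g (still finite)\n",
              bf16_to_fp32(fp32_to_bf16(big)));
  return 0;
}
```

Truncation (rather than round-to-nearest) is used above for simplicity; it roughly doubles the worst-case rounding error but keeps the conversion to a single bit shift.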