ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

BFLOAT16 support #3148

Open daurnimator opened 5 years ago

daurnimator commented 5 years ago

BFLOAT16 is a new floating-point format. It is a 16-bit format with an 8-bit exponent and a 7-bit mantissa (versus the 5-bit exponent and 10-bit mantissa of the IEEE half-precision float, which is currently f16), designed for deep learning.
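
To make the layout concrete: because bfloat16 keeps the sign bit and the full 8-bit exponent of a 32-bit float, its bit pattern is simply the upper 16 bits of the f32 pattern. Below is a minimal Python sketch of that conversion. The function names are hypothetical and not part of Zig or any library, and the rounding mode (round to nearest, ties to even) is an assumption; plain truncation is another option that some implementations use.

```python
import struct


def f32_to_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits: int) -> float:
    """Reinterpret a 32-bit unsigned int as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def f32_to_bf16_bits(x: float) -> int:
    """Narrow x to a 16-bit bfloat16 pattern (hypothetical helper).

    Assumes round-to-nearest, ties-to-even on the 16 discarded bits.
    """
    bits = f32_to_bits(x)
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
    if is_nan:
        # Keep the top half and force a mantissa bit so the result stays NaN.
        return (bits >> 16) | 0x0040
    # Add 0x7FFF, plus 1 if the kept part is odd, so that ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_f32(b: int) -> float:
    """Widen a bfloat16 pattern to f32 by appending 16 zero bits (exact)."""
    return bits_to_f32((b & 0xFFFF) << 16)
```

With only 7 stored mantissa bits, a value such as 3.14159 comes back as 3.140625 after a round trip, while the exponent range matches that of f32.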

The bfloat16 format is used in upcoming Intel AI processors such as the Nervana NNP-L1000, in Xeon processors and Intel FPGAs, in Google Cloud TPUs, and in TensorFlow. Arm Neon and SVE also support the bfloat16 format.

Selected excerpts:

References:


As a more general issue: how should we add new numeric types going forward (e.g. Unum)? Since Zig does not support operator overloading, such types would have to be provided by the core for ergonomic use.

msingle commented 4 years ago

Also, .NET 5 will have Half types.

marnix commented 4 years ago

As a type naming proposal: perhaps f16_7, i.e. suffix the name with the number of mantissa/fraction bits? Rationale: less precision -> lower number.

| Short name | Long name | Description |
|---|---|---|
| f16 | f16_10 | IEEE half-precision 16-bit float / .NET Half type |
| f32 | f32_23 | IEEE 754 single-precision 32-bit float |
| f64 | f64_52 | IEEE 754 double-precision 64-bit float |
| (none?) | f16_7 | bfloat16 |
| ? | f19_10 | NVidia's TensorFloat |
| ? | f24_16 | AMD's fp24 format |

tgschultz commented 4 years ago

We could do what we do with integer types and allow the creation of arbitrary exponent/mantissa bitcount float types on demand.
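
As a rough illustration of what "arbitrary exponent/mantissa bit count" would mean semantically, here is a small Python sketch that decodes a bit pattern given the two widths. It assumes IEEE 754-style conventions throughout (one sign bit, a biased exponent, an implicit leading 1, subnormals, and an all-ones exponent for infinity/NaN); real formats such as AMD's fp24 may not follow all of these. `decode_float` is a hypothetical name, not a proposed Zig builtin.

```python
def decode_float(bits: int, exp_bits: int, man_bits: int) -> float:
    """Decode a 1 + exp_bits + man_bits pattern under IEEE 754-style rules."""
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1

    if exp == (1 << exp_bits) - 1:
        # All-ones exponent: infinity if the mantissa is zero, otherwise NaN.
        return sign * float("inf") if man == 0 else float("nan")
    if exp == 0:
        # Zero or subnormal: no implicit leading 1, minimum exponent.
        return sign * man * 2.0 ** (1 - bias - man_bits)
    # Normal number: implicit leading 1 in front of the stored mantissa.
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)
```

For example, `decode_float(0x3C00, 5, 10)` reads the pattern as IEEE half precision and `decode_float(0x3F80, 8, 7)` reads it as bfloat16; both decode to 1.0. The total width is implied by the two parameters, which is the same information the f16_7-style names above encode.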

daurnimator commented 4 years ago

Apparently ARM Neoverse v1 will be getting BFLOAT16 support: https://fuse.wikichip.org/news/4564/arm-updates-its-neoverse-roadmap-new-bfloat16-sve-support/

Mouvedia commented 4 years ago

If you do, also add BFLOAT19, AKA TF32. If we are following the Rust naming convention, that would be f19b.

zigazeljko commented 4 years ago

LLVM 11 added support for bfloat16: https://llvm.org/docs/LangRef.html#floating-point-types