Can we summarize the meanings of data types like bf16_fp16? For example, what are the activation and output data types, and what is the computing instruction? #414
Sorry, our docs are still in WIP due to ongoing code refactoring.
Mixed data types such as bf16_fp16 and bf16_int8 mean that BF16 is used for the 1st token, while fp16 or int8 is used for the next tokens. The 1st token is compute-intensive and highly sensitive to precision, so we use half precision (BF16) together with AMX to accelerate the computation. Next-token generation is memory-bound, so a lower precision is used there to speed it up. We introduced bf16_fp16 because fp16 performed better than bf16 in some cases in older versions; after optimization, we now recommend using bf16 instead of bf16_fp16.
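To make the naming convention concrete, here is a minimal sketch of how a mixed data type string could be read as a pair of per-phase types. The helper name split_mixed_dtype is hypothetical and not part of the library's API; it only illustrates the rule described above (the part before the underscore applies to the 1st token, the part after it to the next tokens).

```python
from typing import Tuple


def split_mixed_dtype(name: str) -> Tuple[str, str]:
    """Return (first_token_dtype, next_token_dtype) for a dtype string.

    Hypothetical helper for illustration only; it mirrors the naming
    convention from the answer above, not the library's actual parser.
    """
    parts = name.lower().split("_")
    if len(parts) == 1:
        # A plain type such as "bf16" is used for both phases.
        return parts[0], parts[0]
    if len(parts) == 2:
        # "bf16_int8": bf16 for the 1st token, int8 for the next tokens.
        return parts[0], parts[1]
    raise ValueError(f"unexpected data type string: {name!r}")


if __name__ == "__main__":
    for dtype in ("bf16_fp16", "bf16_int8", "bf16"):
        first, nxt = split_mixed_dtype(dtype)
        print(f"{dtype}: 1st token -> {first}, next token -> {nxt}")
```

Read this way, bf16 alone means BF16 in both phases, which is the setting now recommended over bf16_fp16.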