intel / xFasterTransformer

Apache License 2.0

[Kernel] Add FP16 MHA and MLP kernels. #415

Closed. changqi1 closed this 4 months ago

changqi1 commented 4 months ago
# Weight-only FP16 (input FP32, weight FP16, output FP32)
[INFO] First token time: 148.062 ms
[INFO] Second token time: 48.3581 ms
[INFO] Final output is:
==============================================
Once upon a time, there existed a little girl who liked to have adventures. She lived in a small village surrounded by

# Full-link FP16 (input FP16, weight FP16, output FP16)
[INFO] First token time: 144.831 ms
[INFO] Second token time: 46.0737 ms
[INFO] Final output is:
==============================================
Once upon a time, there existed a little girl who liked to have adventures. She lived in a small village surrounded by
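
For a quick read of the two runs above, here is a minimal sketch that computes the relative latency difference between the weight-only FP16 and full-link FP16 timings. The numbers are copied from the logs; the dictionary keys and the helper `relative_reduction` are hypothetical names used only for illustration, and a single run per configuration says nothing about run-to-run variance.

```python
# Timings in milliseconds, copied from the two log excerpts above.
# The key names are illustrative and not part of xFasterTransformer.
timings_ms = {
    "weight_only_fp16": {"first_token": 148.062, "second_token": 48.3581},
    "full_fp16": {"first_token": 144.831, "second_token": 46.0737},
}


def relative_reduction(baseline_ms: float, candidate_ms: float) -> float:
    """Return the fraction by which candidate_ms is lower than baseline_ms."""
    return (baseline_ms - candidate_ms) / baseline_ms


# Compare full-link FP16 against the weight-only FP16 baseline, per phase.
reductions = {
    phase: relative_reduction(
        timings_ms["weight_only_fp16"][phase],
        timings_ms["full_fp16"][phase],
    )
    for phase in ("first_token", "second_token")
}

for phase, reduction in reductions.items():
    print(f"{phase}: {reduction * 100:.1f}% lower latency with full-link FP16")
```

On these two runs the difference is roughly 2% for the first token and just under 5% for the second token, and the generated text shown is identical in both configurations.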