intel / xFasterTransformer

Apache License 2.0

[output issue] found mistakes in llama-3-70b output by bf16_int4 during benchmark #413

Open intelyoungway opened 2 months ago

intelyoungway commented 2 months ago

weights: Meta-Llama-3-70B-Instruct
precision: bf16_int4 (vs. bf16)
version: 1.6.0
hardware: 2S-SPR9468 (Quadrant/Flat)
system: Ubuntu 22.04 LTS container (latest XFT image)
kernel: 5.17.3
command:

bf16 precision:

bash run_benchmark.sh -m llama-3-70b -d bf16 -s 2 -bs 1 -in 1024 -out 128 -i 1

bf16_int4:

bash run_benchmark.sh -m llama-3-70b -d bf16_int4 -s 2 -bs 1 -in 1024 -out 128 -i 1

issue:

With bf16 precision, the output is valid (screenshot: bf16-output-is-ok).

With bf16_int4 precision, the output is invalid (screenshot: bf16-int4-output-is-invalid).
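For context on why bf16_int4 can degrade output where plain bf16 does not: int4 weight-only quantization keeps only 16 representable levels per group, so large-magnitude outliers stretch the per-group scale and coarsen every other weight in that group. The sketch below is a minimal, hypothetical illustration of symmetric group-wise int4 round-trip error; it is not xFasterTransformer's actual quantization code, and the function names and group size are assumptions for demonstration only.

```python
import numpy as np

def quantize_int4(w, group_size=128):
    """Symmetric per-group int4 quantization (illustrative, not xFT's scheme):
    each group of weights shares one fp32 scale and is rounded to [-8, 7]."""
    groups = w.reshape(-1, group_size)
    scale = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(groups / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    """Reconstruct approximate weights from int4 codes and per-group scales."""
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=4096).astype(np.float32)  # toy weight tensor
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
err = np.abs(w - w_hat).max()
print(f"max abs reconstruction error: {err:.5f}")
```

Per-weight error like this is bounded by half a quantization step, but across a 70B-parameter model the accumulated perturbation can be enough to produce visibly broken generations, which is consistent with the maintainer's plan below to redesign the quantization mechanism.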

pujiang2018 commented 2 months ago

A new quantization mechanism is under design; we need some time to land the potential fix.