intelyoungway opened this issue 2 months ago
weights: Meta-Llama-3-70B-Instruct
precision: bf16_int4 (vs. bf16)
version: 1.6.0
hardware: 2S-SPR9468 (Quadrant/Flat)
system: Ubuntu 22.04 LTS container (latest xFT image)
kernel: 5.17.3
command:
bf16 precision:
bash run_benchmark.sh -m llama-3-70b -d bf16 -s 2 -bs 1 -in 1024 -out 128 -i 1
bf16_int4:
bash run_benchmark.sh -m llama-3-70b -d bf16_int4 -s 2 -bs 1 -in 1024 -out 128 -i 1
issue:
With bf16 precision, the output is valid:
With bf16_int4 precision, the output is invalid:
A new quantization mechanism is under design; the potential fix will need some time.
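For context on where bf16_int4 can go wrong, below is a minimal sketch of symmetric per-group weight-only int4 quantization. This is an illustration of the general technique, not xFasterTransformer's actual implementation; the group size and rounding scheme are assumptions. It shows how each group of bf16/fp32 weights is mapped to 4-bit integers plus one scale, and how much reconstruction error that introduces.

```python
import numpy as np

def quantize_int4(w, group_size=128):
    """Symmetric per-group int4 quantization (illustrative, not xFT's code):
    each group of weights is mapped to integers in [-8, 7] with one
    floating-point scale per group."""
    w = w.reshape(-1, group_size)
    # One scale per group so the largest magnitude maps to +/-7.
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    """Reconstruct approximate fp32 weights from int4 values and scales."""
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
# Per-weight error is bounded by half the group scale; across many
# layers of a 70B model this error can accumulate into garbled output.
err = float(np.abs(w - w_hat).max())
print(f"max abs reconstruction error: {err:.4f}")
```

With only 16 representable levels per group, outlier weights inflate the group scale and coarsen everything else in that group, which is one common reason an int4 path produces invalid output while the bf16 path is fine.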