turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

llama3 quant error #414

Closed. bdambrosio closed this issue 2 months ago.

bdambrosio commented 2 months ago

cd ../../../exllamav2
export CUDA_VISIBLE_DEVICES=2
python3 convert.py -i ../models/llama3-70B-Instruct -o llama3-70B-Instruct-exl2 -cf llama3-70B-Instruct-exl2 -l 2048 -b 8.0 -hb 8 -ss 8192
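
For readers following along, here is the same command annotated with what each flag means, per exllamav2's conversion documentation (the explanations are not part of the original report):

# -i   directory containing the unquantized HF model
# -o   working directory for the conversion job
# -cf  directory to compile the finished quantized model into
# -l   length of calibration rows, in tokens
# -b   target average bits per weight
# -hb  bits for the output (head) layer
# -ss  output shard size in MB
python3 convert.py -i ../models/llama3-70B-Instruct -o llama3-70B-Instruct-exl2 -cf llama3-70B-Instruct-exl2 -l 2048 -b 8.0 -hb 8 -ss 8192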

turboderp commented 2 months ago

This should be fixed in the dev branch. Once I'm done quantizing (and testing) all the 70B versions I'll release v0.0.19 with the fixes.
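
Until v0.0.19 is tagged, a minimal sketch of picking up the fix from the dev branch, assuming a source checkout of the repo (the path and install step are illustrative, not from this thread):

cd exllamav2
git fetch origin
git checkout dev
git pull origin dev
# reinstall so the package picks up the dev sources
pip install -e .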

bdambrosio commented 2 months ago

Thanks! I should have figured you had already spotted it!