turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

convert.py ending with "Killed" at lm_head layer when converting zephyr-7b #179

Closed: Christopheraburns closed this issue 2 months ago

Christopheraburns commented 9 months ago

Discussed in https://github.com/turboderp/exllamav2/discussions/178

Originally posted by **Christopheraburns** November 24, 2023

Python 3.10.12 | Ubuntu 20.04 | CUDA 11.8 | NVIDIA RTX 4000 Ada (20 GB)

I am converting the new zephyr-7b (beta) with exllamav2 using the following command:

```
python exllamav2/convert.py \
    -i zephyr-7b-beta \
    -o quant \
    -c wikitext-test.parquet \
    -b 5.0
```

It seems to quantize fine, but at the lm_head layer the process ends with "Killed". If I then try to load the quantized model with the included test_inference.py, I get the error:

```
ValueError: ## Could not find lm_head.* in model
```

which makes sense given where the process was killed. How can I troubleshoot this further? Thanks!
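Since the process ends with a bare "Killed" rather than a Python traceback, one diagnostic (not from the original thread, just a common approach) is to confirm that the kernel's OOM killer terminated the converter and to watch how low available system RAM gets during the run. A minimal sketch assuming the third-party `psutil` package is installed; the polling interval and the wrapper itself are illustrative, not part of exllamav2:

```python
import subprocess
import sys
import time

import psutil  # third-party; pip install psutil


def run_with_memory_log(cmd, interval=5.0):
    """Run a command while periodically logging available system RAM.

    If the conversion is killed by the kernel OOM killer, the last few
    log lines show how close available memory got to zero.
    """
    proc = subprocess.Popen(cmd)
    try:
        while proc.poll() is None:
            vm = psutil.virtual_memory()
            print(f"available RAM: {vm.available / 1e9:.2f} GB "
                  f"({vm.percent:.0f}% used)", file=sys.stderr)
            time.sleep(interval)
    finally:
        if proc.poll() is None:
            proc.terminate()
    return proc.returncode


if __name__ == "__main__":
    # Hypothetical invocation mirroring the command from this issue.
    rc = run_with_memory_log([
        "python", "exllamav2/convert.py",
        "-i", "zephyr-7b-beta",
        "-o", "quant",
        "-c", "wikitext-test.parquet",
        "-b", "5.0",
    ])
    # A return code of -9 (SIGKILL) is consistent with an OOM kill.
    print(f"converter exited with code {rc}", file=sys.stderr)
```

After the fact, `dmesg` (or `journalctl -k`) showing an "Out of memory: Killed process" entry for the converter would confirm the same thing.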
turboderp commented 2 months ago

Sorry, I seem to have missed this. It's a system memory issue, the same as #504, and the culprit is memory-mapping in safetensors. I'm trying to see if I can rely less on that library when converting models. Please track it in the other issue.
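For anyone following along, here is a minimal sketch of the memory-mapped access pattern in safetensors that the comment refers to. This is not exllamav2's actual conversion code, and the file name is a placeholder:

```python
from safetensors import safe_open

PATH = "model.safetensors"  # hypothetical shard name

# safe_open memory-maps the file; individual tensors are materialized
# on access via get_tensor().
with safe_open(PATH, framework="pt", device="cpu") as f:
    for name in f.keys():
        tensor = f.get_tensor(name)
        # ... process / quantize the tensor here ...
        del tensor  # drop the reference before reading the next tensor
```

How much of the mapped file and the materialized tensors ends up resident at the lm_head step is specific to the converter's internals, which is what the linked issue tracks.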