mit-han-lab / llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
MIT License

[Announcement] AWQ is now supported in text-generation-inference #92

Open abhinavkulkarni opened 12 months ago

abhinavkulkarni commented 12 months ago

Hi,

Thanks to the great work of the authors of AWQ, maintainers at TGI, and the open-source community, AWQ is now supported in TGI (link).

@TheBloke has released many AWQ-quantized models on HuggingFace; all of them can be run using TGI as follows:

text-generation-launcher \
--model-id TheBloke/Llama-2-7b-Chat-AWQ \
--trust-remote-code --port 8080 \
--max-input-length 3072 --max-total-tokens 4096 --max-batch-prefill-tokens 4096 \
--quantize awq

Note that this PR uses the older GEMM kernels from AWQ (commit f084f40).
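For anyone trying this out: once the launcher is up, you can hit the server over HTTP. A minimal sketch using huggingface_hub's InferenceClient (the prompt and generation parameters below are just illustrative, not part of this PR):

from huggingface_hub import InferenceClient

# Point the client at the TGI server started by the launcher command above.
client = InferenceClient("http://localhost:8080")

# TGI exposes the standard text-generation endpoint, so the usual
# text_generation() call works regardless of the AWQ backend.
output = client.text_generation(
    "What is activation-aware weight quantization?",
    max_new_tokens=256,
)
print(output)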

CC: @tonylins, @Sakits

Thanks!

s-konnex-engine commented 11 months ago

Is it only possible to load AWQ models from the command line? I tried to load one from the UI and get the error:

ImportError: DLL load failed while importing awq_inference_engine: The specified module could not be found.

erew123 commented 10 months ago

I get the same issue. I have updated multiple times over the last few weeks, assuming this was a bug that would be fixed. My full error is as follows:

Traceback (most recent call last):
  File "C:\AI\text-generation-webui\modules\ui_model_menu.py", line 210, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "C:\AI\text-generation-webui\modules\models.py", line 85, in load_model
    output = load_func_map[loader](model_name)
  File "C:\AI\text-generation-webui\modules\models.py", line 299, in AutoAWQ_loader
    from awq import AutoAWQForCausalLM
  File "C:\AI\text-generation-webui\installer_files\env\lib\site-packages\awq\__init__.py", line 2, in <module>
    from awq.models.auto import AutoAWQForCausalLM
  File "C:\AI\text-generation-webui\installer_files\env\lib\site-packages\awq\models\__init__.py", line 1, in <module>
    from .mpt import MptAWQForCausalLM
  File "C:\AI\text-generation-webui\installer_files\env\lib\site-packages\awq\models\mpt.py", line 1, in <module>
    from .base import BaseAWQForCausalLM
  File "C:\AI\text-generation-webui\installer_files\env\lib\site-packages\awq\models\base.py", line 12, in <module>
    from awq.quantize.quantizer import AwqQuantizer
  File "C:\AI\text-generation-webui\installer_files\env\lib\site-packages\awq\quantize\quantizer.py", line 11, in <module>
    from awq.modules.linear import WQLinear_GEMM, WQLinear_GEMV
  File "C:\AI\text-generation-webui\installer_files\env\lib\site-packages\awq\modules\linear.py", line 4, in <module>
    import awq_inference_engine  # with CUDA kernels

ImportError: DLL load failed while importing awq_inference_engine: The specified module could not be found.

For what it's worth, this is on a Windows 11 machine. All other models/loaders work fine.
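In case it helps others debugging: a "DLL load failed" on Windows usually means the compiled extension (awq_inference_engine) was found on disk but one of its own dependencies, typically the CUDA runtime the wheel was built against, could not be loaded. A quick diagnostic sketch, just an assumption-checking script, not an official tool:

import importlib.util

import torch

# Compare the CUDA version torch was built with against what is installed;
# a prebuilt awq_inference_engine wheel must match the torch/CUDA combo.
print("torch:", torch.__version__)
print("torch built with CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())

# find_spec locates the .pyd without loading it, so this distinguishes a
# missing module from one that fails to load its dependent DLLs.
spec = importlib.util.find_spec("awq_inference_engine")
print("awq_inference_engine located at:", spec.origin if spec else "not installed")

try:
    import awq_inference_engine  # noqa: F401
    print("awq_inference_engine imported OK")
except ImportError as e:
    print("import failed:", e)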

erew123 commented 10 months ago

Ahhh...I see this issue discussing it, with comments yesterday that an update may be on the way https://github.com/oobabooga/text-generation-webui/issues/4253

EDIT - I had to fully re-install. Updating alone wouldn't resolve the issue, despite the fact that AutoAWQ 0.1.6 was supposedly installed. After a full re-install of the Text-Gen-Web-UI, I can now load AWQ.
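If anyone else hits this, a quick way to confirm the re-install actually fixed the kernels is to load an AWQ model with AutoAWQ directly, outside the web UI. A minimal sketch (the model ID is just an example; assumes AutoAWQ 0.1.6 and a CUDA build of torch):

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_id = "TheBloke/Llama-2-7b-Chat-AWQ"  # example model, any AWQ repo works

# from_quantized loads the quantized weights and builds the fused kernels;
# if this succeeds, awq_inference_engine is importing correctly.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoAWQForCausalLM.from_quantized(model_id, fuse_layers=True)

input_ids = tokenizer("Hello, my name is", return_tensors="pt").input_ids.cuda()
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0]))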

JaRail commented 10 months ago

> EDIT - I had to fully re-install. Updating alone wouldn't resolve the issue, despite the fact that AutoAWQ 0.1.6 was supposedly installed. After a full re-install of the Text-Gen-Web-UI, I can now load AWQ.

Fresh install also worked for me after updating failed to resolve the issue. Thanks!

rsgrewal-aws commented 7 months ago

Trying to load TheBloke/Llama-2-13B-Chat-fp16 and TheBloke/Llama-2-70B-AWQ: TGI says “Peft model detected.” and then fails because there is no adapter_config.json file. It looks like AWQ is not fully supported on TGI yet.
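One thing worth checking: the “Peft model detected.” path is normally triggered by adapter-style repos, so it may help to list the repo files and see whether the serving stack could be misreading them. A hedged sketch with huggingface_hub (quant_config.json is the metadata file TheBloke's AWQ repos shipped at the time, so treat that name as an assumption):

from huggingface_hub import list_repo_files

# Inspect which config files each repo actually contains, to see why a
# loader might classify it as a PEFT adapter or miss the AWQ metadata.
for repo_id in ("TheBloke/Llama-2-13B-Chat-fp16", "TheBloke/Llama-2-70B-AWQ"):
    files = list_repo_files(repo_id)
    print(repo_id)
    print("  adapter_config.json (PEFT):", "adapter_config.json" in files)
    print("  quant_config.json (AWQ):", "quant_config.json" in files)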