Closed: CrispStrobe closed this issue 1 week ago
Sorry for the confusion. The AutoRoundConfig was introduced after version 0.3.0. We'll update the documentation to clarify this. In the meantime, you can install the latest version from source:
git clone https://github.com/intel/auto-round.git && cd auto-round && pip install -vvv --no-build-isolation -e .
For version 0.3.0, we recommend using:
from auto_round.auto_quantizer import AutoHfQuantizer
In this version, the device is chosen automatically, with GPU and HPU taking priority over CPU. To use the CPU on a CUDA machine, you'll need to modify the model's configuration file.
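If it helps, here is a rough sketch of what that modification could look like. The key and value below ("backend", "cpu") are assumptions for illustration only; check the keys that actually appear in the quantization_config section of your own config.json.
# sketch: point the quantized model's config at a CPU backend
# "backend" and "cpu" are assumed names; verify against your config.json
import json
config_path = "./quantized_model/config.json"  # hypothetical path to your quantized model
with open(config_path) as f:
    cfg = json.load(f)
cfg.setdefault("quantization_config", {})["backend"] = "cpu"
with open(config_path, "w") as f:
    json.dump(cfg, f, indent=2)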
If you're working with CUDA, you'll need to install from source to compile the kernel, as we couldn't include it in the package for various reasons.
We have updated the README; you might also be interested in the introduction of the formats. Additionally, we plan to release a new version next month, which will address this issue.
many thanks, perfect - and wow, that was swift!
Is the issue still there? I am still getting it.
Apologies for the delay in the release. In the meantime, please use the following import statement for version v0.3:
from auto_round.auto_quantizer import AutoHfQuantizer
Yes, thanks @wenhuach21. The issue I found is that quantization uses the 'auto-round' format, but during inference, since we don't have support for that, it causes the issue. Do correct me if I am wrong?
In quantization, the model operates in floating-point format and undergoes fake quantization to simulate the quantization behavior. After the tuning process is complete, this fake model is converted into a true int4 model that adheres to your specified format. So for real inference, we need to import that code for auto-round format or install auto_gptq for gptq format.
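To make the two inference paths concrete, a sketch (the auto-round import is the v0.3.0 API mentioned above; the model path is illustrative):
# auto-round format: import the quantizer so Transformers can load the model (v0.3.0 API)
from auto_round.auto_quantizer import AutoHfQuantizer
# gptq format: no extra import is needed, but the packages must be installed first:
#   pip install optimum auto-gptq
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("./tmp_autoround", device_map="auto")  # illustrative path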
Please refer to the Model Inference section in the README for more details.
Sure, I installed optimum and auto-gptq, but I'm still getting:
ImportError: Loading a GPTQ quantized model requires optimum (pip install optimum) and auto-gptq library (pip install auto-gptq)
I am on a Kaggle kernel, so I can't refresh.
Which format are you using, auto_round or auto_gptq?
For the GPTQ format, installing Optimum and Auto-GPTQ should suffice. If Transformers still throws an exception, please check your environment; you might have multiple environments in use.
For the auto_round format, please follow our README.
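If it helps, a quick way to rule out a multi-environment mix-up inside a notebook is something like this (a sketch using standard-library calls only; the package list is just the ones discussed here):
import sys
from importlib.metadata import version, PackageNotFoundError
print(sys.executable)  # the interpreter the notebook is actually running
for pkg in ("transformers", "optimum", "auto-gptq", "auto-round"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed in this environment")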
The format is 'auto_gptq'. I installed optimum, auto-gptq, and auto-round==0.3.0, and for now I am quantising 'facebook/opt-125m'. My Kaggle kernel is crashing; it has a 12GB P-100 GPU, and this is what I am getting:
Remove the second line and try again. If it works, it seems to be a bug in our code. If it doesn't, try running inference with the model ybelkada/opt-125m-gptq-4bit in your environment.
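For reference, a minimal sanity check along those lines could look like this (a sketch using standard Transformers loading; the prompt and generation length are arbitrary):
# verify that GPTQ inference works at all in this environment
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "ybelkada/opt-125m-gptq-4bit"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("There is a girl who likes adventure,", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))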
I did remove the second line for my model 'facebook/opt-125m', and I got:
But when I did inference with 'ybelkada/opt-125m-gptq-4bit', yes, I got results.
Then you exported to an AutoRound-format model rather than the AutoGPTQ format. For the AutoRound format, you'll need to install it from source with CUDA support. I recommend switching to the AutoGPTQ format, but please note that it may have accuracy issues with asymmetric quantization.
autoround.save_quantized(output_dir, format='auto_gptq', inplace=True)
Yes, I already saved in the 'auto_gptq' format, but still no inference.
Are you still seeing this issue?
That's interesting! I assume you're still exporting to the AutoRound format. I ran the following code on version 0.3.0, and it worked fine. Please check the config.json in the quantized model directory; the quant_method should be set to 'gptq' if the format is auto_gptq.
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "facebook/opt-125m"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
from auto_round import AutoRound
bits, group_size, sym = 4, 128, False
autoround = AutoRound(model, tokenizer, bits=bits, group_size=group_size, sym=sym)
autoround.quantize()
output_dir = "./tmp_autoround"
## format= 'auto_round'(default in version>0.3.0), 'auto_gptq'(default in version<=0.3.0), 'auto_awq'
autoround.save_quantized(output_dir, format='auto_gptq', inplace=True)
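To verify the exported format as suggested above, a quick check of the quantized model's config.json could look like this (a sketch; the output directory matches the snippet above):
# confirm which quant_method the exported model declares
import json
with open("./tmp_autoround/config.json") as f:
    cfg = json.load(f)
print(cfg.get("quantization_config", {}).get("quant_method"))  # expect 'gptq' for the auto_gptq format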
Yes, this is working. Can you send me the version that worked for inference of this same model? Because what is in the documentation is not working for me.
This is the AutoGPTQ format; I just used the same code as in the README:
from transformers import AutoModelForCausalLM, AutoTokenizer
quantized_model_path = "./tmp_autoround"
model = AutoModelForCausalLM.from_pretrained(quantized_model_path,
device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(quantized_model_path)
text = "There is a girl who likes adventure,"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))
I don't know how it's happening, but I did quantize it in the 'auto_gptq' format and saved the files in a zip, and when I upload it to another Kaggle notebook, config.json shows the 'auto-round' format.
And finally it worked.
Another question, @wenhuach21: when I am doing the quantisation for phi-2 using the GPTQ format, I am getting this:
And this is my code:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "microsoft/phi-2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
from auto_round import AutoRound
bits, group_size, sym = 4, 128, False
autoround = AutoRound(model, tokenizer, nsamples=128, iters=200, seqlen=512, batch_size=4, bits=bits, group_size=group_size, sym=sym)
autoround.quantize()
output_dir = "./sahib_autorounds_phi2"
## format= 'auto_round'(default in version>0.3.0), 'auto_gptq'(default in version<=0.3.0), 'auto_awq'
autoround.save_quantized(output_dir, format='auto_gptq', inplace=True)
I could not reproduce this issue. May I know your transformers version? BTW, for phi-2, you'd better set sym=True due to the kernel issue of GPTQ.
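Concretely, that just means flipping the sym flag in your snippet, e.g. (a sketch reusing your code above with sym=True; all other arguments unchanged):
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound
model_name = "microsoft/phi-2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
bits, group_size, sym = 4, 128, True  # sym=True to avoid the GPTQ kernel issue
autoround = AutoRound(model, tokenizer, nsamples=128, iters=200, seqlen=512, batch_size=4, bits=bits, group_size=group_size, sym=sym)
autoround.quantize()
autoround.save_quantized("./sahib_autorounds_phi2", format='auto_gptq', inplace=True)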
(quoting the phi-2 quantisation question and code above)
... did not work for me right now, whereas previously it did. I cannot check this further at the moment, but maybe you might want to. The environment was Kaggle and Colab; it occurred after !pip install auto-round, and was seemingly resolved per: