intel / auto-round

Advanced Quantization Algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs" (https://arxiv.org/abs/2309.05516).
Apache License 2.0

Handling transformers version compatibility in lm-head export, bugfix #130

Closed · WeiweiZhang1 closed this 4 months ago

wenhuach21 commented 4 months ago

```
2024-05-28 06:12:08 INFO autoround.py L605: Summary: quantized 225/225 in the model
CUDA extension not installed.
CUDA extension not installed.
Traceback (most recent call last):
  File "/home/wenhuach/auto-round/examples/language-modeling/../../auto_round/utils.py", line 54, in __getattr__
    self.module = importlib.import_module(self.module_name)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'transformers.quantizers.HfQuantizer'
```
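The failure comes from a lazy-import helper in auto_round/utils.py that passes the dotted path `transformers.quantizers.HfQuantizer` to `importlib.import_module`, which only accepts module paths; `HfQuantizer` is a class, and it only exists in newer transformers releases. Below is a minimal sketch (not the actual auto-round patch) of a tolerant lookup: the `locate` and `get_hf_quantizer` helpers are hypothetical names, and using 4.38 as the version cutoff is an assumption based on the version the maintainers tested against.

```python
import importlib

from packaging import version


def locate(dotted_path: str):
    """Hypothetical helper: resolve a dotted path that may end in a class or
    attribute (e.g. 'transformers.quantizers.HfQuantizer') rather than a module.

    importlib.import_module() raises ModuleNotFoundError for class paths,
    which is exactly the traceback above, so fall back to getattr() on the
    parent module.
    """
    try:
        return importlib.import_module(dotted_path)
    except ModuleNotFoundError:
        module_path, _, attr_name = dotted_path.rpartition(".")
        if not module_path:
            raise
        module = importlib.import_module(module_path)
        return getattr(module, attr_name)


def get_hf_quantizer():
    """Return HfQuantizer on transformers versions that ship it, else None.

    The 4.38.0 cutoff is an assumption; callers should treat None as
    'quantizer-based export path unavailable' and use a fallback.
    """
    import transformers

    if version.parse(transformers.__version__) < version.parse("4.38.0"):
        return None
    return locate("transformers.quantizers.HfQuantizer")
```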

wenhuach21 commented 4 months ago

@WeiweiZhang1 I tested Mistral/Llama-3 with and without the lm-head quantized on transformers 4.38; please test more. Merge it if there are no more problems.
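For anyone reproducing the test, a rough sketch of the with/without-lm-head runs using the public AutoRound API. The model name is a placeholder, and the `layer_config` keyword for enabling lm-head quantization is an assumption (the argument name has changed across auto-round releases; consult the README of the installed version).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from auto_round import AutoRound

model_name = "mistralai/Mistral-7B-v0.1"  # placeholder; any causal LM works
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Run 1: default quantization, lm-head left in full precision.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128)
autoround.quantize()
autoround.save_quantized("./mistral-4bit-no-lmhead")

# Run 2: also quantize the lm-head. The layer_config name and shape below
# are assumptions, not the confirmed auto-round API at this version.
layer_config = {"lm_head": {"bits": 4, "group_size": 32}}
model = AutoModelForCausalLM.from_pretrained(model_name)  # fresh, unmodified copy
autoround = AutoRound(model, tokenizer, bits=4, group_size=128,
                      layer_config=layer_config)
autoround.quantize()
autoround.save_quantized("./mistral-4bit-lmhead")
```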