microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime

AWQ model export for CPU is not supported on Windows #787

Open · belovedxixi opened this issue 1 month ago

belovedxixi commented 1 month ago

Describe the bug: AWQ model export for CPU is not supported.

To Reproduce:

$ python3 builder.py -i awq_model_dir -o output_folder -p int4 -e cpu -c cache_dir

Screenshots: [screenshot of the error attached to the original issue, 2024-08-13]

Additional context: It seems like Hugging Face's quantization config file cannot support AwqConfig for CPU.

kunal-vaishnavi commented 1 month ago

According to Hugging Face, this error is raised when torch.cuda.is_available() returns False, which means that torch cannot find a CUDA installation on your machine.
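For reference, here's a minimal check (assuming a standard PyTorch install) of what transformers will observe on your machine:

import torch
print(torch.cuda.is_available())  # False here is what triggers the AwqConfig error on a CPU-only machine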

As you mentioned, it appears that Hugging Face requires that any machine loading an AwqConfig have CUDA installed. Here are a couple of workaround options.

  1. If your machine supports installing CUDA, verify that CUDA is installed and that torch.cuda.is_available() returns True. Then run your command again.
  2. If your machine doesn't support installing CUDA, modify transformers so that post_init() is not called, which avoids raising the error. Here are the steps to do this.
    
# 1) Uninstall the existing `transformers` package
$ pip uninstall -y transformers

# 2) Clone the transformers repo
$ git clone https://github.com/huggingface/transformers

# 3) Navigate to the folder that quantization_config.py is in
$ cd transformers/src/transformers/utils/

# 4) Comment out `self.post_init()` in quantization_config.py. Here's the location:
# https://github.com/huggingface/transformers/blob/9d2ab8824c2a820a0ac9f2cd884ef838ca77b481/src/transformers/utils/quantization_config.py#L814
# This will prevent the post-init steps from running, so the error should not be raised.
# (See the sketch after these steps for an illustration of what this edit does.)

# 5) Install transformers from source
$ cd ../../../
$ pip install -e .


Then run your command again.
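For illustration only, here is a small runtime sketch of the same idea: instead of editing the installed file, patch AwqConfig.post_init to a no-op before the config is constructed. This is not one of the steps above, and the import path is an assumption based on recent transformers releases; verify it against your installed version.

# Illustrative sketch: bypass AwqConfig.post_init() at runtime instead of editing quantization_config.py.
# Assumes the AwqConfig import path used by recent transformers releases.
from transformers.utils.quantization_config import AwqConfig

AwqConfig.post_init = lambda self: None  # no-op; skips the post_init checks, including the CUDA-availability check

config = AwqConfig(bits=4, group_size=128)  # now constructs without raising on a CPU-only machine
print(config)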
natke commented 2 weeks ago

@belovedxixi Did you try the above suggestions? Let us know how it went.