kranipa opened this issue 7 months ago
```python
model = AutoModelForCausalLM.from_pretrained(model_path, device_map='cpu', torch_dtype=torch.float16, quantization_config=woq_config, trust_remote_code=True, use_neural_speed=False)
```
Do you want to use neural_speed? If yes, try setting use_neural_speed=True.
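A minimal sketch of that change, assuming the same model_path and woq_config as in your snippet above:

```python
# Sketch only: reuses model_path and woq_config from the snippet above.
# With use_neural_speed=True, from_pretrained returns a neural_speed Model
# object rather than a torch.nn.Module.
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=woq_config,
    trust_remote_code=True,
    use_neural_speed=True,
)
```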
Thank you for the response. With use_neural_speed=True, the save function doesn't work. I get the following error:

```
AttributeError: 'Model' object has no attribute 'save_pretrained'
```

Can you share an example of how to save a quantized model (a Model object) with neural_speed?
It looks like a load/save mismatch. Can you try the latest commit instead of g494a5712fa2 and set use_neural_speed=False?
Hi, thank you. Saving works; however, loading the saved model leads to the following error:

```
raise ValueError(
ValueError: Unknown quantization type, got rtn - supported types are: ['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', 'aqlm']
```

The following is the code snippet:
```python
import torch
from intel_extension_for_transformers.transformers import AutoModelForCausalLM, RtnConfig, GPTQConfig, AwqConfig

model_path = "meta-llama/Llama-2-7b-chat-hf"  # your_pytorch_model_path_or_HF_model_name
saved_dir = "models/4_bit_llama2-rtn"         # your_saved_model_dir
# model_path = "Intel/neural-chat-7b-v3-3"
# saved_dir = "models/4_bit_neural_chat_7b-v3-3-rtn"

# quantize with 4-bit RTN weight-only quantization
woq_config = RtnConfig(bits=4)
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             device_map='cpu',
                                             # torch_dtype=torch.float16,
                                             quantization_config=woq_config,
                                             trust_remote_code=True,
                                             use_neural_speed=False)

# save the quantized model
model.save_pretrained(saved_dir)

# load the quantized model
loaded_model = AutoModelForCausalLM.from_pretrained(saved_dir, trust_remote_code=True)
```
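For reference, once loading succeeds, a quick generation check could look like the sketch below (the tokenizer and prompt are illustrative additions, not part of the repro):

```python
# Sanity-check sketch: assumes the tokenizer still comes from the original
# model_path and that the loaded model supports the usual generate() API.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = loaded_model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```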
@kranipa, this issue is caused by a version mismatch between ITREX and neural-compressor. You can use neural-compressor version 2.5.1 and try again. ITREX 1.4 is released now; please try it. Thanks very much.
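For example, pinning both packages (assuming the standard PyPI package names and version strings):

```
pip install neural-compressor==2.5.1
pip install intel-extension-for-transformers==1.4
```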
Okay, thank you.
@kranipa Did you get it to run? I'm having the same problem.
@PhzCode, could you post your code so I can try to reproduce it? Thanks very much.
Loading the saved model runs into the following error. It also takes a very long time to run and save quantized models.
I tried the following example.