microsoft / semantic-kernel

Integrate cutting-edge LLM technology quickly and easily into your apps
https://aka.ms/semantic-kernel
MIT License

Python: Unable to quantize huggingface models while creating service using semantic kernel. #6374

Open sadaf0714 opened 4 months ago

sadaf0714 commented 4 months ago

I want to use a 4-bit quantized Mistral model from Hugging Face with Semantic Kernel so that I can run it on the Google Colab free tier, but I am not able to find a way to pass this configuration while creating the service. This is the code I am using to create the service:
```python
kernel = Kernel()
text_service_id = "mistralai/Mistral-7B-Instruct-v0.2"
kernel.add_service(
    service=HuggingFaceTextCompletion(
        task="text-generation",
        service_id=text_service_id,
        ai_model_id=text_service_id,
    )
)
```

Please provide me with a way to pass a 4-bit config, whether via bitsandbytes, `load_in_4bit=True`, or anything else.

sadaf0714 commented 4 months ago

any updates??

matthewbolanos commented 4 months ago

@sadaf0714, are you able to share what type of configuration you're wanting to set? Is it just `load_in_4bit=True`? If possible, could you share a link to the docs for this specific use case?

sadaf0714 commented 4 months ago

@matthewbolanos

```python
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)
model_4bit = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    device_map="auto",
    quantization_config=quantization_config,
)
```

Somehow I want to pass this `quantization_config` while creating the service in the kernel so that I can run it on the Google Colab free tier.

sadaf0714 commented 4 months ago

any updates? @matthewbolanos

alliscode commented 3 months ago

@eavanvalkenburg Would you take a look at this? Thanks.

eavanvalkenburg commented 3 months ago

@sadaf0714 You should be able to get it working by passing `model_kwargs={"load_in_4bit": True}` to the `HuggingFaceTextCompletion` constructor. I'm working on a sample for that, and I might add support for a different way as well, but let me see first. Let me know how that goes! (BTW, I had to manually install the `bitsandbytes` package to get it working.)
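A minimal sketch of the workaround described above, assuming `HuggingFaceTextCompletion` forwards `model_kwargs` to transformers' model loading, and that `bitsandbytes` (plus `accelerate`) is installed separately as noted. The helper name `build_quantized_service` is illustrative, not part of the library:

```python
# Sketch of the suggested workaround: pass quantization options through
# model_kwargs. Assumes the connector forwards these kwargs to
# transformers; bitsandbytes must be installed manually.
QUANT_KWARGS = {"load_in_4bit": True}  # bitsandbytes 4-bit quantization


def build_quantized_service(model_id: str):
    """Register a 4-bit quantized Hugging Face text-completion service.

    Imports are local because constructing the service downloads the
    model weights, which is only sensible on a machine with a GPU
    (e.g. the Colab free tier this thread targets).
    """
    from semantic_kernel import Kernel
    from semantic_kernel.connectors.ai.hugging_face import HuggingFaceTextCompletion

    kernel = Kernel()
    kernel.add_service(
        HuggingFaceTextCompletion(
            task="text-generation",
            service_id=model_id,
            ai_model_id=model_id,
            model_kwargs=QUANT_KWARGS,
        )
    )
    return kernel
```

Calling `build_quantized_service("mistralai/Mistral-7B-Instruct-v0.2")` should then load the model in 4-bit instead of full precision.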

eavanvalkenburg commented 3 months ago

@sadaf0714 I have created a sample, but it still needs some work. Have a look and see if you can do the same and get it working!