Open sadaf0714 opened 4 months ago
any updates??
@sadaf0714, are you able to share what type of configuration you're wanting to set? Is it just `load_in_4bit=True`? If possible, could you share a link to the docs for this specific use case?
@matthewbolanos

```python
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)
model_4bit = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    device_map="auto",
    quantization_config=quantization_config,
)
```

Somehow I want to pass this `quantization_config` while creating the service in the kernel, so that I can run it on the Google Colab free tier.
any updates? @matthewbolanos
@eavanvalkenburg Would you take a look at this. Thanks.
@sadaf0714 You should be able to get it working by passing `model_kwargs={"load_in_4bit": True}` to the `HuggingFaceTextCompletion` constructor. I'm working on a sample for that, and I might add support for a different way as well, but let me see first. Let me know how that goes! (BTW, I had to manually install the bitsandbytes package to get it working.)
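For reference, the suggestion above might look roughly like this. This is only a sketch: it assumes the `HuggingFaceTextCompletion` connector forwards `model_kwargs` to the underlying transformers pipeline (as described in this thread), that bitsandbytes is installed, and that the import path matches your semantic-kernel version — details may differ.

```python
# Assumes: pip install semantic-kernel transformers bitsandbytes accelerate
# Import path may vary across semantic-kernel versions.
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.hugging_face import HuggingFaceTextCompletion

model_id = "mistralai/Mistral-7B-Instruct-v0.2"

kernel = Kernel()
kernel.add_service(
    HuggingFaceTextCompletion(
        task="text-generation",
        service_id=model_id,
        ai_model_id=model_id,
        # Forwarded to transformers when the model is loaded, so the
        # weights are quantized to 4-bit via bitsandbytes on load.
        model_kwargs={"load_in_4bit": True},
    )
)
```

With the weights loaded in 4-bit, the 7B model should fit within the memory limits of the Colab free tier, at some cost in output quality.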
@sadaf0714 I have created a sample, but I still need to do some work on it. Have a look and see if you can do the same and get it working!
I want to use a 4-bit quantized Mistral model from Hugging Face with Semantic Kernel so that I can run it on the Google Colab free tier, but I am not able to find a way to pass this configuration while creating the service. This is the code I am using to create the service:
```python
kernel = Kernel()
text_service_id = "mistralai/Mistral-7B-Instruct-v0.2"
kernel.add_service(
    service=HuggingFaceTextCompletion(
        task="text-generation",
        service_id=text_service_id,
        ai_model_id=text_service_id,
    )
)
```

Please provide me with a way to pass the 4-bit config, using bitsandbytes, `load_in_4bit=True`, or whatever else works.