Unfortunately, the current code doesn't support reading a quantization config specified in the HuggingFace format.
It would require some amount of code changes to make it work. If you are in the mood, we definitely welcome such changes and will help you land them.
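For context, checkpoints saved by HuggingFace transformers with a BitsAndBytes config store the quantization settings under a `quantization_config` key in the checkpoint's `config.json`, so a loader would first need to parse that entry. A minimal sketch of that parsing step, assuming a standard HF checkpoint layout (the helper function itself is hypothetical, not part of this package):

```python
import json
from pathlib import Path
from typing import Optional

def read_hf_quantization_config(checkpoint_dir: str) -> Optional[dict]:
    """Return the `quantization_config` entry from a HuggingFace
    checkpoint's config.json, or None if the model is unquantized."""
    config_path = Path(checkpoint_dir) / "config.json"
    with open(config_path) as f:
        config = json.load(f)
    # Checkpoints saved with a BitsAndBytesConfig carry keys such as
    # `load_in_4bit` and `bnb_4bit_quant_type` under this entry.
    return config.get("quantization_config")
```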
Newbie here. The 7b-it model can be loaded on a low-memory device via a quantization config, without using a pre-quantized version of the model, by passing a BitsAndBytes config to HuggingFace's AutoModelForCausalLM like below.
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the full-precision weights to 4-bit NF4 on the fly at load time.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "/kaggle/input/gemma/transformers/7b-it/2",
    device_map="auto",
    trust_remote_code=True,
    quantization_config=quantization_config,
)
```

Is this type of loading feasible in your current package?