google / gemma_pytorch

The official PyTorch implementation of Google's Gemma models
https://ai.google.dev/gemma
Apache License 2.0

Is it possible to load 7b-it using quantization config #48

Closed. aliasneo1 closed this issue 2 months ago.

aliasneo1 commented 6 months ago

Newbie here. The 7b-it model can be loaded on a low-memory device via a quantization config, without using a pre-quantized version of the model, by passing a BitsAndBytes config to Hugging Face's AutoModelForCausalLM like below.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization applied on the fly at load time
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "/kaggle/input/gemma/transformers/7b-it/2",
    device_map="auto",
    trust_remote_code=True,
    quantization_config=quantization_config,
)
```

Is this type of loading feasible in your current package?

pengchongjin commented 6 months ago

Unfortunately, the current code doesn't support reading a quantization config specified in the Hugging Face format.

It would require some amount of code changes to make it work. If you are in the mood, we definitely welcome such changes and will help you land them.
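
In the meantime, the closest alternative in this package is the pre-quantized int8 checkpoint (the `7b-it-quant` variant) with `quant=True` in the model config. Below is a minimal sketch of that loading path, modeled on the pattern in this repo's README; the exact function names (`get_config_for_7b`, `GemmaForCausalLM.load_weights`) are taken from the repo as I understand it, and the checkpoint/tokenizer paths are hypothetical placeholders, so this is not a drop-in equivalent of the 4-bit BitsAndBytes config above.

```python
import torch

# Sketch only: assumes the gemma_pytorch API as shown in this repo's README
# and a locally downloaded int8 checkpoint (e.g. the 7b-it-quant variant).
from gemma.config import get_config_for_7b
from gemma.model import GemmaForCausalLM

CKPT_PATH = "/path/to/gemma-7b-it-quant.ckpt"        # hypothetical path
TOKENIZER_PATH = "/path/to/tokenizer.model"           # hypothetical path

model_config = get_config_for_7b()
model_config.quant = True            # use the int8-quantized weights
model_config.dtype = "float16"       # activation dtype
model_config.tokenizer = TOKENIZER_PATH

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

torch.set_default_dtype(model_config.get_dtype())
model = GemmaForCausalLM(model_config)
model.load_weights(CKPT_PATH)
model = model.to(device).eval()

print(model.generate("What is quantization?", device, output_len=64))
```

This keeps memory usage low by loading weights that were already quantized offline, rather than quantizing a full-precision checkpoint at load time the way BitsAndBytes does.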