Is it possible to load 7b-it using quantization config

google / gemma_pytorch

The official PyTorch implementation of Google's Gemma models

Apache License 2.0

5.26k stars 503 forks source link

Newbie here. 7b-it model could be loaded in a low memory device via quantization config without using quant version of model using BitsAndBytes like below in huggingface's AutoModelForCausalLM .

quantization_config = BitsAndBytesConfig( load_in_4bit = True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_use_double_quant=True, ) model = AutoModelForCausalLM.from_pretrained( "/kaggle/input/gemma/transformers/7b-it/2", device_map = "auto", trust_remote_code = True, quantization_config=quantization_config, ) Whether such type of loading is feasible in your current package?

google / gemma_pytorch

Is it possible to load 7b-it using quantization config #48