tloen / alpaca-lora

Instruct-tune LLaMA on consumer hardware
Apache License 2.0

4-bit quantization and QLoRA #486

Open jeff52415 opened 1 year ago

jeff52415 commented 1 year ago

The repository currently does not support 4-bit training or inference. Since this could be implemented with relatively little effort, I am willing to help integrate the feature.

import torch
from transformers import BitsAndBytesConfig, LlamaForCausalLM

load_in_4bit = True
load_in_8bit = not load_in_4bit  # 4-bit and 8-bit loading are mutually exclusive
bnb_config = BitsAndBytesConfig(
    load_in_4bit=load_in_4bit,
    load_in_8bit=load_in_8bit,
    bnb_4bit_use_double_quant=True,   # nested quantization for extra memory savings
    bnb_4bit_quant_type="nf4",        # NormalFloat4, the QLoRA default
    bnb_4bit_compute_dtype=torch.float16,
)
model = LlamaForCausalLM.from_pretrained(
    base_model,  # path or hub id of the base LLaMA checkpoint
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
    device_map=device_map,  # e.g. "auto"
)
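
For the training side, the 4-bit model would then be prepared for k-bit training and wrapped with a LoRA adapter via peft. A minimal sketch, assuming a recent peft release; the LoRA hyperparameters below are illustrative values matching alpaca-lora's defaults, not tuned settings:

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Casts norm/output layers to full precision and enables gradient checkpointing
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

From there the model can be passed to the existing training loop unchanged, since only the LoRA parameters receive gradients.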
vihangd commented 1 year ago

@jeff52415 I managed to do something similar in my fork at https://github.com/vihangd/alpaca-qlora. I am also trying to add support for more models.

kocoten1992 commented 1 year ago

Hi @vihangd, I'd like to try your fork, but why did you remove export_hf_checkpoint.py?

@jeff52415 Thanks for the PR, I'll try it.

vihangd commented 1 year ago

@kocoten1992 I am working on adding it back; I just need to make sure it works with GPT-NeoX models as well.

vihangd commented 1 year ago

@kocoten1992 The fork now includes an export_hf_checkpoint.py that works with any model.
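
For anyone curious, the core of such an export script is merging the LoRA deltas back into the base weights and saving a plain HF checkpoint. A minimal sketch with peft, not the fork's actual code; the model id and paths here are placeholders:

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model in fp16 (merging into quantized weights is not supported)
base = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "./lora-out")  # trained adapter directory

# Fold the LoRA deltas into the base weights and drop the adapter wrappers
merged = model.merge_and_unload()
merged.save_pretrained("./hf-checkpoint")

Using AutoModelForCausalLM rather than a LLaMA-specific class is what keeps the script model-agnostic.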