QLoRA Inference - Githubissues

pytorch / torchtune

PyTorch native finetuning library

BSD 3-Clause "New" or "Revised" License

4.37k stars 446 forks source link

Can I load QLoRA fine-tuning weights into a Hugging Face model as shown below?

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
quantization_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type='nf4'
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=model_id,  
    #config=config,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    device_map='auto'
)

model = PeftModel.from_pretrained(model, "qlora_finetune_folder/")

I have changed the Checkpointer to FullModelHFCheckpointer. Essentially, it is loadable & runnable, but I am curious if it reflects the same structure as qlora_llama3_8b. Thanks.

pytorch / torchtune

QLoRA Inference #1020