yxuansu / OpenAlpaca

OpenAlpaca: A Fully Open-Source Instruction-Following Model Based On OpenLLaMA
Apache License 2.0

QLoRA support? #3

Open limcheekin opened 1 year ago

limcheekin commented 1 year ago

Hi there,

Thanks for sharing.

Any plan to support QLoRA? Please see the following paper for more information: https://arxiv.org/abs/2305.14314

Thanks.

jav-ed commented 1 year ago

This is something I am interested in too.

jav-ed commented 1 year ago

It seems to work: loading the 3B version without QLoRA requires 14 GB of GPU RAM, while with QLoRA it needs only 3 GB of VRAM. You can try it for yourself:

# Install the latest bitsandbytes; install transformers, peft & accelerate from source
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git

# Other requirements for the demo
!pip install gradio
!pip install sentencepiece

import torch
from peft import PeftModel  # not used in this inference-only snippet, kept for loading LoRA adapters
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList, TextIteratorStreamer, BitsAndBytesConfig

# model_name = 'openlm-research/open_llama_3b_600bt_preview'

models = {
    "open_Alpaca": "openllmplayground/openalpaca_3b_600bt_preview"
}

model_name = models["open_Alpaca"]
print(f"Starting to load the model {model_name} into memory")

# 4-bit NF4 quantization with double quantization and bfloat16 compute (QLoRA-style loading)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map={"": 0} places the whole quantized model on GPU 0
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             quantization_config=bnb_config,
                                             device_map={"": 0})

# see https://github.com/openlm-research/open_llama#preview-weights-release-and-usage
tokenizer.bos_token_id, tokenizer.eos_token_id = 1, 2

# same prompt as provided in https://crfm.stanford.edu/2023/03/13/alpaca.html
instruction = r'What is an alpaca? How is it different from a llama?'
'''
instruction = r'Write an e-mail to congratulate new Stanford admits and mention that you are excited about meeting all of them in person.'
instruction = r'What is the capital of Tanzania?'
instruction = r'Write a well-thought out abstract for a machine learning paper that proves that 42 is the optimal seed for training neural networks.'
'''

prompt_no_input = f'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:'
tokens = tokenizer.encode(prompt_no_input)

device = "cuda:0"

# note you have to add .to(device) here
tokens = torch.LongTensor(tokens).unsqueeze(0).to(device)
instance = {'input_ids': tokens,
            'top_k': 50,
            'top_p': 0.9,
            'generate_len': 128}

length = len(tokens[0])
with torch.no_grad():
    rest = model.generate(
            input_ids=tokens, 
            max_length=length+instance['generate_len'], 
            use_cache=True, 
            do_sample=True, 
            top_p=instance['top_p'], 
            top_k=instance['top_k']
        )

output = rest[0][length:]
string = tokenizer.decode(output, skip_special_tokens=True)
print(f'[!] Generation results: {string}')

Outcome:

Generation results: Alpacas are closely related to llamas. They are even part of the same family. Alpacas have soft fur and are generally smaller in size.
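
To sanity-check the VRAM figures mentioned above, here is a minimal (hypothetical) check using PyTorch's memory counters; it assumes the model has already been loaded on cuda:0 as in the snippet:

# report current and peak GPU memory allocated by PyTorch on cuda:0
import torch

gb = 1024 ** 3
print(f'currently allocated: {torch.cuda.memory_allocated(0) / gb:.2f} GB')
print(f'peak allocated:      {torch.cuda.max_memory_allocated(0) / gb:.2f} GB')

Note that this only counts memory allocated through PyTorch's allocator, so nvidia-smi may report a somewhat higher total.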

vihangd commented 1 year ago

I just came to know about this project. I attempted something similar with https://github.com/vihangd/alpaca-qlora, which ports alpaca-lora to use QLoRA; you can use it to fine-tune OpenLLaMA models. I am trying to make it work with other models too.
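
For reference, a minimal sketch of what such a QLoRA fine-tuning setup looks like with peft, assuming a 4-bit model loaded as in the snippet above; the LoRA hyperparameters and target modules below are illustrative placeholders, not values taken from alpaca-qlora, and the dataset/trainer loop is omitted:

# Minimal QLoRA fine-tuning setup sketch (hyperparameters are illustrative placeholders)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# prepare the 4-bit model for k-bit training (casts norms, enables input grads, etc.)
model = prepare_model_for_kbit_training(model)

# attach small trainable LoRA adapters on the LLaMA attention projections
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# only the LoRA matrices are trainable; the 4-bit base weights stay frozen
model.print_trainable_parameters()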

jav-ed commented 1 year ago

Very nice attempt, keep me/us updated