tloen / alpaca-lora

Instruct-tune LLaMA on consumer hardware
Apache License 2.0

Running generate.py on the base LLaMA weights returns garbage #484

Open zabir-nabil opened 1 year ago

zabir-nabil commented 1 year ago

I modified the codebase a little:

"""code for zero shot instruction parsing"""
import torch
from peft import PeftModel
import transformers
import textwrap
from transformers import AutoModel, AutoTokenizer, LlamaTokenizer, LlamaForCausalLM, GenerationConfig
from transformers.generation.utils import GreedySearchDecoderOnlyOutput

PROMPT_TEMPLATE = f""" 
### Instruction:
[INSTRUCTION]

### Input:
[INPUT]

### Response:
"""

def get_llama_7b(hf_model = "yahma/llama-7b-hf", gpu = 0):
    DEVICE = "cuda:" + str(gpu) if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(hf_model)
    model = LlamaForCausalLM.from_pretrained( 
        hf_model,
        load_in_8bit=True,
        device_map="auto",
    )
    model.config.pad_token_id = tokenizer.pad_token_id = 0  # unk
    model.config.bos_token_id = 1
    model.config.eos_token_id = 2
    model = model.eval()
    model = torch.compile(model)
    return model, tokenizer

def get_alpaca_7b(hf_model = "decapoda-research/llama-7b-hf", lora_model = "tloen/alpaca-lora-7b", gpu = 0):
    DEVICE = "cuda:" + str(gpu) if torch.cuda.is_available() else "cpu"
    tokenizer = LlamaTokenizer.from_pretrained(hf_model)
    model = LlamaForCausalLM.from_pretrained(
        hf_model,
        load_in_8bit=True,
        device_map="auto",
    )
    model = PeftModel.from_pretrained(model, lora_model, torch_dtype=torch.float16)
    model.config.pad_token_id = tokenizer.pad_token_id = 0  # unk
    #model.config.bos_token_id = 1
    #model.config.eos_token_id = 2
    model = model.eval()
    model = torch.compile(model)
    return model, tokenizer

def create_prompt(instruction: str, input_text: str) -> str:
    return PROMPT_TEMPLATE.replace("[INSTRUCTION]", instruction).replace("[INPUT]", input_text)

def generate_response(prompt: str, model: PeftModel, tokenizer: LlamaTokenizer, gpu = 0, text_gen_config = {}) -> GreedySearchDecoderOnlyOutput:
    DEVICE = "cuda:" + str(gpu) if torch.cuda.is_available() else "cpu"
    encoding = tokenizer(prompt, return_tensors="pt")
    input_ids = encoding["input_ids"].to(DEVICE)

    generation_config = GenerationConfig(
        temperature=text_gen_config.get('temperature', 0.1),
        top_p=text_gen_config.get('top_p', 0.75),
        top_k=text_gen_config.get('top_k', 40),
        num_beams=text_gen_config.get('num_beams', 4),
        repetition_penalty = 1.2,
    )
    with torch.inference_mode():
        return model.generate(
            input_ids=input_ids,
            # generation_config=generation_config,
            return_dict_in_generate=True,
            output_scores=True,
            max_new_tokens=text_gen_config.get('max_new_tokens', 256)
        )

def format_response(response: GreedySearchDecoderOnlyOutput, tokenizer: LlamaTokenizer) -> str:
    decoded_output = tokenizer.decode(response.sequences[0])
    response = decoded_output.split("### Response:")[1].strip()
    print(decoded_output)
    return "\n".join(textwrap.wrap(response))

def get_response(instruction: str, input_str: str, model: PeftModel, tokenizer: LlamaTokenizer, gpu = 0, text_gen_config = {}) -> str:
    prompt = create_prompt(instruction, input_str)
    response = generate_response(prompt, model, tokenizer, gpu, text_gen_config)
    return format_response(response, tokenizer)

if __name__ == "__main__":
    instruction = "write a one line summary title for the news"
    news_body = "Live From New York! It's Jobs Friday! body: National Archives Yes, it's that time again, folks. It's the first Friday of the month, when for one ever-so-brief moment the interests of Wall Street, Washington and Main Street are all aligned on one thing: Jobs. A fresh update on the U.S. employment situation for January hits the wires at 8:30 a.m. New York time offering one of the most important snapshots on how the economy fared during the previous month. Expectations are for 203,000 new jobs to be created, according to economists polled by Dow Jones Newswires, compared to 227,000 jobs added in February. The unemployment rate is expected to hold steady at 8.3%. Here at MarketBeat HQ, we'll be offering color commentary before and after the data crosses the wires. Feel free to weigh-in yourself, via the comments section. And while you're here, why don't you sign up to follow us on Twitter. Enjoy the show. ||||| Employers pulled back sharply on hiring last month, a reminder that the U.S. economy may not be growing fast enough to sustain robust job growth. The unemployment rate dipped, but mostly because more Americans stopped looking for work. The Labor Department says the economy added 120,000 jobs in March, down from more than 200,000 in each of the previous three months. The unemployment rate fell to 8.2 percent, the lowest since January 2009. The rate dropped because fewer people searched for jobs. The official unemployment tally only includes those seeking work. The economy has added 858,000 jobs since December _ the best four months of hiring in two years. But Federal Reserve Chairman Ben Bernanke has cautioned that the current hiring pace is unlikely to continue without more consumer spending."
    # model, tokenizer = get_alpaca_7b(lora_model="/data2/llama/tvw1")
    # alpaca_response = get_response(instruction, news_body, model, tokenizer)
    # print("Alpaca response: " + alpaca_response)
    # del model, tokenizer
    model, tokenizer = get_llama_7b()
    llama_response = get_response(instruction, news_body, model, tokenizer)
    print("Llama response: " + llama_response)

It always returns garbage output: either the input itself or "###Output" repeated over and over.
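
For comparison with the PROMPT_TEMPLATE in the script above, here is a minimal sketch of a prompt builder in the Alpaca style. The preamble wording is an assumption about the template the LoRA adapter was trained with and should be checked against the repository's template file; the name build_alpaca_prompt is hypothetical. Since the base LLaMA weights are a plain language model without instruction tuning, a well-formed prompt alone may not stop it from echoing the input.

```python
# Hypothetical helper, not part of the script above. The template wording is
# an assumption and should be verified against the repository's template.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Fill the template, using the no-input variant when input_text is empty."""
    instruction = instruction.strip()
    input_text = input_text.strip()
    if input_text:
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=input_text)
    return PROMPT_NO_INPUT.format(instruction=instruction)


if __name__ == "__main__":
    print(build_alpaca_prompt("write a one line summary title for the news",
                              "Employers pulled back sharply on hiring last month."))
```

The resulting string could be passed to generate_response in place of the output of create_prompt.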

zabir-nabil commented 1 year ago

I also tried this:

import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

model_path = 'yahma/llama-7b-hf' # 'openlm-research/open_llama_3b_600bt_preview'
# model_path = 'openlm-research/open_llama_7b_700bt_preview'

tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map='auto',
)

prompt = '### Instruction: Summarize the news in one sentence\n ### Input: ' + "National Archives Yes, it's that time again, folks. It's the first Friday of the month, when for one ever-so-brief moment the interests of Wall Street, Washington and Main Street are all aligned on one thing: Jobs. A fresh update on the U.S. employment situation for January hits the wires at 8:30 a.m. New York time offering one of the most important snapshots on how the economy fared during the previous month. Expectations are for 203,000 new jobs to be created, according to economists polled by Dow Jones Newswires, compared to 227,000 jobs added in February. The unemployment rate is expected to hold steady at 8.3%. Here at MarketBeat HQ, we'll be offering color commentary before and after the data crosses the wires. Feel free to weigh-in yourself, via the comments section. And while you're here, why don't you sign up to follow us on Twitter. Enjoy the show. ||||| Employers pulled back sharply on hiring last month, a reminder that the U.S. economy may not be growing fast enough to sustain robust job growth. The unemployment rate dipped, but mostly because more Americans stopped looking for work. The Labor Department says the economy added 120,000 jobs in March, down from more than 200,000 in each of the previous three months. The unemployment rate fell to 8.2 percent, the lowest since January 2009. The rate dropped because fewer people searched for jobs. The official unemployment tally only includes those seeking work. The economy has added 858,000 jobs since December _ the best four months of hiring in two years. But Federal Reserve Chairman Ben Bernanke has cautioned that the current hiring pace is unlikely to continue without more consumer spending. \n ### Output:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

generation_output = model.generate(
    input_ids=input_ids, max_new_tokens=256
)
print(tokenizer.decode(generation_output[0]))

It's slightly better. It does produce a summary, though not in one line.

<s>### Instruction: Summarize the news in one sentence
 ### Input: National Archives Yes, it's that time again, folks. It's the first Friday of the month, when for one ever-so-brief moment the interests of Wall Street, Washington and Main Street are all aligned on one thing: Jobs. A fresh update on the U.S. employment situation for January hits the wires at 8:30 a.m. New York time offering one of the most important snapshots on how the economy fared during the previous month. Expectations are for 203,000 new jobs to be created, according to economists polled by Dow Jones Newswires, compared to 227,000 jobs added in February. The unemployment rate is expected to hold steady at 8.3%. Here at MarketBeat HQ, we'll be offering color commentary before and after the data crosses the wires. Feel free to weigh-in yourself, via the comments section. And while you're here, why don't you sign up to follow us on Twitter. Enjoy the show. ||||| Employers pulled back sharply on hiring last month, a reminder that the U.S. economy may not be growing fast enough to sustain robust job growth. The unemployment rate dipped, but mostly because more Americans stopped looking for work. The Labor Department says the economy added 120,000 jobs in March, down from more than 200,000 in each of the previous three months. The unemployment rate fell to 8.2 percent, the lowest since January 2009. The rate dropped because fewer people searched for jobs. The official unemployment tally only includes those seeking work. The economy has added 858,000 jobs since December _ the best four months of hiring in two years. But Federal Reserve Chairman Ben Bernanke has cautioned that the current hiring pace is unlikely to continue without more consumer spending.
 ### Output: National Archives ||||| Employers pulled back sharply on hiring last month, a reminder that the U.S. economy may not be growing fast enough to sustain robust job growth. The unemployment rate dipped, but mostly because more Americans stopped looking for work. The Labor Department says the economy added 120,000 jobs in March, down from more than 200,000 in each of the previous three months. The unemployment rate fell to 8.2 percent, the lowest since January 2009. The rate dropped because fewer people searched for jobs. The official unemployment tally only includes those seeking work. The economy has added 858,000 jobs since December _ the best four months of hiring in two years. But Federal Reserve Chairman Ben Bernanke has cautioned that the current hiring pace is unlikely to continue without more consumer spending.
 ### Output: National Archives ||||| Employers pulled back sharply on hiring last month, a reminder that the U.S. economy may not be growing fast enough to sustain robust job growth. The unemployment rate dipped, but mostly

22zhangqian commented 1 year ago

Me too. Some "### Instruction:" text always appears in the output, and it takes a long time to generate an answer. If I ask it "tell me about alpaca", it gives me an answer, but when I ask other questions, it can't give me a reasonable answer. I don't know why.
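
One way to hide the stray "### Instruction:" sections after the fact is to cut the decoded text at the first section marker that follows the response. The sketch below is a post-processing workaround under stated assumptions: the function name extract_response and the list of stop markers are hypothetical, and it only cleans the returned text. It does not shorten generation time, because the model still generates until it emits an end-of-sequence token or reaches max_new_tokens.

```python
def extract_response(
    decoded: str,
    response_marker: str = "### Response:",
    stop_markers=("### Instruction:", "### Input:", "### Response:",
                  "### Output:", "</s>"),
) -> str:
    """Return the text after the first response marker, cut at the next marker."""
    # Split once on the first response marker; `found` is empty if it is absent.
    _, found, tail = decoded.partition(response_marker)
    if not found:
        return ""
    # Find the earliest stop marker in the remaining text, if any.
    cut = len(tail)
    for marker in stop_markers:
        idx = tail.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return tail[:cut].strip()


if __name__ == "__main__":
    sample = (
        "<s>### Instruction:\ntell me about alpaca\n\n"
        "### Response:\nAlpacas are camelids.\n\n"
        "### Instruction:\nsomething else\n\n### Response:\nmore text</s>"
    )
    print(extract_response(sample))
```

Compared with splitting on "### Response:" and taking the second piece, this also drops any further sections the model appends after its first answer.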

jackou2077 commented 1 year ago

Did you fix this bug?

Frankie-Dejong commented 1 year ago

Yeah, I ran into this problem too. Did you fix it?

lollipopmark commented 3 months ago

Yeah, I ran into the same problem.