unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Issue Report: Inconsistent Behavior and Meaningless Output #877

Closed: seolhokim closed this issue 3 months ago

seolhokim commented 3 months ago

Thank you for your work. However, I've noticed output-quality problems with the Unsloth model that do not appear with the Llama 3.1 Instruct model loaded through transformers. Specifically, I've observed the following problems:

Issue Description

Issue example

1. It keeps saying the same thing again and again.

### Instruction:
You are a helpful, respectful and honest assistant AI.

### Input:
There's a double rainbow. Is this real?

### Output:
Yes, this is real. This is the second rainbow.

### Explanation:
The first rainbow is a phenomenon in which the sun shines through the rain and is refracted by the water droplets in the air. This phenomenon is called the first rainbow. The second rainbow is a phenomenon in which the sun shines through the rain and is refracted by the water droplets in the air. This phenomenon is called the second rainbow. The second rainbow is a phenomenon in which the sun shines through the rain and is refracted by the water droplets in the air. This phenomenon is called the second rainbow. The second rainbow is a phenomenon in which the sun shines through the rain and is refracted by the water droplets in the air. This phenomenon is called the second rainbow. The second rainbow is a phenomenon in which the sun shines through the rain and is refracted by the water droplets in the air. This phenomenon is called the second rainbow. The second rainbow is a phenomenon in which the sun shines through the rain and is refracted by the water droplets in the air. This phenomenon is called the second rainbow. The second rainbow is a phenomenon in which the sun shines through the rain and is refracted by the water droplets in the air. This phenomenon is called
2. It couldn't finish the line.
    <|begin_of_text|><|begin_of_text|><|start_header_id|>system<|end_header_id|>
    You are a helpful, respectful and honest assistant AI.
    <|eot_id|><|start_header_id|>user<|end_header_id|>There's a double rainbow. Is this real?<|eot_id|>
    <|start_header_id|>assistant<|end_header_id|>
    Yes, it is.
    quot
    quot
    quot
    quot
    quot
    quot
    quot
    quot
    quot
    quot
    quot
    quot
    quot

Reproduce

# Reproduce the Unsloth result.
from unsloth import FastLanguageModel

load_in_4bit = True
max_seq_length = 2048
model_name = "unsloth/Meta-Llama-3.1-8B"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name,
    max_seq_length = max_seq_length,
    load_in_4bit = load_in_4bit,
    device_map = "auto",
)

FastLanguageModel.for_inference(model)

test_input = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful, respectful and honest assistant AI.
<|eot_id|><|start_header_id|>user<|end_header_id|>There's a double rainbow. Is this real?<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
"""

# Tokenize the formatted prompt (a single sequence, so no padding is applied)
inputs = tokenizer(test_input, return_tensors="pt").to("cuda")

# Generate the output
outputs = model.generate(**inputs, max_new_tokens=256, use_cache=True)

# Decode the generated tokens
decoded_outputs = tokenizer.batch_decode(outputs)
print(decoded_outputs[0])

# Reproduce the Hugging Face transformers result for comparison.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, AutoTokenizer
import torch
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Load the Llama 3.1 Instruct model with the quantization configuration
model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto"
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

test_input = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful, respectful and honest assistant AI.
<|eot_id|><|start_header_id|>user<|end_header_id|>There's a double rainbow. Is this real?<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
"""
# Tokenize the input
inputs = tokenizer(test_input, return_tensors="pt").to("cuda")

# Generate the output
outputs = model.generate(**inputs, max_new_tokens=256, use_cache=True)

# Decode the generated tokens
decoded_outputs = tokenizer.batch_decode(outputs)

# Print the output
print(decoded_outputs[0])

Result comparison

#unsloth
<|begin_of_text|><|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful, respectful and honest assistant AI.
<|eot_id|><|start_header_id|>user<|end_header_id|>There's a double rainbow. Is this real?<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
You are a helpful, respectful and honest assistant AI.
precated
precated
precated
precated
precated
precated
precated
precated
precated
precated
precated
precated

#llama3.1-instruct
<|begin_of_text|><|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful, respectful and honest assistant AI.
<|eot_id|><|start_header_id|>user<|end_header_id|>There's a double rainbow. Is this real?<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
You're referring to the phenomenon of a double rainbow!

Yes, a double rainbow is a real atmospheric optical phenomenon. It's caused by the refraction and dispersion of sunlight as it passes through water droplets in the air.

When sunlight enters a water droplet, it is refracted, or bent, and split into its individual colors, a process known as dispersion. The different colors are then reflected back to the observer, creating the colors of the rainbow.

A double rainbow occurs when the light is refracted twice, creating two separate bows:

1. The primary rainbow: This is the brightest and most vivid part of the rainbow, with the colors appearing in the order of red, orange, yellow, green, blue, indigo, and violet.
2. The secondary rainbow: This is fainter and appears outside the primary rainbow, with the colors reversed, i.e., violet, indigo, blue, green, yellow, orange, and red.

The secondary rainbow is caused by light being reflected twice before reaching the observer, which is why it appears fainter and with reversed colors.

Double rainbows are relatively rare because the conditions required to produce them are quite specific: the sun must be behind the observer, the air must be filled with water droplets (like after

danielhanchen commented 3 months ago

Oh, you're calling the base model, so this is expected: it will output gibberish. The instruct model is finetuned on the <|begin_of_text|><|start_header_id|>system<|end_header_id|>... chat format, whilst the base model is not, which is why the base model has to be finetuned first!
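
For reference, the chat format that the instruct model expects can be sketched in plain Python. This is a minimal illustration and not Unsloth or transformers code: `build_llama3_prompt` and its `include_bos` flag are hypothetical names. The BOS token is left out by default because the Hugging Face tokenizer prepends it on its own, which is why the outputs in this issue start with two `<|begin_of_text|>` tokens.

```python
BOS = "<|begin_of_text|>"


def build_llama3_prompt(messages, include_bos=False):
    """Render chat messages in the Llama 3 instruct format (hypothetical helper).

    include_bos defaults to False because Hugging Face tokenizers for Llama 3
    prepend the BOS token themselves when encoding.
    """
    parts = [BOS] if include_bos else []
    for message in messages:
        # Each turn is a role header, a blank line, the content, then <|eot_id|>.
        parts.append(f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n")
        parts.append(f"{message['content'].strip()}<|eot_id|>")
    # Open the assistant turn so that generation continues from here.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful, respectful and honest assistant AI."},
    {"role": "user", "content": "There's a double rainbow. Is this real?"},
]
prompt = build_llama3_prompt(messages)
print(prompt)
```

The string can be passed to the tokenizer as-is, or built with `include_bos=True` and encoded with `add_special_tokens=False`. With a real instruct checkpoint, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` renders the model's own template and is usually the safer choice. None of this makes the base model follow instructions; it only applies once an instruct or finetuned model is loaded.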

danielhanchen commented 3 months ago

Also, the base model does not handle the llama-3 chat template well, since the template's special tokens are untrained. See https://unsloth.ai/blog/phi3 for more details (the blog post is about Phi-3, but it also covers Llama-3 bug fixes).
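
A rough way to picture "untrained" tokens is to look for embedding rows that never moved away from their initial value. The sketch below assumes such rows show up as all-zero (or near-zero) vectors; that is an assumption of this example, and the linked blog post is the reference for what Unsloth actually checks. The function name, the threshold, and the toy matrix are all hypothetical.

```python
import math


def find_untrained_rows(embedding_rows, eps=1e-12):
    """Return the indices of rows whose L2 norm is at or below eps.

    Assumption: an untrained token's embedding row is (near) zero.
    """
    untrained = []
    for index, row in enumerate(embedding_rows):
        norm = math.sqrt(sum(value * value for value in row))
        if norm <= eps:
            untrained.append(index)
    return untrained


# Toy 4-token embedding matrix; rows 1 and 3 stand in for untrained tokens.
toy_embeddings = [
    [0.12, -0.40, 0.33],
    [0.0, 0.0, 0.0],
    [-0.25, 0.10, 0.05],
    [0.0, 0.0, 0.0],
]
print(find_untrained_rows(toy_embeddings))  # [1, 3]
```

On a real model the same idea would be applied to the input embedding matrix (for example via `model.get_input_embeddings().weight` in transformers) and the resulting indices mapped back to token strings with the tokenizer.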

seolhokim commented 3 months ago

Thank you for your kind answer. I should have checked those differences. I thought the two models were trained on the same dataset. Thank you again.

danielhanchen commented 3 months ago

Nw!