seolhokim commented 3 months ago

Thank you for your work. However, I've noticed some performance issues that differ significantly when compared to the Llama 3.1 model. Specifically, I've observed the following problems:

Issue Description

Special Token Handling: The Unsloth model seems to have difficulty managing special tokens, leading to incoherent and less relevant outputs.
Incomplete and Nonsensical Responses: The model frequently produces incomplete responses or outputs that are nonsensical and meaningless. This issue affects the clarity and flow of the generated content.

Issue example

1.It is saying same thing again and again.

### Instruction:
You are a helpful, respectful and honest assistant AI.

### Input:
There's a double rainbow. Is this real?

### Output:
Yes, this is real. This is the second rainbow.

### Explanation:
The first rainbow is a phenomenon in which the sun shines through the rain and is refracted by the water droplets in the air. This phenomenon is called the first rainbow. The second rainbow is a phenomenon in which the sun shines through the rain and is refracted by the water droplets in the air. This phenomenon is called the second rainbow. The second rainbow is a phenomenon in which the sun shines through the rain and is refracted by the water droplets in the air. This phenomenon is called the second rainbow. The second rainbow is a phenomenon in which the sun shines through the rain and is refracted by the water droplets in the air. This phenomenon is called the second rainbow. The second rainbow is a phenomenon in which the sun shines through the rain and is refracted by the water droplets in the air. This phenomenon is called the second rainbow. The second rainbow is a phenomenon in which the sun shines through the rain and is refracted by the water droplets in the air. This phenomenon is called the second rainbow. The second rainbow is a phenomenon in which the sun shines through the rain and is refracted by the water droplets in the air. This phenomenon is called

couldn't finish the line.

<|begin_of_text|><|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful, respectful and honest assistant AI.
<|eot_id|><|start_header_id|>user<|end_header_id|>There's a double rainbow. Is this real?<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
Yes, it is.
quot
quot
quot
quot
quot
quot
quot
quot
quot
quot
quot
quot
quot

Reproduce

#reproduce unsloth result.
from unsloth import FastLanguageModel

load_in_4bit = True
max_seq_length = 2048
model_name = "unsloth/Meta-Llama-3.1-8B"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name,
    max_seq_length = max_seq_length,
    load_in_4bit = load_in_4bit,
    device_map="auto"  
)

FastLanguageModel.for_inference(model)

test_input = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful, respectful and honest assistant AI.
<|eot_id|><|start_header_id|>user<|end_header_id|>There's a double rainbow. Is this real?<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
"""

# Tokenize the formatted prompt with padding
inputs = tokenizer(test_input, return_tensors="pt").to("cuda")

# Generate the output
outputs = model.generate(**inputs, max_new_tokens=256, use_cache=True)

# Decode the generated tokens
decoded_outputs = tokenizer.batch_decode(outputs)
print(decoded_outputs[0])

from transformers import AutoModelForCausalLM, BitsAndBytesConfig, AutoTokenizer
import torch
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Load the Llama 3 model with the quantization configuration
model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto"
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

test_input = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful, respectful and honest assistant AI.
<|eot_id|><|start_header_id|>user<|end_header_id|>There's a double rainbow. Is this real?<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
"""
# Tokenize the input
inputs = tokenizer(test_input, return_tensors="pt").to("cuda")

# Generate the output
outputs = model.generate(**inputs, max_new_tokens=256, use_cache=True)

# Decode the generated tokens
decoded_outputs = tokenizer.batch_decode(outputs)

# Print the output
print(decoded_outputs[0])

Result comparison

#unsloth
<|begin_of_text|><|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful, respectful and honest assistant AI.
<|eot_id|><|start_header_id|>user<|end_header_id|>There's a double rainbow. Is this real?<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
You are a helpful, respectful and honest assistant AI.
precated
precated
precated
precated
precated
precated
precated
precated
precated
precated
precated
precated

#llama3.1-instruct

<|begin_of_text|><|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful, respectful and honest assistant AI.
<|eot_id|><|start_header_id|>user<|end_header_id|>There's a double rainbow. Is this real?<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
You're referring to the phenomenon of a double rainbow!

Yes, a double rainbow is a real atmospheric optical phenomenon. It's caused by the refraction and dispersion of sunlight as it passes through water droplets in the air.

When sunlight enters a water droplet, it is refracted, or bent, and split into its individual colors, a process known as dispersion. The different colors are then reflected back to the observer, creating the colors of the rainbow.

A double rainbow occurs when the light is refracted twice, creating two separate bows:

1. The primary rainbow: This is the brightest and most vivid part of the rainbow, with the colors appearing in the order of red, orange, yellow, green, blue, indigo, and violet.
2. The secondary rainbow: This is fainter and appears outside the primary rainbow, with the colors reversed, i.e., violet, indigo, blue, green, yellow, orange, and red.

The secondary rainbow is caused by light being reflected twice before reaching the observer, which is why it appears fainter and with reversed colors.

Double rainbows are relatively rare because the conditions required to produce them are quite specific: the sun must be behind the observer, the air must be filled with water droplets (like after

danielhanchen commented 3 months ago

danielhanchen commented 3 months ago

Also the base model does not handle the llama-3 chat template well, since the tokens are untrained - see https://unsloth.ai/blog/phi3 for more details (the blog was for Phi-3, but has Llama-3 bug fixes)

seolhokim commented 3 months ago

Thank you for your kind answer. I should have checked that differences. I thought those are trained on same dataset. thank you again.

danielhanchen commented 3 months ago

Nw!

unslothai / unsloth

Issue Report: Inconsistent Behavior and Meaningless Output #877

Issue Description

Issue example

Reproduce

Result comparison