unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Extra response with fine-tuned model #883

Open LiuAlex1109 opened 3 months ago

LiuAlex1109 commented 3 months ago

I trained a model with Unsloth, and when I input some questions to evaluate it, the model always outputs extra content, as shown below:

[image: model output showing the extra generated content]

My code:

from unsloth import FastLanguageModel
import torch

max_seq_length = 2048
dtype = None
load_in_4bit = True
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "/root/models/FlagAlpha/Llama3-Chinese-8B-Instruct",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

FastLanguageModel.for_inference(model)
inputs = tokenizer(
[
    alpaca_prompt.format(
        "入侵性蚂蚁如何通过个体大小的变化适应新环境?", # "How do invasive ants adapt to new environments through changes in individual body size?"
        "", # input
        "", # output - leave blank for generation
    )
], return_tensors = "pt").to("cuda")

# outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
# tokenizer.batch_decode(outputs)

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

Could someone tell me how to solve this? Thanks.

ahmedembeddedxx commented 3 months ago

This generally doesn't happen, but it is most likely because TextStreamer is being called twice. You can use .split() carefully to extract the relevant answer; for reference, you can use this link. If the problem persists, would you mind sharing the notebook link or a screenshot of the snippet?
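
For illustration, a minimal sketch of that splitting approach, assuming the model, tokenizer, and inputs variables from the snippet above (this is not the exact code from the linked notebook):

# Minimal sketch: decode the full output, then keep only the answer portion.
outputs = model.generate(**inputs, max_new_tokens = 128, use_cache = True)
decoded = tokenizer.batch_decode(outputs, skip_special_tokens = True)[0]

# Take the text after the "### Response:" header; if the model keeps going
# and starts a new "### Instruction:" block, cut it off there as well.
response = decoded.split("### Response:")[-1]
response = response.split("### Instruction:")[0].strip()
print(response)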

danielhanchen commented 3 months ago

It's possible the EOS token isn't being generated. I suggest appending around 5 EOS tokens to each example in the finetuning dataset.

Also, since you're working with another language, I'm assuming you're doing continued pretraining on Llama-3? It's possible the EOS token is being suppressed, so yes, try the 5 EOS tokens.
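
For context, here is a minimal sketch of appending the EOS token during dataset formatting, following the Alpaca-style template from the snippet above. The column names "instruction"/"input"/"output" and the formatting_prompts_func helper are assumptions based on the usual Alpaca layout, and the * 5 repetition just follows the suggestion above:

# Minimal sketch (assumes a dataset with "instruction"/"input"/"output" columns
# and the alpaca_prompt template defined earlier).
EOS_TOKEN = tokenizer.eos_token  # without this, generation may never stop

def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Append the EOS token (repeated, per the suggestion above) so the
        # model learns to terminate its responses.
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN * 5
        texts.append(text)
    return { "text": texts }

# dataset = dataset.map(formatting_prompts_func, batched = True)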