Hi there,
I noticed another critical bug (at least from my point of view): after LoRA training, and even with do_sample set to False, consecutive inference runs produce different results.
Loading the base model:
from unsloth import FastLanguageModel
import torch
model_name = "unsloth/mistral-7b-instruct-v0.2-bnb-4bit"
model, tokenizer = FastLanguageModel.from_pretrained(
model_name = model_name,
max_seq_length = 2048,
dtype = None,
load_in_4bit = True,
# token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
tokenizer.padding_side = 'left'  # left padding for inference (right padding is used during training)
tokenizer.pad_token = tokenizer.eos_token
Setting up LoRA:
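Roughly like this (the exact rank, alpha, and target modules below are representative placeholders, not necessarily the values from my run):
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,                          # LoRA rank (placeholder)
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,                # Unsloth is optimized for dropout = 0
    bias = "none",
    use_gradient_checkpointing = True,
    random_state = 3407,
)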
Training:
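A sketch of the training call, following the standard Unsloth/TRL recipe (the dataset path and hyperparameters are placeholders):
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

dataset = load_dataset("json", data_files = "train.json", split = "train")  # placeholder for my dataset

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        max_steps = 10000,            # the ~10K steps mentioned below
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 50,
        optim = "adamw_8bit",
        seed = 3407,
        output_dir = "outputs",
    ),
)
trainer.train()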
I trained for 10K steps and then ran inference:
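The inference code is essentially this (the prompt itself is just an example):
FastLanguageModel.for_inference(model)   # switch Unsloth to inference mode

prompt = "[INST] Summarize the document in two sentences. [/INST]"   # example prompt
inputs = tokenizer(prompt, return_tensors = "pt").to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens = 256,
    do_sample = False,   # greedy decoding, so repeated calls should be identical
)
print(tokenizer.decode(outputs[0], skip_special_tokens = True))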
Even though do_sample is False, the responses are different (even if I reload the checkpoint). But if I save the model:
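By "save" I mean save_pretrained on the trained model and tokenizer (the directory name is arbitrary):
model.save_pretrained("lora_model")       # writes the LoRA adapter weights
tokenizer.save_pretrained("lora_model")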
and then load it:
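i.e. loading the saved adapter directory back through FastLanguageModel.from_pretrained:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "lora_model",    # the directory saved above
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)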
all outputs are consistently the same.