unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

[Urgent] After reinstalling unsloth, Llama 3.2/3.1 fine tuning gets error with customized compute_metrics function #1327

Open yuan-xia opened 6 days ago

yuan-xia commented 6 days ago

Hi, I think I've found a bug in unsloth. For clarity, I'm sharing the code of unsloth's Llama 3.1 training notebook with just one small change: I added a compute_metrics function to test evaluation. Can anyone help me check why the trainer is not working? The "pred" passed to compute_metrics surprisingly contains nothing (it worked before).

https://drive.google.com/file/d/1UPMxPUifjLKgYOpIfLDvER1LHC4hop63/view?usp=sharing

def compute_metrics(pred):
    predictions, labels = pred
    print(predictions)
    print(labels)
    labels = pred.label_ids
    preds = pred.predictions  # .argmax(-1)
    print("predictions: ", str(preds))

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    eval_dataset = dataset.take(100),
    compute_metrics = compute_metrics,
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 60,
        per_device_eval_batch_size = 2,
        eval_accumulation_steps = 1,
        eval_steps = 1,
        eval_strategy = "steps",
        save_strategy = "steps",
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)

trainer_stats = trainer.train()

error:

()
[[128000 39314 374 ... -100 -100 -100]
 [128000 39314 374 ... -100 -100 -100]
 [128000 39314 374 ... -100 -100 -100]
 ...
 [128000 39314 374 ... -100 -100 -100]
 [128000 39314 374 ... -100 -100 -100]
 [128000 39314 374 ... -100 -100 -100]]
predictions:  ()

TypeError                                 Traceback (most recent call last)
in ()
----> 1 trainer_stats = trainer.train()

5 frames
/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
   4274             metrics[f"{metric_key_prefix}_loss"] = np.concatenate(all_losses).mean().item()
   4275         elif isinstance(all_losses, np.ndarray):
-> 4276             metrics[f"{metric_key_prefix}_loss"] = all_losses.mean().item()
   4277         if hasattr(self, "jit_compilation_time"):
   4278             metrics[f"{metric_key_prefix}_jit_compilation_time"] = self.jit_compilation_time

TypeError: 'NoneType' object does not support item assignment

Cirr0e commented 5 days ago

Hi there! I've analyzed the issue with your compute_metrics function, and I can help you resolve the error you're encountering.

The main problem is in how the compute_metrics function is accessing the predictions. Let me show you the correct way to implement this:

import numpy as np

def compute_metrics(eval_pred):
    # eval_pred is an EvalPrediction object
    # Correctly unpack predictions and labels
    predictions = eval_pred.predictions
    labels = eval_pred.label_ids

    # If predictions are logits (which is usually the case)
    if len(predictions.shape) > 2:
        predictions = np.argmax(predictions, axis=-1)

    print("Predictions shape:", predictions.shape)
    print("Labels shape:", labels.shape)

    # Add your metric calculations here
    return {"accuracy": (predictions == labels).astype(np.float32).mean().item()}

The key changes needed are:

  1. Change the function parameter from pred to eval_pred - this is what the trainer actually passes
  2. Use the correct attributes: eval_pred.predictions and eval_pred.label_ids
  3. Remove the duplicate assignments that were causing issues

The error you're seeing occurs because your compute_metrics function only prints and never returns a metrics dictionary, so the trainer gets None back and fails when it tries to attach the evaluation loss to it.
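
A tiny standalone reproduction of that failure mode (plain Python/numpy, nothing unsloth-specific; the variable names are illustrative, not the exact transformers source):

import numpy as np

def compute_metrics(pred):
    # Deliberately broken: this function only prints ...
    print(pred)
    # ... and therefore implicitly returns None instead of a metrics dict.

all_losses = np.array([1.25, 1.10])
metrics = compute_metrics(("dummy predictions", "dummy labels"))
metrics["eval_loss"] = all_losses.mean().item()  # TypeError: 'NoneType' object does not support item assignment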

Also, ensure you have these settings in your TrainingArguments:

args = TrainingArguments(
    # ... your other arguments ...
    include_inputs_for_metrics=True,
    predict_with_generate=True if using_generation else False,
)

Let me know if you need any clarification or run into other issues while implementing this solution. I'm here to help!

Risks to be aware of:

  1. Memory usage might increase when processing large batches
  2. Make sure your metrics handling can deal with padded sequences if you're using variable length inputs

Remember to test with a small subset of your data first to verify the metrics are being calculated correctly.
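
On the padded-sequences point (risk 2 above), here is a minimal sketch of a token-level accuracy that masks out the -100 labels. It assumes eval_pred.predictions arrives as a numpy array and that the model is a causal LM like Llama, hence the shift by one position:

import numpy as np

def compute_metrics(eval_pred):
    predictions = eval_pred.predictions
    labels = eval_pred.label_ids
    if predictions.ndim == 3:                      # raw logits: (batch, seq, vocab)
        predictions = np.argmax(predictions, axis=-1)
    # For a causal LM, the logits at position i predict token i + 1, so shift before comparing.
    predictions = predictions[:, :-1]
    labels = labels[:, 1:]
    mask = labels != -100                          # drop prompt / padding positions
    return {"token_accuracy": float((predictions[mask] == labels[mask]).mean())}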

ineffablekenobi commented 5 days ago

Hey, I'm facing the same error. I've defined compute_metrics like this:

def compute_metrics(pred):
    labels_ids = pred.label_ids
    pred_ids = pred.predictions[0]

preds are empty. I've observed similar behavior for preprocess_logits_for_metrics

def preprocess_logits_for_metrics(logits, labels):
    print(logits)
    pred_ids = np.argmax(logits, axis=-1)
    return pred_ids, labels

logits is passed as an empty tuple.
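
For reference, this is what that hook normally sees when the model actually returns logits; a generic sketch (torch tensors, not tied to unsloth):

import torch

def preprocess_logits_for_metrics(logits, labels):
    # Some models hand back a tuple (logits, past_key_values, ...); keep only the logits.
    if isinstance(logits, (tuple, list)):
        logits = logits[0]
    # logits: (batch, seq_len, vocab_size) torch tensor on the model's device.
    # Reduce to token ids here so the trainer does not accumulate full logits in memory.
    return torch.argmax(logits, dim=-1)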

yuan-xia commented 4 days ago

> Hi there! I've analyzed the issue with your compute_metrics function, and I can help you resolve the error you're encountering.
>
> The main problem is in how the compute_metrics function is accessing the predictions. Let me show you the correct way to implement this:

Hi, thanks for your reply, but your suggestions do not work in training. I have changed pred to eval_pred. Besides, the two arguments you mention are not accepted by the current version of the Trainer. FYI, I'm training Llama 3.1 8B with SFTTrainer. You can refer to the Colab link I shared for more details; it is just the public unsloth notebook.

I have defined it as follows:

def compute_metrics(eval_pred):
    predictions = eval_pred.predictions
    labels = eval_pred.label_ids
    print("predictions: ", str(predictions))

predictions is an empty tuple.

yuan-xia commented 4 days ago

> Hey, I'm facing the same error. I've defined compute_metrics like this:
>
> def compute_metrics(pred):
>     labels_ids = pred.label_ids
>     pred_ids = pred.predictions[0]
>
> preds are empty. I've observed similar behavior for preprocess_logits_for_metrics
>
> def preprocess_logits_for_metrics(logits, labels):
>     print(logits)
>     pred_ids = np.argmax(logits, axis=-1)
>     return pred_ids, labels
>
> logits is passed as an empty tuple.

Hi there, I have the same issue as you! I also tested preprocess_logits_for_metrics and it also gets an empty tuple, and the change suggested above still does not work in my case. This never happened before! Have you solved this issue?

danielhanchen commented 4 days ago

Apologies for the horrid delay - I recently added Apple's reduced-memory cross entropy, so the logits are no longer calculated (hence the issue). I'm planning to put this behind a flag, so hopefully that will restore the old behavior.
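
To make the trade-off concrete, here is a rough, self-contained illustration with toy tensors (this is not unsloth's actual implementation): a chunked / fused cross entropy produces the same loss without ever keeping the full logits tensor around, which is why evaluation hooks suddenly have nothing to read.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
hidden  = torch.randn(2, 8, 64)        # (batch, seq, hidden) from the transformer body
lm_head = torch.randn(128, 64)         # (vocab, hidden) projection weight
labels  = torch.randint(0, 128, (2, 8))

# Standard path: materialize the full (batch, seq, vocab) logits, then take the loss.
logits = hidden @ lm_head.T
loss_standard = F.cross_entropy(logits.reshape(-1, 128), labels.reshape(-1))

# Memory-saving path (conceptually): project and reduce one sequence at a time,
# keeping only the scalar loss. The full logits tensor is never stored, so there
# is nothing left over to hand to compute_metrics afterwards.
chunk_losses = [F.cross_entropy(h @ lm_head.T, y) for h, y in zip(hidden, labels)]
loss_fused = torch.stack(chunk_losses).mean()

print(loss_standard.item(), loss_fused.item())   # same value, very different peak memory

The flag described in the next comment switches back to the logits-materializing path for runs that need compute_metrics.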

yuan-xia commented 3 days ago

> Apologies for the horrid delay - I recently added Apple's reduced-memory cross entropy, so the logits are no longer calculated (hence the issue). I'm planning to put this behind a flag, so hopefully that will restore the old behavior.

Thanks Daniel for sharing this detail. Yes, our fine-tuning is kind of stuck due to this bug in the evaluation step. I'm happy to know it will be fixed soon. Much appreciated!

danielhanchen commented 3 days ago

@yuan-xia @ineffablekenobi Ok, added an optional flag now! During FastLanguageModel.from_pretrained(...) add an extra flag called return_logits = True, i.e.

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Llama-3.2-11B-Vision-Instruct",
    load_in_4bit = True, # Use 4bit to reduce memory use. False for 16bit LoRA.
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for long context
    return_logits = True, # <<<<
)
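
For the text notebook in this issue, the equivalent call would presumably look like this (a sketch only; the model name and other arguments are taken from the standard Llama 3.1 notebook, not from this thread):

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B",
    max_seq_length = max_seq_length,
    load_in_4bit = True,
    return_logits = True, # <<<< same optional flag
)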

Another option is to add an environment variable before invoking trainer.train(), i.e.

import os
os.environ['UNSLOTH_RETURN_LOGITS'] = '1'
... trainer.train() ...

Also please update Unsloth (or rerun Colab / Kaggle) via:

pip uninstall unsloth unsloth-zoo -y
pip install --upgrade --no-cache-dir --no-deps unsloth unsloth-zoo
ineffablekenobi commented 2 days ago

@danielhanchen I think we need to add this flag to the FastLanguageModel class too. Please have a look:

from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, weights_only, *model_args, **kwargs)
   4095         with ContextManagers(init_contexts):
   4096             # Let's make sure we don't run the init function of buffer modules
-> 4097             model = cls(config, *model_args, **model_kwargs)
   4098 
   4099         # make sure we use the model's config since the __init__ call might have copied it

TypeError: LlamaForCausalLM.__init__() got an unexpected keyword argument 'return_logits'
yuan-xia commented 1 day ago

@danielhanchen Thanks for your prompt reply. The first option does not work: it raises "TypeError: LlamaForCausalLM.__init__() got an unexpected keyword argument 'return_logits'". But the second option works on my end! My evaluation step resumes now.

import os
os.environ['UNSLOTH_RETURN_LOGITS'] = '1'
... trainer.train() ...