huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Trainer doesn't save evaluation metrics. #33733

Open filbeofITK opened 2 days ago

filbeofITK commented 2 days ago

System Info

Who can help?

@muellerzr @SunMarc

Information

Tasks

Reproduction

I'm trying to log the evaluation metrics of my model to TensorBoard so that I can monitor training. My compute_metrics function looks like this:

import evaluate
from numpy import argmax

# Metrics for the output are only loaded once
acc = evaluate.load("accuracy")
metrics = evaluate.combine(['precision', 'recall', 'f1'])

# Function to calculate metrics for evaluation
def compute_metrics(eval_pred):
    # Convert logits to predictions
    predictions = argmax(eval_pred.predictions, axis=-1)
    results = metrics.compute(predictions=predictions, references=eval_pred.label_ids, average='micro')
    results['accuracy'] = acc.compute(predictions=predictions, references=eval_pred.label_ids)['accuracy']
    return results
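For what it's worth, the function itself returns a populated dict when called directly; here is a minimal sanity check outside the Trainer (the dummy logits and labels below are made up for illustration):

import numpy as np
from transformers import EvalPrediction

# Made-up logits for 4 examples over 3 classes, plus matching labels
dummy_logits = np.array([[2.0, 0.1, 0.3],
                         [0.2, 1.5, 0.1],
                         [0.1, 0.2, 3.0],
                         [1.2, 0.3, 0.4]])
dummy_labels = np.array([0, 1, 2, 1])

print(compute_metrics(EvalPrediction(predictions=dummy_logits, label_ids=dummy_labels)))
# -> dict with 'precision', 'recall', 'f1' and 'accuracy' keys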

These are my training arguments:

training_args = TrainingArguments(
    torch_compile=True,
    torch_compile_mode="default",
    fp16=True,
    output_dir=os.path.abspath('./checkpoints'),  # Output directory for checkpoints
    num_train_epochs=EPOCHS,  # Total number of training epochs
    per_device_train_batch_size=BATCH_SIZE,  # Batch size per device during training
    per_device_eval_batch_size=BATCH_SIZE,  # Batch size for evaluation
    logging_dir='./logs',
    report_to='tensorboard',
    logging_strategy='steps',
    log_level='debug',
    logging_steps=100,
    gradient_accumulation_steps=1,
    do_eval=True,  # Force evaluation, otherwise it might not work
    eval_strategy='steps',  # Evaluate at regular step intervals
    eval_steps=500,  # Evaluate every 500 steps
    save_strategy='steps',
    save_steps=500,
    save_total_limit=10
)
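For completeness, this is roughly how everything is wired into the Trainer; model, train_ds and eval_ds below are placeholders for the actual model and tokenized datasets, not the exact code:

from transformers import Trainer

# Rough sketch of the training setup; `model`, `train_ds` and `eval_ds` stand in
# for the real model and tokenized datasets used in the script.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    compute_metrics=compute_metrics,  # these metrics should end up in TensorBoard
)
trainer.train()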

In the TensorBoard logs I cannot find anything related to the eval metric cards if I pass "max-autotune" as the compile mode. With "reduce-overhead" and with no compilation, the eval cards for speed, number of eval samples, etc. are there, but the metrics themselves are always missing.

A few things to note: the Trainer does log training metrics such as the loss correctly, so it can see the TensorBoard instance. The metrics do get calculated but are then discarded.
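A possible stopgap (not a fix) would be to intercept the metrics dict in a callback and write it to TensorBoard manually; a rough sketch, where the class name and log_dir are mine:

from torch.utils.tensorboard import SummaryWriter
from transformers import TrainerCallback

class EvalMetricsLogger(TrainerCallback):
    """Writes the metrics dict produced by evaluation straight to TensorBoard."""

    def __init__(self, log_dir='./logs'):
        self.writer = SummaryWriter(log_dir=log_dir)

    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        # `metrics` is the dict returned by the evaluation loop, including
        # whatever compute_metrics produced (keys are prefixed with 'eval_')
        if metrics:
            for name, value in metrics.items():
                if isinstance(value, (int, float)):
                    self.writer.add_scalar(name, value, state.global_step)

# attached with trainer.add_callback(EvalMetricsLogger()) before trainer.train()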

Expected behavior

I would expect the evaluation metrics returned by compute_metrics to show up in the TensorBoard logs.

LysandreJik commented 1 day ago

I added it to the Trainer issue tracker; if you have an idea about how to fix the problem, please feel free to go ahead and offer a PR! Thanks :hugs: