huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

validation of the eval dataset should be done in advance #33721

Closed jackyjinjing closed 1 month ago

jackyjinjing commented 1 month ago

System Info

I think the eval dataset should be validated in advance. The Trainer only reported that the validation set was missing after I had trained for half an hour and completed one epoch. I set eval_strategy="epoch" in the TrainingArguments but did not pass an eval_dataset when constructing the Trainer. The code is as follows.

```python
from transformers import TrainingArguments

train_args = TrainingArguments(
    output_dir="./checkpoints",
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    logging_steps=1,
    eval_strategy="epoch",        # evaluation requested at the end of every epoch
    save_strategy="epoch",
    save_total_limit=3,
    learning_rate=2e-5,
    weight_decay=0.01,
    num_train_epochs=30,
    dataloader_drop_last=True,
    metric_for_best_model="f1",
    load_best_model_at_end=True,
)

from transformers import DataCollatorWithPadding, Trainer

trainer = Trainer(
    model=model,
    args=train_args,
    train_dataset=process_dataset["train"],
    eval_dataset=None,            # no eval dataset is passed here
    data_collator=collate_fn,
)
```
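Until such an up-front check exists, one workaround is to guard the call in user code. This is a hypothetical sketch, assuming the `train_args` and `trainer` objects from the snippet above; it is not part of the Trainer API.

```python
# Hypothetical user-side guard: fail fast before spending an epoch of training.
# The comparison with "no" works whether eval_strategy is still a plain string
# or has already been converted to IntervalStrategy (a str-based enum).
if train_args.eval_strategy != "no" and trainer.eval_dataset is None:
    raise ValueError("eval_strategy is 'epoch' but no eval_dataset was passed to the Trainer.")

trainer.train()
```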

transformers version: 4.45.0; Python version: 3.10; platform: Ubuntu

@muellerzr @Sunm

Who can help?

No response

Information

Tasks

Reproduction

  1. set eval_strategy="epoch" in train_args
  2. pass eval_dataset=None to the Trainer constructor (i.e. do not provide an eval dataset)
  3. call trainer.train() (a self-contained sketch follows below)
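
For illustration, a self-contained reproduction sketch. The tiny checkpoint and dummy data below are placeholders (not from the original report) so the script can run end to end on its own.

```python
# Self-contained reproduction sketch: evaluation is requested per epoch,
# but no eval dataset is given, so the failure only surfaces after epoch 1.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

checkpoint = "hf-internal-testing/tiny-random-bert"  # illustrative tiny model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

train_ds = Dataset.from_dict({"text": ["good", "bad"] * 4, "label": [1, 0] * 4})
train_ds = train_ds.map(lambda ex: tokenizer(ex["text"]), remove_columns=["text"])

args = TrainingArguments(output_dir="./checkpoints", eval_strategy="epoch",
                         num_train_epochs=1, per_device_train_batch_size=2)
trainer = Trainer(model=model, args=args, train_dataset=train_ds,
                  eval_dataset=None,  # evaluation requested, but no dataset provided
                  data_collator=DataCollatorWithPadding(tokenizer))
trainer.train()  # the missing eval dataset is only reported after the first epoch
```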

Expected behavior

The Trainer should validate the eval dataset in advance: when eval_strategy requires evaluation, a missing eval_dataset should be reported at initialization (or at the start of training), not after the first epoch has already completed.
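
A minimal sketch of the kind of up-front check being asked for, assuming it would run during Trainer construction or at the start of train(); the function name, placement, and error message are illustrative, not the library's actual implementation.

```python
# Illustrative sketch: validate the evaluation setup before any training happens.
# `args` and `eval_dataset` mirror the names the Trainer already uses internally.
from transformers.trainer_utils import IntervalStrategy

def check_eval_setup(args, eval_dataset):
    needs_eval = args.eval_strategy != IntervalStrategy.NO or args.load_best_model_at_end
    if needs_eval and eval_dataset is None:
        raise ValueError(
            "eval_strategy / load_best_model_at_end require an eval_dataset, "
            "but none was provided to the Trainer."
        )
```

With the arguments shown in this report (eval_strategy="epoch", load_best_model_at_end=True, eval_dataset=None), such a check would fail immediately instead of after the first epoch.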