Closed: abedini-arteriaai closed this issue 10 months ago
Hello!

The migration guide says the following:

> [snippet from the migration guide]

This implies we no longer need `freeze` and `unfreeze`, and that they can be replaced with a `trainer.train()` call as long as `batch_size` and `num_epochs` are tuples. However, the guide also says:

> [snippet from the migration guide that still calls `unfreeze`]

It seems like we still use `unfreeze`, so what does the change look like? If I can get a sample code it would be very helpful. Thank you.
Indeed. The reason that `SetFitModel.freeze` and `SetFitModel.unfreeze` still exist is that they are now called automatically by the trainer, instead of relying on the user to perform the (un)freezing themselves.
For example, with the pre-1.0 `SetFitTrainer` API:

```python
from sentence_transformers.losses import CosineSimilarityLoss

from setfit import SetFitModel, SetFitTrainer

# Load a SetFit model from the Hub
model: SetFitModel = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2",
    use_differentiable_head=True,
    head_params={"out_features": 2},
)

# Create the trainer (train_dataset and eval_dataset are assumed to be prepared)
trainer = SetFitTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss_class=CosineSimilarityLoss,
    metric="accuracy",
    learning_rate=2e-5,
    batch_size=16,
    num_iterations=20,
    num_epochs=1,
)

trainer.freeze()  # Freeze the head
trainer.train()  # Train only the body

# Unfreeze the head and the body -> end-to-end training
trainer.unfreeze(keep_body_frozen=False)
trainer.train(
    num_epochs=16,
    batch_size=2,
    body_learning_rate=1e-5,
    learning_rate=1e-2,
)
metrics = trainer.evaluate()
```
And here is the equivalent with the 1.0 `Trainer` and `TrainingArguments` API, where the (un)freezing is handled for you:

```python
from sentence_transformers.losses import CosineSimilarityLoss

from setfit import SetFitModel, Trainer, TrainingArguments

# Load a SetFit model from the Hub
model: SetFitModel = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2",
    use_differentiable_head=True,
    head_params={"out_features": 2},
)

# Create the training arguments
args = TrainingArguments(
    # When an argument is a tuple, the first value is for training the embeddings,
    # and the latter is for training the differentiable classification head:
    batch_size=(16, 2),
    num_iterations=20,
    num_epochs=(1, 16),
    body_learning_rate=(2e-5, 1e-5),
    head_learning_rate=1e-2,
    end_to_end=True,
    loss=CosineSimilarityLoss,
)

# Create the trainer (train_dataset and eval_dataset are assumed to be prepared)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    metric="accuracy",
)

# Train and evaluate
trainer.train()
metrics = trainer.evaluate()
```
(I didn't run these snippets, so they might have a small mistake somewhere, but they should be roughly correct)
Hope this helps!
This is great, thank you, I'll try it.

Is `body_learning_rate` a tuple value? There's already a `head_learning_rate` parameter; I assumed that's why the learning rate was split into two.
`body_learning_rate` is indeed a tuple value. This is a bit unfortunate, I agree, but there are essentially three learning rates to consider:

- `body_learning_rate`, first value: the LR for the Sentence Transformer body during the embedding phase of SetFit (note that SetFit consists of two phases: finetuning the embeddings & training a classifier).
- `body_learning_rate`, second value: the LR for the Sentence Transformer body during the classifier phase of SetFit. Note: this is only used if SetFit is used with a differentiable PyTorch head (by default it uses a non-differentiable sklearn Logistic Regression head) and if `end_to_end=True`. If it's `False`, then the body is frozen during the classifier training, after all.
- `head_learning_rate`: the LR for the differentiable head, only used if SetFit is used with a differentiable PyTorch head.

You can also give `body_learning_rate` just one float value, and then that LR will be used for both the embedding and classifier phases.
I hope this makes some sense! This is perhaps the most complex part of all of the training arguments, so it's all easier from here, haha.
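In case a concrete mapping helps, here is a small sketch of those three learning rates as `TrainingArguments` (the numbers are just the ones from the example above, not recommendations):

```python
from setfit import TrainingArguments

# Tuple form: (body LR for the embedding phase, body LR for the classifier phase).
# The second value only applies with a differentiable head and end_to_end=True;
# with end_to_end=False (the default), the body stays frozen during classifier
# training, so the second value is never used.
args = TrainingArguments(
    body_learning_rate=(2e-5, 1e-5),
    head_learning_rate=1e-2,  # LR for the differentiable classification head
    end_to_end=True,
)

# Single-float form: the same body LR is used in both phases.
args = TrainingArguments(body_learning_rate=2e-5)
```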