creatorrr closed this issue 1 year ago.
cc/ @blakechi @lewtun
Hi @creatorrr, thanks for raising this issue. Could you share the complete code that produced this error? In particular, the ds you used in your snippet, so that I can reproduce the issue. Thanks!
Some supplemental information: by default, model_head follows model_body's device during initialization, so they should both be on the GPU or both on the CPU (sentence-transformers puts the model on the GPU by default if one is available). I tested the differentiable-head code snippet from the README.md on Colab with a GPU runtime, and both of them were on the GPU.
Here's the full script:
from datasets import load_dataset
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitModel, SetFitTrainer
ds = load_dataset("json", data_files={split: f"./setfit_{split}_data.json" for split in ("train", "test")})
num_classes = len(set(ds["train"]["label"]))
model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2",
    use_differentiable_head=True,
    head_params={"out_features": num_classes},
)
trainer = SetFitTrainer(
    model=model,
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
    loss_class=CosineSimilarityLoss,
    metric="accuracy",
    batch_size=64,
    num_iterations=10,  # The number of text pairs to generate for contrastive learning
    num_epochs=3,  # The number of epochs to use for contrastive learning
    learning_rate=1.25e-05,
    column_mapping={"text": "text", "label": "label"},  # Map dataset columns to text/label expected by trainer
)
trainer.freeze()
trainer.unfreeze(keep_body_frozen=True)
# Note: Next step fails without this:
# model.model_head = model.model_head.to("cuda:0")
# model.model_body = model.model_body.to("cuda:0")
trainer.train(
    num_epochs=40,  # The number of epochs to train the head or the whole model (body and head)
    batch_size=16,
    body_learning_rate=1e-5,  # The body's learning rate
    learning_rate=1e-2,  # The head's learning rate
    l2_weight=0.0,  # Weight decay on **both** the body and head. If `None`, will use 0.01.
)
metrics = trainer.evaluate()
print(metrics)
Thanks for sharing the full example, @creatorrr. I was able to reproduce the error with the emotion dataset, so it's definitely a bug. It seems that calling trainer.freeze() followed by trainer.unfreeze() is the issue; perhaps some state is being preserved between the two calls.
Okay, I think I know where the bug is.
By default, sentence-transformers initializes the model on the CPU and only moves it to the GPU (if one is available) when we call fit (Ref 1, Ref 2).
That's why it works smoothly when we first train the model body and then the head. If we train the head directly, the body hasn't been moved to the GPU yet, so the error is raised.
Hope the explanation makes sense! :)
I can fix it by moving the body to its _target_device right after initialization. I'll open a PR soon. Thank you @creatorrr for finding this bug!
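To make the explanation and the proposed fix concrete, here is a toy sketch in plain Python (no PyTorch needed). ToyBody, ToyModel, place_on_init, place_body_on_init and the string "devices" are hypothetical stand-ins, not the real sentence-transformers or SetFit API. It assumes, as described above, that the body records a target device at init but only moves there when it is trained, and that the head is placed on that target device.

```python
# Toy model of the device bug discussed in this thread.
# ToyBody / ToyModel are hypothetical stand-ins, NOT the real
# sentence-transformers or SetFit classes; "devices" are plain strings.


class ToyBody:
    """A body that records a target device but is created on the CPU."""

    def __init__(self, gpu_available: bool = True, place_on_init: bool = False):
        self._target_device = "cuda" if gpu_available else "cpu"
        self.device = "cpu"  # parameters start out on the CPU
        if place_on_init:
            # The proposed fix: move the body to its target device right after init.
            self.device = self._target_device

    def fit(self) -> None:
        # Training the body moves it to the target device first.
        self.device = self._target_device


class ToyModel:
    def __init__(self, gpu_available: bool = True, place_body_on_init: bool = False):
        self.body = ToyBody(gpu_available, place_body_on_init)
        # Assumption: the head is placed on the body's *target* device.
        self.head_device = self.body._target_device

    def devices_match(self) -> bool:
        return self.body.device == self.head_device

    def train_head(self) -> str:
        # Head-only training feeds body outputs into the head,
        # so both must live on the same device.
        if not self.devices_match():
            raise RuntimeError(
                f"device mismatch: body on {self.body.device}, head on {self.head_device}"
            )
        return "ok"


buggy = ToyModel()
try:
    buggy.train_head()  # body training was skipped, so the body is still on the CPU
except RuntimeError as err:
    print(err)  # device mismatch: body on cpu, head on cuda

buggy.body.fit()  # training the body first moves it to the target device
print(buggy.train_head())  # ok

fixed = ToyModel(place_body_on_init=True)
print(fixed.train_head())  # ok
```

The sketch only models the order in which device moves happen, not real tensors; on a CPU-only machine (gpu_available=False) the target device is already the CPU, which is why the mismatch only shows up when a GPU is present.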
Since Blake's changes, (a slightly modified variant of) your script now works as intended, so I'll close this!