huggingface / setfit

Efficient few-shot learning with Sentence Transformers
https://hf.co/docs/setfit
Apache License 2.0

RuntimeError: Expected all tensors to be on the same device #169

Closed · creatorrr closed 1 year ago

creatorrr commented 1 year ago

Code:

model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2",
    use_differentiable_head=True,
    head_params={"out_features": num_classes},
)

trainer = SetFitTrainer(
    model=model,
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
    loss_class=CosineSimilarityLoss,
    metric="accuracy",
    batch_size=64,
    num_iterations=10, # The number of text pairs to generate for contrastive learning
    num_epochs=3, # The number of epochs to use for contrastive learning
    learning_rate=1.25e-05,
    column_mapping={"text": "text", "label": "label"} # Map dataset columns to text/label expected by trainer
)

trainer.freeze()

trainer.unfreeze(keep_body_frozen=True)

trainer.train(
    num_epochs=25, # The number of epochs to train the head or the whole model (body and head)
    batch_size=64,
    body_learning_rate=1.25e-5, # The body's learning rate
    learning_rate=1e-3, # The head's learning rate
    l2_weight=0.0, # Weight decay on **both** the body and head. If `None`, will use 0.01.
)

Error:

File ~/github.com/julep-ai/monorepo/lab/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py:1190, in Module._call_impl(self, *input, **kwargs)
   1186 # If we don't have any hooks, we want to skip the rest of the logic in
   1187 # this function, and just call forward.
   1188 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1189         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190     return forward_call(*input, **kwargs)
   1191 # Do not call functions when jit is used
   1192 full_backward_hooks, non_full_backward_hooks = [], []

File ~/github.com/julep-ai/monorepo/lab/.venv/lib/python3.8/site-packages/torch/nn/modules/linear.py:114, in Linear.forward(self, input)
    113 def forward(self, input: Tensor) -> Tensor:
--> 114     return F.linear(input, self.weight, self.bias)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat1 in method wrapper_addmm)

What fixed it:

# Explicitly move both the body and the head onto the same GPU before training:
model.model_body = model.model_body.to("cuda:0")
model.model_head = model.model_head.to("cuda:0")

creatorrr commented 1 year ago

cc/ @blakechi @lewtun

blakechi commented 1 year ago

Hi @creatorrr, thanks for raising this issue. Could you provide the complete code where you hit this error, specifically the ds you used in your snippet, so that I can reproduce it? Thanks!

Some supplemental information: by default, model_head follows model_body's device during initialization, so they should both be on the GPU or both on the CPU (sentence-transformers puts the model on the GPU by default if one is available). I tested the code snippet for the differentiable head from the README.md on Colab with a GPU runtime, and both of them end up on the GPU.
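As a side note, a quick way to confirm which device each sub-module ended up on after loading (this check is an illustration, not something from the original report) is:

body_device = next(model.model_body.parameters()).device  # SentenceTransformer body
head_device = next(model.model_head.parameters()).device  # differentiable head
print(body_device, head_device)  # the traceback above implies cuda:0 vs cpu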

creatorrr commented 1 year ago

Here's the full script code:

from datasets import load_dataset 
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitModel, SetFitTrainer

ds = load_dataset("json", data_files={split: f"./setfit_{split}_data.json" for split in ("train", "test")})
num_classes = len(set(ds["train"]["label"]))

model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2",
    use_differentiable_head=True,
    head_params={"out_features": num_classes},
)

trainer = SetFitTrainer(
    model=model,
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
    loss_class=CosineSimilarityLoss,
    metric="accuracy",
    batch_size=64,
    num_iterations=10, # The number of text pairs to generate for contrastive learning
    num_epochs=3, # The number of epochs to use for contrastive learning
    learning_rate=1.25e-05,
    column_mapping={"text": "text", "label": "label"} # Map dataset columns to text/label expected by trainer
)

trainer.freeze()
trainer.unfreeze(keep_body_frozen=True)

# Note: Next step fails without this:
# model.model_head = model.model_head.to("cuda:0")
# model.model_body = model.model_body.to("cuda:0")

trainer.train(
    num_epochs=40, # The number of epochs to train the head or the whole model (body and head)
    batch_size=16,
    body_learning_rate=1e-5, # The body's learning rate
    learning_rate=1e-2, # The head's learning rate
    l2_weight=0.0, # Weight decay on **both** the body and head. If `None`, will use 0.01.
)

metrics = trainer.evaluate()
print(metrics)

lewtun commented 1 year ago

Thanks for sharing the full example @creatorrr - I was able to reproduce the error with the emotion dataset, so it's definitely a bug. It seems that calling trainer.freeze() followed by trainer.unfreeze() is the issue; perhaps some state is being preserved between the two calls.
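A condensed reproduction along those lines might look like the sketch below (the emotion dataset is the one mentioned above; the subset sizes and hyperparameters are my own choices, not from the original report):

from datasets import load_dataset
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitModel, SetFitTrainer

ds = load_dataset("emotion")
num_classes = len(set(ds["train"]["label"]))

model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2",
    use_differentiable_head=True,
    head_params={"out_features": num_classes},
)

trainer = SetFitTrainer(
    model=model,
    train_dataset=ds["train"].select(range(64)),
    eval_dataset=ds["test"].select(range(64)),
    loss_class=CosineSimilarityLoss,
)

trainer.freeze()
trainer.unfreeze(keep_body_frozen=True)

# Training the head directly (with no prior body-training pass) is what raised
# the device mismatch before the fix discussed below.
trainer.train(num_epochs=1, batch_size=16)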

blakechi commented 1 year ago

Okay, I think I know where the bug is.

By default, sentence-transformers initializes the model on the CPU and only moves it to the GPU (if available) when we call fit (Ref 1, Ref 2).

That's why it works smoothly when we first train the model body and then train the head. If we train the head directly, the body hasn't been moved to the GPU yet, so the error is raised.

Hope the explanation makes sense! :)

I can fix it by moving the body to the _target_device right after its initialization. Will open a PR soon. Thank you @creatorrr for finding this bug!
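A rough user-side sketch of that idea (not the actual PR; _target_device is the private sentence-transformers attribute mentioned above):

# Sketch only: move both modules onto the body's intended device before head-only training.
target_device = model.model_body._target_device  # cuda:0 when a GPU is available, else cpu
model.model_body.to(target_device)
model.model_head.to(target_device)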

tomaarsen commented 1 year ago

With Blake's changes, running (a slightly modified variant of) your script now works as intended, so I'll close this!