RonanFR opened this issue 1 week ago
Hi @RonanFR, in general pipelines are inference-only, so loading the model with a pipeline and then training it is a bit odd! Can you check whether you still get the issue when you initialize the model with `AutoModelForSequenceClassification` and `AutoTokenizer` instead? If you can give us some clean code without `pipeline` that reproduces the issue, we can investigate further.
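Something along these lines, for example (the checkpoint name here is just illustrative):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the model and tokenizer directly, without going through pipeline()
model = AutoModelForSequenceClassification.from_pretrained(
    "TinyLlama/TinyLlama_v1.1", num_labels=5
)
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama_v1.1")
```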
Thanks for your reply @Rocketknight1!
Indeed, I tested your suggestion and it works perfectly fine when training the last layer only (see message below).
@Rocketknight1 actually I went a bit fast before writing the last message. There is no issue when only the last layer `score.weight` is set to trainable (i.e., for which `requires_grad` is set to `True`). But if we train other layers as well, the same `RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::BFloat16` occurs.
Minimal reproducible example:
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers import TrainingArguments
from transformers import Trainer
from datasets import load_dataset

# Load pretrained model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained(
    "TinyLlama/TinyLlama_v1.1",
    num_labels=5,
    device_map="cuda",
)
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama_v1.1")
tokenizer.add_special_tokens({"pad_token": tokenizer.eos_token})
model.resize_token_embeddings(len(tokenizer))
model.config.pad_token_id = tokenizer.pad_token_id

# Freeze everything except the classification head ("score") and the q_proj layers
for n, p in model.named_parameters():
    if ("score" not in n) and ("q_proj" not in n):
        p.requires_grad = False

# Load + format dataset
dataset = load_dataset("yelp_review_full")["train"].select(range(100))

def tokenize_function(examples):
    return tokenizer(
        examples["text"],
        max_length=20,
        padding="max_length",
        truncation=True,
    )

tokenized_datasets = dataset.map(tokenize_function, batched=True)

# Train
training_args = TrainingArguments(
    per_device_train_batch_size=2**4,
    num_train_epochs=1,
    torch_compile=True,
    bf16=True,
    logging_strategy="steps",
    logging_steps=1,
    output_dir="/tmp/test1",
    use_cpu=False,
)
trainer = Trainer(
    model=model,
    train_dataset=tokenized_datasets,
    eval_dataset=tokenized_datasets,
    args=training_args,
    tokenizer=tokenizer,
)
trainer.train()
```
I am selecting only the last `score` layer and the `q_proj` layers in the code above, but the same problem occurs when selecting the `v_proj` layers, for instance. The code only runs without errors when the `score` layer alone is trainable.
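For reference, this is the only variant that trains without errors for me (a minimal sketch of the freezing loop, with everything except the classification head frozen):

```python
# Keep only the classification head ("score") trainable; this setup runs fine
for n, p in model.named_parameters():
    p.requires_grad = "score" in n
```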
I also tried with PEFT (instead of manually setting `requires_grad` to `True` on entire layers and `False` on others), and the same problem occurs.
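In case it helps, the PEFT setup I tried looks roughly like this (rank and alpha are just placeholders):

```python
from peft import LoraConfig, get_peft_model

# LoRA adapters on the attention query projections, classification head kept trainable
lora_config = LoraConfig(
    task_type="SEQ_CLS",
    r=8,                        # placeholder rank
    lora_alpha=16,              # placeholder scaling
    target_modules=["q_proj"],
    modules_to_save=["score"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```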
Yes, I can reproduce the issue, but only by going back to 4.45. It's unfortunately a little awkward on the latest version: there's another issue affecting Llama model training, so I can't fully reproduce the problem on `main` (see https://github.com/huggingface/transformers/pull/34442).
Expected behavior
The attached code raises `RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::BFloat16`. When disabling torch compilation or using float32 (or doing both), everything works fine.

The problem does not seem to occur when PyTorch is downgraded to version 2.4.1. I am not fully sure, though, because in that case another error occurs: `RuntimeError: invalid dtype for bias` when using compile + autocast (pytorch/pytorch#124901). At the end of that issue they mention that the problem is fixed with PyTorch 2.5.0, but then the error above occurs, so I am stuck in a circular loop :sweat_smile:

The same problem seems to occur with float16 instead of bfloat16 (but apparently not with tensorfloat32).
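For completeness, here is a rough sketch of the `TrainingArguments` combinations that do run for me (values illustrative, other arguments as in the snippet above):

```python
# 1) bf16 without torch.compile: works
args_no_compile = TrainingArguments(
    output_dir="/tmp/test_no_compile",
    torch_compile=False,
    bf16=True,
)

# 2) torch.compile in full float32: works
args_fp32 = TrainingArguments(
    output_dir="/tmp/test_fp32",
    torch_compile=True,
    bf16=False,
)

# 3) torch.compile with tensorfloat32: also seems to work
args_tf32 = TrainingArguments(
    output_dir="/tmp/test_tf32",
    torch_compile=True,
    tf32=True,
)
```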
The same code works perfectly well with `"facebook/bart-large"` instead of `"Qwen/Qwen2.5-0.5B"`. But other models like `"TinyLlama/TinyLlama_v1.1"` suffer from the same issue as `"Qwen/Qwen2.5-0.5B"`.