Closed: paulthemagno closed this issue 2 years ago
Since it seems linked to T5 specifically, pinging @patrickvonplaten
cc @patil-suraj could you take a look here?
Looking into it!
Hi @paulthemagno!
This is because MT5EncoderModel (or T5EncoderModel) is just a base model and does not have any head, so it does not accept the labels argument.
To use this model for sequence classification, you could create a custom module that adds a sequence classification head on top of it, accepts a labels argument in forward, and computes and returns the loss. It should look very similar to how BertForSequenceClassification is implemented.
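In rough terms it could look like this (an untested sketch; the class name and the mean pooling over the encoder output are just one option, not the only way to do it):

import torch.nn as nn
from transformers import T5EncoderModel

class T5EncoderForSequenceClassification(nn.Module):  # illustrative name
    def __init__(self, model_name, num_labels):
        super().__init__()
        self.encoder = T5EncoderModel.from_pretrained(model_name)
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(self.encoder.config.d_model, num_labels)
        self.num_labels = num_labels

    def forward(self, input_ids, attention_mask=None, labels=None):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        hidden = outputs.last_hidden_state  # (batch, seq_len, d_model)
        if attention_mask is not None:
            # mean-pool over non-padding tokens
            mask = attention_mask.unsqueeze(-1).to(hidden.dtype)
            pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        else:
            pooled = hidden.mean(dim=1)
        logits = self.classifier(self.dropout(pooled))
        loss = None
        if labels is not None:
            # compute and return the loss when labels are passed
            loss = nn.CrossEntropyLoss()(logits.view(-1, self.num_labels), labels.view(-1))
        return {"loss": loss, "logits": logits}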
Hope this helps :)
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Thank you @patil-suraj, I followed your hints to create a T5ForSequenceClassification class, taking inspiration from BertForSequenceClassification.
I forked the latest version of transformers on main and pushed my updates. You can find it here.
Since Bert uses a pooled output (T5 does not) and BertModel contains only an encoder (while T5Model is an encoder-decoder), I'm not sure about my edits. Could you please check them?
Some updates: the main one concerns the difference between the objects returned by the forward functions of BertModel and T5Model.
- BertModel returns an object of class BaseModelOutputWithPoolingAndCrossAttentions that contains the property pooler_output.
- T5Model returns an object of class Seq2SeqModelOutput that doesn't have a pooler_output variable.
So in the T5ForSequenceClassification forward function I tried to pool the output of Seq2SeqModelOutput (line) as done in BertModel:
# outputs[0] is the decoder's last_hidden_state from Seq2SeqModelOutput
pooled_output = self.pooler(outputs[0]) if self.pooler is not None else None
pooled_output = self.dropout(pooled_output)
logits = self.classifier(pooled_output)
I didn't add it to T5Model directly because I didn't want to "break" its output, but you probably know how to do that in a cleaner way. For the same reason I defined the pooler in T5ForSequenceClassification (line) and not in T5Model, unlike Bert, where it lives in BertModel:
self.pooler = T5Pooler(config)  # if add_pooling_layer else None
T5Pooler is a copy of BertPooler.
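For reference, BertPooler just runs the first token's hidden state through a dense layer and a tanh, so the copy should be essentially this (the only change I'd expect is T5's d_model in place of Bert's hidden_size):

import torch.nn as nn

class T5Pooler(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.d_model, config.d_model)
        self.activation = nn.Tanh()

    def forward(self, hidden_states):
        # "pool" the sequence by taking the hidden state of the first token
        first_token_tensor = hidden_states[:, 0]
        pooled_output = self.dense(first_token_tensor)
        return self.activation(pooled_output)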
The other doubt I have is that T5Model's forward needs both input_ids/inputs_embeds and decoder_input_ids/decoder_inputs_embeds, while BertModel needs only the former. So in T5ForSequenceClassification I also pass the decoder_input_ids/decoder_inputs_embeds params, but with the same values as input_ids/inputs_embeds (line). This could be a mistake.
outputs = self.t5(
    input_ids,
    attention_mask=attention_mask,
    token_type_ids=token_type_ids,
    position_ids=position_ids,
    head_mask=head_mask,
    inputs_embeds=inputs_embeds,
    output_attentions=output_attentions,
    output_hidden_states=output_hidden_states,
    return_dict=return_dict,
    # this is my main doubt: T5Model needs both input_ids/inputs_embeds and
    # decoder_input_ids/decoder_inputs_embeds, while BertModel only needs input_ids/inputs_embeds.
    # I tried to pass T5Model the same values for these variables, but I'm not sure about that.
    decoder_input_ids=input_ids,
    decoder_inputs_embeds=inputs_embeds,
)
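One alternative I'm considering (an untested sketch, not what the fork does now) is to give the decoder just its start token instead of echoing the encoder inputs; T5 uses pad_token_id as decoder_start_token_id:

# inside forward(), assuming torch is imported
# build a one-token decoder input per example (T5's decoder start token)
decoder_start = self.t5.config.decoder_start_token_id  # == pad_token_id for T5
decoder_input_ids = torch.full(
    (input_ids.shape[0], 1), decoder_start, dtype=torch.long, device=input_ids.device
)
outputs = self.t5(
    input_ids=input_ids,
    attention_mask=attention_mask,
    decoder_input_ids=decoder_input_ids,
)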
I tried to run a training with a GPU instance (ml.g4dn.xlarge) using t5-small, but it seems very slow. Obviously, I replaced the definition of the model in the issue with:
model = T5ForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)
Anyway, the training isn't raising errors.
Thank you!!
Reproduction
I have a problem in the training of a google/mt5-small