Closed morrisalp closed 10 months ago
Hey!
I don't really think this is a bug, as the comment mentions: `pooled_output = text_outputs[0]  # last_hidden_state`. Maybe the name of the temporary variable `pooled_output` is wrong, as it is using the last hidden states.
The documentation for the `pooled_output` states the following:

> `pooler_output` (`torch.FloatTensor` of shape `(batch_size, hidden_size)`): Last layer hidden-state of the first token of the sequence (classification token) after further processing through the layers used for the auxiliary pretraining task. E.g. for BERT-family of models, this returns the classification token after processing through a linear layer and a tanh activation function. The linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.
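To make the two shapes concrete, here is a minimal numpy sketch of the BERT-style pooler described in the quoted docs (the weights and dimensions are illustrative, not taken from any real checkpoint):

```python
import numpy as np

batch_size, n_tokens, hidden_size = 2, 7, 4

# last_hidden_state: unpooled, one vector per token
last_hidden_state = np.random.rand(batch_size, n_tokens, hidden_size)

# BERT-style pooler: take the first ([CLS]) token,
# then apply a linear layer and tanh (illustrative weights)
W = np.random.rand(hidden_size, hidden_size)
b = np.zeros(hidden_size)
pooler_output = np.tanh(last_hidden_state[:, 0, :] @ W + b)

print(last_hidden_state.shape)  # (2, 7, 4) -- unpooled
print(pooler_output.shape)      # (2, 4)    -- pooled
```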
When you want the text / image embeddings, you want all the embeddings, not just the pooled one.
The documentation states, e.g. for `get_text_features`:

> Returns:
> text_features (`torch.FloatTensor` of shape `(batch_size, output_dim)`):
> (etc)
This is wrong, since it actually returns a tensor of shape `(batch_size, n_tokens, hidden_size)`.
> When you want the text / image embeddings, you want all the embeddings, not just the pooled one.

Not for my specific use case... Additionally, the CLIP API functions like `get_text_features` also return pooled embeddings.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
transformers v4.33.0
Who can help?
@ArthurZucker @younesbelkada @amyeroberts
Reproduction
FLAVA models' `get_text_features`, `get_image_features`, and related functions return unpooled embeddings of shape `(batch_size, n_tokens, hidden_size)` rather than pooled `(batch_size, hidden_size)`, as expected and stated in the documentation. Note the bug in the source code HERE: actually, `text_outputs` has ordered keys `last_hidden_state` and `pooler_output`, in that order, the former being unpooled and the latter pooled.

Expected behavior
Should return pooled embeddings. Presumably this also affects other model methods using this, such as the contrastive loss calculation.
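Until this is fixed upstream, a caller could pool the returned embeddings manually. A minimal sketch (`pool_first_token` is a hypothetical helper, not part of the transformers API, and it skips the linear + tanh projection the real pooler applies):

```python
import numpy as np

def pool_first_token(unpooled):
    """Reduce (batch_size, n_tokens, hidden_size) to (batch_size, hidden_size)
    by selecting the first (classification) token's embedding."""
    return unpooled[:, 0, :]

# Simulated unpooled output, shaped like what get_text_features currently returns
unpooled = np.random.rand(3, 10, 8)
pooled = pool_first_token(unpooled)
print(pooled.shape)  # (3, 8)
```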