Closed askerlee closed 1 year ago
Sorry, I realized that CLIPEncoder::forward()
is also overridden. It returns hidden_states
only, instead of the tuple (hidden_states, encoder_states, all_attentions)
returned in huggingface transformers. Therefore, the equivalence still holds.
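A minimal sketch of why the two signatures can still be equivalent (the function names here are illustrative stand-ins, not the actual CLIP code): one forward() returns the hidden states directly, the other wraps them in a tuple, so a caller of the huggingface version just indexes element 0 to get the same object.

```python
def forward_custom(hidden_states):
    # Overridden-style forward: returns hidden_states only.
    return hidden_states

def forward_hf(hidden_states, encoder_states=None, all_attentions=None):
    # huggingface-style forward: returns a tuple; hidden states are element 0.
    return (hidden_states, encoder_states, all_attentions)

h = [1.0, 2.0, 3.0]  # stand-in for a tensor of hidden states
assert forward_custom(h) is forward_hf(h)[0]  # same object either way
```

So as long as every downstream caller of the overridden forward() expects the bare hidden states (rather than a tuple), the behaviors match.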
I understand that
FrozenCLIPEmbedder::text_encoder_forward()
is meant to replace CLIPTextTransformer::forward()
(more specifically, FrozenCLIPEmbedder.transformer.text_model.forward)
by introducing an extra argument embedding_manager,
while keeping the rest of the code logic unchanged. However, text_encoder_forward()
seems to be implemented quite differently from CLIPTextTransformer::forward().
Specifically, in Line 281,
But in huggingface transformers, we see https://github.com/huggingface/transformers/blob/main/src/transformers/models/clip/modeling_clip.py#L730
They don't seem to be equivalent. Is this a potential bug? Thanks.
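For context, the wrapper pattern described above could be sketched as follows. This is a hypothetical illustration, not the actual ldm code: a re-implemented forward() threads an extra embedding_manager hook through the token-embedding step and otherwise leaves the logic unchanged, so with embedding_manager=None it should behave identically to the original.

```python
def text_model_forward(token_ids, embedding_manager=None):
    # Stand-in for the token-embedding lookup of the original forward().
    hidden = [float(t) for t in token_ids]
    if embedding_manager is not None:
        # Extra hook: let the manager replace placeholder-token embeddings.
        hidden = embedding_manager(token_ids, hidden)
    # ... positional embeddings, attention layers, etc. would follow,
    # unchanged from the original implementation ...
    return hidden

# Without a manager it matches the plain embedding lookup:
assert text_model_forward([1, 2]) == [1.0, 2.0]
```

The question above is whether the actual text_encoder_forward() preserves this "unchanged except for the hook" property, or whether the divergence at Line 281 changes the computed result.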