I find that when you load pretrained CLIP parameters, you truncate the context_length of the positional_embedding, as in: https://github.com/raoyongming/DenseCLIP/blob/3b72447dee3f622f3716738140161ef9f763c72f/detection/denseclip/models.py#L652-L655

Does this affect the pretrained model's performance? In other words, does it change the original output of the pretrained text encoder?

We truncate the text input and the corresponding positional_embedding to reduce memory and computation, since the texts used in our case are usually short (a prompt plus a class name). The modification does not affect performance on downstream tasks like segmentation and detection, but the output may differ slightly from that of the original CLIP implementation.
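A minimal sketch of this kind of truncation, assuming a plain PyTorch state dict (the checkpoint path and `context_length = 13` are illustrative, not DenseCLIP's actual values):

```python
import torch

# Hypothetical shortened context: enough for "prompt + class name" plus the
# start/end tokens; CLIP's default context length is 77.
context_length = 13

# Illustrative checkpoint path; CLIP stores the text positional embedding
# under the key "positional_embedding" with shape [77, transformer_width].
state_dict = torch.load("clip_text.pth", map_location="cpu")

pe = state_dict["positional_embedding"]
if pe.shape[0] > context_length:
    # Keep only the first `context_length` positions; the embeddings of the
    # surviving positions are unchanged, only the tail rows are dropped.
    state_dict["positional_embedding"] = pe[:context_length].clone()

# model.load_state_dict(state_dict, strict=False)  # then load as usual
```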
OK, thanks.
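On the input side, the tokenizer then has to pad or truncate to the same shortened length so token positions line up with the truncated embedding; a sketch using OpenAI's `clip` package for illustration:

```python
import clip  # OpenAI's CLIP package, used here only for illustration

# context_length=13 matches the truncated positional embedding above;
# truncate=True guards against prompts longer than the shortened context.
tokens = clip.tokenize(["a photo of a cat"], context_length=13, truncate=True)
print(tokens.shape)  # torch.Size([1, 13])
```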