openai / CLIP

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
MIT License

Missing type conversion in model.py #386

Open GXZlegend opened 1 year ago

GXZlegend commented 1 year ago

A type conversion seems to be missing in the forward pass of VisionTransformer in clip/model.py. When the model is called directly without converting the input first (Line 342), the forward pass fails with a type mismatch error.

For reference, ModifiedResNet performs an explicit type conversion at Line 146.

Would it be possible to add the conversion x = x.type(self.conv1.weight.dtype) between Lines 223 and 224?
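
To illustrate the mismatch, here is a minimal sketch, not the actual CLIP code. TinyPatchEmbed and try_forward are hypothetical names, the layer sizes are arbitrary, and float64 weights stand in for half-precision weights so that the example runs on CPU. Whether this raises exactly the same error message as the one seen with CLIP is an assumption.

```python
import torch
import torch.nn as nn


class TinyPatchEmbed(nn.Module):
    """Hypothetical stand-in for the first step of a vision transformer forward pass."""

    def __init__(self, cast_input: bool):
        super().__init__()
        # Patch-embedding convolution, loosely modelled on the conv1 attribute named above.
        self.conv1 = nn.Conv2d(3, 8, kernel_size=4, stride=4, bias=False)
        self.cast_input = cast_input

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.cast_input:
            # The conversion proposed in this issue: match the input to the weight dtype.
            x = x.type(self.conv1.weight.dtype)
        return self.conv1(x)


def try_forward(cast_input: bool) -> str:
    # Assumption: float64 weights play the role of non-float32 (e.g. fp16) model weights.
    model = TinyPatchEmbed(cast_input).double()
    image = torch.randn(1, 3, 8, 8)  # float32 input, the default dtype for new tensors
    try:
        out = model(image)
    except RuntimeError:
        # Input and weight dtypes differ, so the convolution refuses to run.
        return "error"
    return str(out.dtype)


if __name__ == "__main__":
    print("without conversion:", try_forward(cast_input=False))
    print("with conversion:   ", try_forward(cast_input=True))
```

With the conversion in place, the input follows whatever dtype the weights have, so callers do not need to convert the image themselves before calling the model.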