A type conversion appears to be missing from the forward pass of `VisionTransformer` in `clip/model.py`. Calling the forward pass directly, without converting the input beforehand (Line 342), raises a type-mismatch error.
For reference, `ModifiedResNet` performs an explicit type conversion at Line 146.
Would it be possible to add the conversion `x = x.type(self.conv1.weight.dtype)` between Lines 223 and 224?
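To illustrate the request, here is a minimal sketch of the mismatch and the proposed cast. It is not the code from `clip/model.py`: `TinyPatchEmbed` is a hypothetical stand-in with a single `conv1` layer, and `float64` weights are used only so the example runs on CPU (the real case would presumably involve `float16` weights).

```python
import torch
import torch.nn as nn


class TinyPatchEmbed(nn.Module):
    """Hypothetical stand-in for the first layer of a vision transformer."""

    def __init__(self, cast_input: bool):
        super().__init__()
        self.cast_input = cast_input
        self.conv1 = nn.Conv2d(3, 8, kernel_size=4, stride=4, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.cast_input:
            # Proposed fix: match the input dtype to the conv weights.
            x = x.type(self.conv1.weight.dtype)
        return self.conv1(x)


x = torch.randn(1, 3, 8, 8)  # float32 input, as a caller would typically pass

# Without the cast, float32 input meets float64 weights and conv2d raises.
mismatch_raised = False
try:
    TinyPatchEmbed(cast_input=False).double()(x)
except RuntimeError:
    mismatch_raised = True

# With the cast, the forward pass succeeds and follows the weight dtype.
out = TinyPatchEmbed(cast_input=True).double()(x)
print(mismatch_raised, out.dtype, tuple(out.shape))
```

Casting inside `forward` means callers do not have to know which precision the model was converted to.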