openai / CLIP

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
MIT License

RuntimeError: The size of tensor a (3) must match the size of tensor b (9) at non-singleton dimension 0 #382

Open: dEVANSH14122002 opened this issue 1 year ago

dEVANSH14122002 commented 1 year ago

        # Take the dot product between "query" and "key" to get the raw attention scores.
--> 178 attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
    179
    180 if self.position_embedding_type == "relative_key" or self.position_embedding_type == "relative_key_query":

RuntimeError: The size of tensor a (3) must match the size of tensor b (9) at non-singleton dimension 0
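The failing line multiplies `query_layer` by the transposed `key_layer`. `torch.matmul` treats the last two dimensions as matrices and broadcasts every leading dimension. The two batch dimensions (dimension 0 here) must therefore be equal, or one of them must be 1. What follows is a minimal, pure-Python model of that shape rule, not PyTorch's implementation. The helpers `broadcast_batch_dims`, `matmul_shape` and `transpose_last_two` are hypothetical, and the `(batch, heads, seq_len, head_dim)` layout is an assumption about this attention layer. The concrete shapes are illustrative values chosen to reproduce the 3-vs-9 mismatch; they are not taken from the issue.

```python
from itertools import zip_longest


def broadcast_batch_dims(a_batch, b_batch):
    """Broadcast two batch-shape tuples the way torch.matmul broadcasts the
    leading (non-matrix) dimensions of its operands.

    Raises ValueError (PyTorch itself raises RuntimeError) with a message
    modelled on PyTorch's broadcasting error when a pair of aligned
    dimensions differ and neither of them is 1.
    """
    ndim = max(len(a_batch), len(b_batch))
    result = []
    # Align the shapes from the right; the shorter one is padded with 1s.
    pairs = zip_longest(reversed(a_batch), reversed(b_batch), fillvalue=1)
    for offset, (a, b) in enumerate(pairs):
        dim = ndim - 1 - offset  # index of this dimension in the broadcast shape
        if a == b or b == 1:
            result.append(a)
        elif a == 1:
            result.append(b)
        else:
            raise ValueError(
                f"The size of tensor a ({a}) must match the size of tensor b ({b}) "
                f"at non-singleton dimension {dim}"
            )
    return tuple(reversed(result))


def transpose_last_two(shape):
    """Shape of a tensor after .transpose(-1, -2)."""
    return tuple(shape[:-2]) + (shape[-1], shape[-2])


def matmul_shape(a_shape, b_shape):
    """Output shape of a batched matmul of a (..., n, m) with b (..., m, p).

    Assumes both operands have at least two dimensions.
    """
    *a_batch, n, m = a_shape
    *b_batch, m_b, p = b_shape
    if m != m_b:
        raise ValueError(f"inner dimensions differ: {m} vs {m_b}")
    return broadcast_batch_dims(tuple(a_batch), tuple(b_batch)) + (n, p)


# Illustrative (batch, heads, seq_len, head_dim) shapes: 3 examples on the
# query side, 9 on the key side.
query_shape = (3, 12, 32, 64)
key_shape = (9, 12, 50, 64)

try:
    matmul_shape(query_shape, transpose_last_two(key_shape))
except ValueError as err:
    print(err)
# prints: The size of tensor a (3) must match the size of tensor b (9) at non-singleton dimension 0

# With matching batch sizes the raw scores are (batch, heads, q_len, k_len).
print(matmul_shape((9, 12, 32, 64), transpose_last_two(key_shape)))
# prints: (9, 12, 32, 50)
```

Under these assumptions, the error means the tensors feeding the query and the key were built from different numbers of examples (3 and 9). Printing `query_layer.shape` and `key_layer.shape` just before line 178 would show which side has the unexpected batch size.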