CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
RuntimeError: The size of tensor a (3) must match the size of tensor b (9) at non-singleton dimension 0 #382
Open
dEVANSH14122002 opened 1 year ago
            # Take the dot product between "query" and "key" to get the raw attention scores.
    --> 178 attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
        179
        180 if self.position_embedding_type == "relative_key" or self.position_embedding_type == "relative_key_query":

    RuntimeError: The size of tensor a (3) must match the size of tensor b (9) at non-singleton dimension 0
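For anyone reading the traceback: `torch.matmul` broadcasts the leading (batch) dimensions of its two inputs, so one plausible reading is that `query_layer` and `key_layer` arrive with different sizes in dimension 0 (3 versus 9). The issue does not show the tensor shapes, so that reading is an assumption. The sketch below is a pure-Python illustration of the broadcasting rule, not PyTorch's own code. The helper name `broadcast_batch_dims` and the shapes (12 heads, 77 tokens, 64 channels) are hypothetical and chosen only to reproduce the same message.

```python
from itertools import zip_longest


def broadcast_batch_dims(a_shape, b_shape):
    """Return the broadcast batch shape for a batched matmul a @ b.

    Hypothetical helper that mirrors the usual broadcasting rule for the
    leading (batch) dimensions. It is an illustration, not PyTorch code.
    """
    a_batch, b_batch = a_shape[:-2], b_shape[:-2]
    # Inner matrix dimensions must agree: (..., n, k) @ (..., k, m).
    if a_shape[-1] != b_shape[-2]:
        raise ValueError(
            f"inner dimensions differ: {a_shape[-1]} vs {b_shape[-2]}"
        )
    out = []
    # Align batch dims from the right, padding the shorter shape with 1s.
    pairs = zip_longest(reversed(a_batch), reversed(b_batch), fillvalue=1)
    for offset, (x, y) in enumerate(pairs):
        # Two sizes are compatible if they are equal or one of them is 1.
        if x != y and x != 1 and y != 1:
            dim = max(len(a_batch), len(b_batch)) - 1 - offset
            raise ValueError(
                f"The size of tensor a ({x}) must match the size of "
                f"tensor b ({y}) at non-singleton dimension {dim}"
            )
        out.append(max(x, y))
    return tuple(reversed(out))


# Assumed shapes: 3 items on the query side, 9 on the key side.
query_shape = (3, 12, 77, 64)
key_t_shape = (9, 12, 64, 77)  # key after transpose(-1, -2)

try:
    broadcast_batch_dims(query_shape, key_t_shape)
except ValueError as err:
    print(err)
```

If this is what is happening, the two inputs need the same batch size (or a batch size of 1 on one side) before they reach the attention layer; checking the `.shape` of each input just before the failing call would confirm or rule it out.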