KomonoLi closed this issue 11 months ago.
Thank you for your attention.
1) Since the paper is under review, we will release the code after the review stage.
2) The CLS token is obtained following CLIP.
3) The code is as follows:
import torch

# Hypothetical dimensions: H attention heads, an h x w feature map, embedding dim d
H, h, w, d = 12, 14, 14, 768

# Create an example query tensor of shape [H, h, w, d/H]
query = torch.randn(H, h, w, d // H)
# Create an example key (CLS token) tensor of shape [H, d/H]
key = torch.randn(H, d // H)

# Flatten the spatial dimensions of the query
query_reshaped = query.view(H, h * w, d // H)  # Shape: [H, h*w, d/H]
# Add a trailing singleton dimension to the key
key_reshaped = key.view(H, d // H, 1)  # Shape: [H, d/H, 1]

# Batch matrix multiplication over the head dimension contracts d/H
result = torch.matmul(query_reshaped, key_reshaped)  # Shape: [H, h*w, 1]

# Restore the spatial layout to obtain the attention map
final_result = result.view(H, h, w)  # Shape: [H, h, w]
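For reference, the same map can be computed with a single einsum and no reshaping. The sketch below is illustrative: the dimension values are hypothetical, and it assumes the key CLS token sits at index 0 of the full key sequence [H, 1 + h*w, d/H], since CLIP prepends the CLS token to the patch tokens.

import torch

H, h, w, d = 12, 14, 14, 768  # hypothetical ViT-B/16-style dimensions

# Full key sequence with the CLS token prepended at index 0, as in CLIP
keys = torch.randn(H, 1 + h * w, d // H)
key_cls = keys[:, 0, :]  # Shape: [H, d/H]

query = torch.randn(H, h, w, d // H)

# Contract the shared d/H channel dimension: [H, h, w, d/H] x [H, d/H] -> [H, h, w]
alpha = torch.einsum('hijc,hc->hij', query, key_cls)

The shapes match because the multiplication contracts the shared d/H dimension, leaving an attention map alpha of shape [H, h, w].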
Hi, nice work! I was wondering if you are planning to release the code?
BTW, can you provide more details about the attention map? You mention in the paper that the attention map is calculated from the query image feature [H, h, w, d/H] and the key CLS token feature [H, d/H]. Is the key CLS token the last token of the key [H, h*w, d/H]?
Also, how do you get alpha with shape [H, h, w] by multiplying the query [H, h, w, d/H] with the key [H, d/H]? The shapes don't seem to match.