xmu-xiaoma666 / X-Dreamer

A pytorch implementation of “X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation”
Apache License 2.0

Attn map #1

Closed by KomonoLi 11 months ago

KomonoLi commented 11 months ago

Hi, nice work! I was wondering if you are planning to release the code?

BTW, can you provide more details about the attention map? You mentioned in your paper that the attention map is computed from the query image features [H, h, w, d/H] and the key CLS token feature [H, d/H]. Is the key CLS token the last token of the key [H, h*w, d/H]?

Also, how do you get alpha with shape [H, h, w] by multiplying the query [H, h, w, d/H] with the key [H, d/H]? The shapes don't seem to match.

xmu-xiaoma666 commented 11 months ago

Thank you for your attention.

1) Since the paper is under review, we will release the code after the review stage.
2) The CLS token is obtained following CLIP.
3) The code is like the following:

import torch

# Example dimensions: H heads, h x w spatial locations, d total channels
H, h, w, d = 8, 16, 16, 512

# Create example query tensor of shape [H, h, w, d/H]
query = torch.randn(H, h, w, d // H)

# Create example key tensor of shape [H, d/H]
key = torch.randn(H, d // H)

# Reshape the query tensor
query_reshaped = query.view(H, h * w, d // H)  # Shape: [H, h * w, d/H]

# Reshape the key tensor
key_reshaped = key.view(H, d // H, 1)  # Shape: [H, d/H, 1]

# Perform the batch matrix multiplication
result = torch.matmul(query_reshaped, key_reshaped)  # Shape: [H, h * w, 1]

# Reshape the result
final_result = result.view(H, h, w)  # Shape: [H, h, w]
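For readers checking the shapes, the reshape-and-matmul above can be written as a single einsum contraction over the per-head channel dimension. The dimensions below are illustrative placeholders, not values from the paper. (As a side note on the CLS question: in CLIP's vision transformer the class embedding is prepended to the patch tokens, so the CLS key is conventionally the first token rather than the last; this is an inference from CLIP's design, not a statement by the authors.)

```python
import torch

# Illustrative dimensions (not from the paper): 8 heads, 16x16 patches, d = 512
H, h, w, d = 8, 16, 16, 512
query = torch.randn(H, h, w, d // H)  # [H, h, w, d/H]
key = torch.randn(H, d // H)          # [H, d/H]

# Contract the per-head channel dim c: [H, h, w, c] x [H, c] -> [H, h, w]
alpha = torch.einsum('hijc,hc->hij', query, key)

assert alpha.shape == (H, h, w)
```

This produces the same tensor as the reshape/matmul version, since both simply take a dot product between each spatial query vector and its head's CLS key vector.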