raoyongming / DenseCLIP

[CVPR 2022] DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

A little question about dimensions #54

Open lxr-1204 opened 5 months ago

lxr-1204 commented 5 months ago

Thank you for the great work. After reading the code, I have a small question about the shape comments, and I hope you can help. In the following code, visual_context has shape [B, N, C] and text_embeddings has shape [B, K, C]. Should the output of context_decoder then have shape [B, K, C]?

image

For convenience, I have pasted the forward code of context_decoder below:

image
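To make the shape reasoning behind the question concrete, here is a minimal sketch of generic single-head cross-attention in plain Python. It is not the DenseCLIP context_decoder: it assumes text_embeddings act as the queries and visual_context as the keys/values, and it omits the learned projections, multiple heads, and layer norms. The helper names (`cross_attention`, `random_tensor`, `shape`, `decoded`) are hypothetical and used only for illustration.

```python
import math
import random


def shape(t):
    """Return the (batch, rows, channels) shape of a nested list."""
    return (len(t), len(t[0]), len(t[0][0]))


def softmax(xs):
    """Numerically stable softmax over a flat list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, memory):
    """Single-head cross-attention with no learned projections (assumption).

    query:  [B, K, C]  (here: text embeddings)
    memory: [B, N, C]  (here: visual context)
    returns [B, K, C]: one output row per query row.
    """
    out = []
    for q_b, m_b in zip(query, memory):
        c = len(q_b[0])
        rows = []
        for q in q_b:  # loop over the K query tokens
            # One score per memory token: N scaled dot products.
            scores = [sum(a * b for a, b in zip(q, m)) / math.sqrt(c) for m in m_b]
            weights = softmax(scores)
            # Weighted sum over the N memory tokens gives a single C-dim vector.
            rows.append([sum(w * m[j] for w, m in zip(weights, m_b)) for j in range(c)])
        out.append(rows)
    return out


def random_tensor(b, n, c, rng):
    """Build a [b, n, c] nested list of random floats."""
    return [[[rng.uniform(-1.0, 1.0) for _ in range(c)] for _ in range(n)] for _ in range(b)]


rng = random.Random(0)
B, N, K, C = 2, 6, 3, 4
visual_context = random_tensor(B, N, C, rng)    # [B, N, C]
text_embeddings = random_tensor(B, K, C, rng)   # [B, K, C]

decoded = cross_attention(text_embeddings, visual_context)
print(shape(decoded))  # (2, 3, 4), i.e. [B, K, C]
```

Under these assumptions the N dimension is summed out by the attention weights, so the output has one row per query token. If the real context_decoder also uses text_embeddings as the query, its output would likewise be [B, K, C].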

Thanks in advance for your reply.