zju3dv / EfficientLoFTR


Question about the mask usage in attentions #6

Closed. SummerpanKing closed this issue 3 months ago.

SummerpanKing commented 3 months ago

Hi, thanks for your code release.

I observed that features are cropped using their masks before being fed into the attention modules, which differs from how LoFTR handles masks. I also noticed the comment "Not support generalized attention mask yet." in the code. Could you explain the reason for this design?

wyf2020 commented 3 months ago

Hi, great question and sorry for the late reply! This is an acceleration trick for training on MegaDepth with padding masks. When the mask comes from padding (extending the shorter side to match the longer side) rather than being an arbitrarily shaped mask, we can exploit that structure: crop the features to the valid region before attention and pad the output back afterwards. This is slightly faster than running vanilla attention or flash attention with an explicit mask. Since every MegaDepth image is padded to the same width and height during training, this shortcut speeds up training on MegaDepth.

However, we also understand that some users of matching models may want arbitrarily shaped masks to exclude specific areas from matching, so we added the comment "Not support generalized attention mask yet." to indicate our plan to support arbitrary input masks in the future.
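To make the idea concrete, here is a minimal sketch of the crop-then-pad shortcut contrasted with masked attention. It is not EfficientLoFTR's actual code: it assumes coarse features of shape [N, H, W, C], that the padding mask is valid only on a top-left h x w rectangle shared by the whole batch, and the function names (`masked_attention`, `crop_pad_attention`) are illustrative.

```python
import torch
import torch.nn.functional as F


def masked_attention(q, k, v, kv_mask):
    """Vanilla attention with an explicit mask.
    q: [N, L, C], k/v: [N, S, C], kv_mask: [N, S] with True at valid key/value positions."""
    attn_mask = kv_mask[:, None, :]  # broadcast over query positions
    return F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)


def crop_pad_attention(feat_q, feat_kv, h0, w0, h1, w1):
    """Padding-mask shortcut: crop the valid rectangles, run dense attention
    without any mask, then zero-pad the output back to the original size."""
    N, H, W, C = feat_q.shape
    q = feat_q[:, :h0, :w0].reshape(N, h0 * w0, C)    # drop padded rows/cols
    kv = feat_kv[:, :h1, :w1].reshape(N, h1 * w1, C)
    out = F.scaled_dot_product_attention(q, kv, kv)   # no attention mask needed
    out = out.reshape(N, h0, w0, C)
    # pad back to H x W so downstream layers see a fixed spatial shape
    return F.pad(out, (0, 0, 0, W - w0, 0, H - h0))


# toy usage: 60x60 coarse feature maps whose valid region is 60x45
feat0 = torch.randn(2, 60, 60, 256)
feat1 = torch.randn(2, 60, 60, 256)
out = crop_pad_attention(feat0, feat1, h0=60, w0=45, h1=60, w1=45)
print(out.shape)  # torch.Size([2, 60, 60, 256])
```

The saving comes from attending over fewer tokens (h0*w0 instead of H*W) and skipping the mask logic entirely; this only works because padding keeps the valid region a rectangle, which is why an arbitrarily shaped mask is not supported by this path.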

SummerpanKing commented 3 months ago

Thank you so much!