Closed SnaKey0u0 closed 10 months ago
Hi, thanks for the great work. I've been examining the prompt_encoder.py file and noticed that the img_pe and point_pe variables appear to be unused.
My understanding is that these should encode positional information via Fourier feature mapping, since they pass through the get_img_pe and _pe_encoding functions.
However, this information is not used in the self-attention and cross-attention of the TwoWayAttentionBlock.
Could you clarify whether I have overlooked something? I appreciate your assistance and look forward to your response.
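For context, SAM-style positional encodings map normalized coordinates through random Fourier features before sin/cos projection. A minimal sketch of what _pe_encoding computes (the function and matrix names here are illustrative, not the repo's actual code):

```python
import numpy as np

def fourier_pe(coords, gaussian_matrix):
    """Sketch of a SAM-style Fourier feature positional encoding.

    coords: (..., 2) coordinates normalized to [0, 1]
    gaussian_matrix: (2, num_feats) random Gaussian projection matrix
    returns: (..., 2 * num_feats) sin/cos features
    """
    coords = 2.0 * coords - 1.0          # shift to [-1, 1]
    proj = coords @ gaussian_matrix      # random projection
    proj = 2.0 * np.pi * proj
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)
```

The question is whether these features, once computed, actually enter the attention layers.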
Hi, you are right; we did not use the positional information. The original SAM uses this Fourier feature mapping to make the point embedding similar to the image embedding. However, we directly interpolate from the image embedding to obtain the point embedding. This avoids the over-smoothing caused by a large number of tokens and also lets the prompt embedding focus more on semantic information. We tested adding positional encoding in our framework but observed no improvement.
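Interpolating a point embedding directly from the image embedding can be sketched as follows; this is a hedged illustration of the idea described above, not the repo's actual implementation (function name, shapes, and the assumed input size are mine):

```python
import torch
import torch.nn.functional as F

def point_embedding_from_image(image_embedding, points, img_size):
    """Sample point embeddings by bilinear interpolation of the image embedding.

    image_embedding: (B, C, H, W) feature map
    points: (B, N, 2) pixel coordinates (x, y) in the input image
    img_size: side length of the input image in pixels
    returns: (B, N, C) one embedding vector per point
    """
    # Normalize pixel coordinates to [-1, 1], as grid_sample expects.
    grid = points / img_size * 2.0 - 1.0   # (B, N, 2)
    grid = grid.unsqueeze(2)               # (B, N, 1, 2)
    sampled = F.grid_sample(image_embedding, grid, align_corners=False)
    return sampled.squeeze(-1).transpose(1, 2)  # (B, C, N) -> (B, N, C)
```

Because the embedding is read off the feature map at the point's location, position is carried implicitly by where the sample is taken, rather than by an explicit Fourier encoding added to the tokens.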