med-air / 3DSAM-adapter

Holistic Adaptation of SAM from 2D to 3D for Promptable Medical Image Segmentation

img_pe and point_pe variables are unused in prompt_encoder.py #22

Closed SnaKey0u0 closed 10 months ago

SnaKey0u0 commented 10 months ago

Hi, thanks for the great work. I’ve been examining the prompt_encoder.py file and noticed that the img_pe and point_pe variables appear to be unused.

My understanding is that these should encode positional information using Fourier Feature Mapping, as they pass through the get_img_pe and _pe_encoding functions.

However, this information is not used in the self-attention or cross-attention of the TwoWayAttentionBlock.

Could you help clarify if I’ve overlooked something? I appreciate your assistance and look forward to your response.
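For readers unfamiliar with the Fourier feature mapping mentioned above, here is a minimal NumPy sketch of the general technique: normalized coordinates are shifted to [-1, 1], projected through a random Gaussian matrix, scaled by 2π, and passed through sine and cosine. The function name `fourier_features`, the feature count, and the fixed seed are illustrative assumptions, not the repository's actual code.

```python
import numpy as np

def fourier_features(coords, num_feats=64, scale=1.0, seed=0):
    """Map normalized coordinates to random Fourier features.

    coords: array of shape (N, dims) with values in [0, 1].
    Returns an array of shape (N, 2 * num_feats).
    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    # Random Gaussian projection matrix (assumed fixed, not learned).
    proj_matrix = scale * rng.standard_normal((coords.shape[1], num_feats))
    # Shift coordinates from [0, 1] to [-1, 1] before projecting.
    shifted = 2.0 * coords - 1.0
    angles = 2.0 * np.pi * (shifted @ proj_matrix)
    # Concatenate sine and cosine responses along the feature axis.
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

# Two 3D points in normalized volume coordinates.
points = np.array([[0.5, 0.5, 0.5], [0.1, 0.9, 0.3]])
pe = fourier_features(points)
```

In the original SAM, an encoding of this kind is added to both image tokens and point prompts so that attention can match them by location.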

peterant330 commented 10 months ago

Hi, you are right: we do not use the positional information. The original SAM uses Fourier feature mapping so that the point embedding is similar to the image embedding. We instead obtain the point embedding by interpolating directly from the image embedding. This avoids the over-smoothing caused by a larger number of tokens and makes the prompt embedding focus more on semantic information. We also tested adding positional encoding to our framework, but it brought no improvement.
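To illustrate what interpolating a point embedding from the image embedding can look like, here is a minimal NumPy sketch using trilinear interpolation on a 3D feature volume. It is an assumption-laden illustration: the helper name `sample_point_embedding`, the `(C, D, H, W)` layout, and the use of voxel-index coordinates are all hypothetical, and the repository's own PyTorch implementation may differ.

```python
import numpy as np

def sample_point_embedding(feat, point):
    """Trilinearly interpolate a feature volume at one continuous location.

    feat:  array of shape (C, D, H, W), the image embedding (assumed layout).
    point: (z, y, x) in voxel-index coordinates of the feature grid.
    Returns a vector of shape (C,). Hypothetical helper for illustration.
    """
    C, D, H, W = feat.shape
    upper = np.array([D - 1, H - 1, W - 1])
    # Clamp the query so it stays inside the feature grid.
    p = np.clip(np.asarray(point, dtype=float), 0, upper)
    lo = np.floor(p).astype(int)
    hi = np.minimum(lo + 1, upper)
    w = p - lo  # fractional offset along each axis
    out = np.zeros(C)
    # Accumulate the 8 surrounding voxels, weighted by their proximity.
    for z, wz in ((lo[0], 1 - w[0]), (hi[0], w[0])):
        for y, wy in ((lo[1], 1 - w[1]), (hi[1], w[1])):
            for x, wx in ((lo[2], 1 - w[2]), (hi[2], w[2])):
                out += wz * wy * wx * feat[:, z, y, x]
    return out

# Toy embedding whose three channels store the z, y, x index of each voxel,
# so the interpolated vector should reproduce the query coordinates.
feat = np.stack(
    np.meshgrid(np.arange(4), np.arange(5), np.arange(6), indexing="ij")
).astype(float)
emb = sample_point_embedding(feat, (1.5, 2.25, 3.0))
```

Because the prompt vector is read directly from the image features, it carries the local semantic content at the clicked location without any separate positional term.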