wenxi-yue / SurgicalSAM

[AAAI2024] Official implementation of SurgicalSAM
MIT License
70 stars 9 forks source link

How class_embedding is obtained? #5

Closed lbbcalm closed 8 months ago

lbbcalm commented 8 months ago

Hi, can you tell me how class_embedding is obtained? Is there a code for that?

wenxi-yue commented 8 months ago

Hi,

Thank you for the interest in our work.

Given the binary mask of a class in an image and the SAM image embedding of the image, we compute the class embedding by taking the average of all foreground features, in three steps.

Step 1 - Since the original binary mask has a spatial resolution of 1024x1280 and the SAM image feature has a spatial resolution of 64x64, we first process the mask to 64x64, using the code here.

Step 2 - Sometimes the foreground region can be small, so it is possible that the mask after processing has no foreground at all. In this case, we do not compute the class embedding, as shown here.

Step 3 - If the mask after processing does contain some foreground regions, we multiply the processed mask with the SAM image embedding and take the mean, using the code here. The final class embedding is represented by a vector.