How is class_embedding obtained?

Hi,

Thank you for your interest in our work.

Given the binary mask of a class in an image and the SAM image embedding of that image, we compute the class embedding as the average of all foreground features. This is done in three steps.
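The same computation, written as a formula. The notation is ours and does not come from the original reply. Let F be the SAM image embedding with C channels and a 64x64 spatial grid, and let M_c be the processed 64x64 binary mask of class c. The class embedding e_c is the mean of the feature vectors at the foreground locations:

```latex
e_c = \frac{\sum_{i=1}^{64}\sum_{j=1}^{64} M_c(i,j)\, F_{:,i,j}}{\sum_{i=1}^{64}\sum_{j=1}^{64} M_c(i,j)},
\qquad \text{defined only when } \sum_{i,j} M_c(i,j) > 0
```

The numerator is the masked sum described in Step 3. The condition on the denominator is the empty-mask case handled in Step 2.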

Step 1 - The original binary mask has a spatial resolution of 1024x1280, while the SAM image embedding has a spatial resolution of 64x64, so we first process the mask to 64x64 using the code here.
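The reply does not say which resizing method the linked code uses. The sketch below is therefore only an illustration, and it assumes plain nearest-neighbor downsampling in NumPy that maps the 1024x1280 mask directly onto a 64x64 grid. The helper `downsample_mask_nearest` is hypothetical and is not taken from the SurgicalSAM repository.

```python
import numpy as np


def downsample_mask_nearest(mask, out_h=64, out_w=64):
    """Nearest-neighbor downsampling of a 2D binary mask (hypothetical helper).

    Keeps one source pixel per output cell, so a small foreground region
    that falls between the sampled pixels disappears from the result.
    """
    in_h, in_w = mask.shape
    rows = (np.arange(out_h) * in_h) // out_h  # source row sampled for each output row
    cols = (np.arange(out_w) * in_w) // out_w  # source column sampled for each output column
    return mask[np.ix_(rows, cols)]


# Example: a 1024x1280 mask whose top half is foreground.
full_mask = np.zeros((1024, 1280), dtype=np.uint8)
full_mask[:512, :] = 1
small_mask = downsample_mask_nearest(full_mask)
print(small_mask.shape, int(small_mask.sum()))  # (64, 64) 2048
```

Nearest-neighbor sampling also illustrates why Step 2 is needed. A mask whose only foreground pixel is not one of the sampled locations comes out completely empty.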

Step 2 - The foreground region can be small enough that the processed mask contains no foreground at all. In that case, we do not compute a class embedding for that class, as shown here.
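A minimal sketch of the skip logic, assuming the processed masks are held in a dict that maps a class ID to a 64x64 NumPy array. The function name `classes_to_embed` and the dict layout are hypothetical and do not come from the original code.

```python
import numpy as np


def classes_to_embed(processed_masks):
    """Return the class IDs whose processed 64x64 mask still has foreground.

    processed_masks: dict mapping a class ID to a (64, 64) binary array.
    Classes with an empty mask are skipped, so no embedding is computed for them.
    """
    return [cid for cid, m in processed_masks.items() if np.count_nonzero(m) > 0]


masks = {
    1: np.zeros((64, 64), dtype=np.uint8),  # region vanished after processing
    2: np.eye(64, dtype=np.uint8),          # some foreground survives
}
print(classes_to_embed(masks))  # [2]
```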

Step 3 - If the processed mask does contain foreground, we multiply it with the SAM image embedding and take the mean over the foreground locations, using the code here. The resulting class embedding is a single vector with one entry per channel of the image embedding.
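The sketch below shows the masked average in NumPy. It divides by the number of foreground pixels so that it matches "the average of all foreground features" described above. A plain mean over all 64x64 positions would differ from this result by a constant factor. The function name `masked_average_embedding` is hypothetical. The channel count of 256 is used only as an example, matching the output of the SAM image encoder.

```python
import numpy as np


def masked_average_embedding(image_embedding, mask):
    """Average the image-embedding features over the foreground of a mask.

    image_embedding: (C, H, W) array, e.g. a SAM image embedding with H = W = 64.
    mask: (H, W) binary array with at least one foreground pixel (see Step 2).
    Returns a (C,) vector.
    """
    weights = mask.astype(image_embedding.dtype)
    masked = image_embedding * weights[None, :, :]  # zero out background features
    return masked.sum(axis=(1, 2)) / weights.sum()  # mean over foreground locations only


# Example: two foreground locations with feature values 2 and 4 in every channel.
emb = np.zeros((256, 64, 64), dtype=np.float32)
emb[:, 0, 0] = 2.0
emb[:, 0, 1] = 4.0
emb[:, 10, 10] = 100.0  # background location, excluded by the mask
mask = np.zeros((64, 64), dtype=np.uint8)
mask[0, 0] = mask[0, 1] = 1
vec = masked_average_embedding(emb, mask)
print(vec.shape, float(vec[0]))  # (256,) 3.0
```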

wenxi-yue / SurgicalSAM

How is class_embedding obtained? #5