Closed lbbcalm closed 8 months ago
Hi,
Thank you for the interest in our work.
Given the binary mask of a class in an image and the SAM image embedding of the image, we compute the class embedding by taking the average of all foreground features, in three steps.
Step 1 - Since the original binary mask has a spatial resolution of 1024x1280 and the SAM image feature has a spatial resolution of 64x64, we first process the mask to 64x64, using the code here.
Step 2 - Sometimes the foreground region can be small, so it is possible that the mask after processing has no foreground at all. In this case, we do not compute the class embedding, as shown here.
Step 3 - If the mask after processing does contain some foreground regions, we multiply the processed mask with the SAM image embedding and take the mean, using the code here. The final class embedding is represented by a vector.
Hi, can you tell me how class_embedding is obtained? Is there a code for that?