uncbiag / SimpleClick

SimpleClick: Interactive Image Segmentation with Simple Vision Transformers (ICCV 2023)
MIT License

Point Sampling #30

Open Crazy-Tony opened 1 year ago

Crazy-Tony commented 1 year ago

I'm trying to sample points for finetuning the SimpleClick model using my own sampling algorithm. Therefore I tried to figure out the points format required by the `model.forward(self, image, points)` function. I added some verbose messages, but the results were quite confusing to me.

In case of one positive point, I got the following output:

points.shape = torch.Size([2, 2, 3])
points = tensor([[[ 21.2914, 149.6880,   0.0000],
         [ -1.0000,  -1.0000,  -1.0000]],
        [[ 21.2914, 297.3120,   0.0000], 
         [ -1.0000,  -1.0000,  -1.0000]]]) 

Adding another positive point, I got the following output:

points.shape = torch.Size([2, 4, 3])
points = tensor([[[ 21.2914, 149.6880,   0.0000],
         [363.7146, 229.9360,   1.0000],
         [ -1.0000,  -1.0000,  -1.0000],
         [ -1.0000,  -1.0000,  -1.0000]],

        [[ 21.2914, 297.3120,   0.0000],
         [363.7146, 217.0640,   1.0000],
         [ -1.0000,  -1.0000,  -1.0000],
         [ -1.0000,  -1.0000,  -1.0000]]])

Providing one negative point:

points.shape = torch.Size([2, 4, 3])
points = tensor([[[ 21.2914, 149.6880,   0.0000],
         [363.7146, 229.9360,   1.0000],
         [298.6300, 218.0080,   2.0000],
         [ -1.0000,  -1.0000,  -1.0000]],

        [[ 21.2914, 297.3120,   0.0000],
         [363.7146, 217.0640,   1.0000],
         [298.6300, 228.9920,   2.0000],
         [ -1.0000,  -1.0000,  -1.0000]]])

The clicks that my source code provides are in the format (batch_size, max_amount_clicks, 3), where the last dimension stores x_coord, y_coord, and a label (1 for positive clicks, 0 for negative clicks). Could you please help me transform my clicks into the required format? Thanks in advance! :)
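For reference, here is a sketch of the conversion I'm attempting. Judging from the verbose output above, the target layout seems to be (y, x, click_index) per row, with positive clicks in the first half of the second dimension, negative clicks in the second half, and (-1, -1, -1) padding for unused slots. The helper name is my own, and the layout is only my reading of `MultiPointSampler`, so please double-check against the source:

```python
import torch

def to_simpleclick_points(clicks, max_clicks):
    """Convert clicks of shape (B, K, 3), rows (x, y, is_positive),
    into the layout the verbose output above suggests SimpleClick uses:
    (B, 2*max_clicks, 3) with rows (y, x, click_index), positives in the
    first max_clicks rows, negatives in the last max_clicks rows, and
    (-1, -1, -1) for unused slots. Helper name and layout are my reading
    of MultiPointSampler -- please verify against the repo.
    """
    B = clicks.shape[0]
    out = torch.full((B, 2 * max_clicks, 3), -1.0)
    for b in range(B):
        n_pos = n_neg = idx = 0
        for x, y, label in clicks[b]:
            if label < 0:  # treat negative labels as input padding
                continue
            row = n_pos if label == 1 else max_clicks + n_neg
            out[b, row] = torch.tensor([float(y), float(x), float(idx)])
            if label == 1:
                n_pos += 1
            else:
                n_neg += 1
            idx += 1
    return out
```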

qinliuliuqin commented 11 months ago

@Crazy-Tony Hi, did you figure out how to transform? You may need to read `MultiPointSampler` and `get_next_points` to resolve this issue. Feel free to let me know if you are still unclear about the points format.

Crazy-Tony commented 11 months ago

@qinliuliuqin Thank you for your answer. The cause of my confusion wasn't the transformation itself. I was able to solve my problem :+1:

The part that confused me was the first dimension of the points tensor displayed above. When using `demo.py`, I loaded one image and noticed later that the `BasePredictor` handles this image in the demo as 2 images: one is the original and the other is a horizontally flipped version (based on this line). That's why I was confused about the flipped points. Are the flipped image and its predicted masks actually used for something in the demo, or are they discarded?
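If I read the predictor correctly, the flip acts as a form of test-time augmentation: the prediction on the mirrored image is flipped back and averaged with the prediction on the original. A minimal sketch of what I mean (function name is mine, not the repo's):

```python
import torch

def merge_flip_predictions(pred):
    """Sketch of how I understand the flip trick: pred has shape
    (2, C, H, W), where pred[0] comes from the original image and pred[1]
    from its horizontally mirrored copy. The mirrored prediction is
    flipped back along the width axis and the two are averaged.
    See BasePredictor in the repo for the actual implementation.
    """
    return 0.5 * (pred[0] + torch.flip(pred[1], dims=[-1]))
```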

Another question from my side: did you evaluate the SimpleClick models on the TimberSeg 1.0 dataset?

Thanks in advance!