icoz69 / CaNet

The code for the paper "CANet: Class-Agnostic Segmentation Networks with Iterative Refinement and Attentive Few-Shot Learning"

model does not upsample predictions #6

Closed SMHendryx closed 5 years ago

SMHendryx commented 5 years ago

In the paper you note that: "Our final result is achieved by bilinearly upsampling the confidence map to the same spatial size of the query image and classifying each location according to the confidence maps." As far as I can tell from the code, though, the model does not upsample its predictions (they come from here: https://github.com/icoz69/CaNet/blob/fb75e10fb66a6b0e5b72842b25065eec27020037/one_shot_network.py#L308).

How are you evaluating the results when the predictions and labels are of different shapes? Do you downsize the masks to be the size of the predictions or upsample the predictions to be the size of the masks in other unreleased code?
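For concreteness, here is a minimal sketch of the two options I mean (dummy shapes, standard PyTorch `F.interpolate`; this is not your code):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(1, 2, 41, 41)         # low-res network output (B, C, h, w)
mask = torch.randint(0, 2, (1, 321, 321))  # full-res ground-truth mask (B, H, W)

# Option A: bilinearly upsample the logits to the mask size, then classify
logits_up = F.interpolate(logits, size=mask.shape[-2:], mode='bilinear', align_corners=True)
pred = logits_up.argmax(dim=1)             # (B, 321, 321), directly comparable to mask

# Option B: downsample the mask to the logits' spatial size instead
mask_down = F.interpolate(mask.unsqueeze(1).float(), size=logits.shape[-2:],
                          mode='nearest').squeeze(1).long()
```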

Also, during training and eval, do you first resize the images and masks before passing them into the graph, or do you use the original PASCAL image dimensions?

This could relate to the difficulties in reproducing the reported results: https://github.com/icoz69/CaNet/issues/4

icoz69 commented 5 years ago

Hello, I have updated the training scripts now. The non-learnable upsampling is done outside the forward function of the network. For the final evaluation, all metrics, i.e. meanIoU and FB-IoU, are computed at the original image size.
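In code, the pattern is roughly this (a sketch with dummy tensors, not the exact training script):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Dummy stand-ins so the sketch runs; in the repo these come from the network and loader.
low_res_pred = torch.randn(4, 2, 41, 41, requires_grad=True)  # output of forward()
query_mask = torch.randint(0, 2, (4, 321, 321))               # full-res labels

# Non-learnable bilinear upsampling applied outside forward(), then the usual CE loss:
pred = F.interpolate(low_res_pred, size=query_mask.shape[-2:],
                     mode='bilinear', align_corners=True)
loss = nn.CrossEntropyLoss()(pred, query_mask)
loss.backward()
```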

SMHendryx commented 5 years ago

Hi @icoz69, thank you for getting back to me. It's cool to see more of the training details. It looks like you are training and evaluating with input_size = (321, 321), not the original image shapes, which are not all 321 by 321, correct? Also, it looks like you are saving the best model found during training (on the 15 training tasks) by evaluating on the test set of 5 held-out tasks here: https://github.com/icoz69/CaNet/blob/fdce9462d03ff93bc11f4795df94fbbf60b1c8ca/train.py#L269 Is that correct?

icoz69 commented 5 years ago

Hi, you are right. This is a quick batched validation at 321. To get the final result, you should validate at the raw size with multi-scale input testing. For the cross-validation experiments on PASCAL VOC, we record the best val performance.
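Multi-scale input testing looks roughly like this (a sketch; the exact scale set, averaging, and model call signature here are illustrative):

```python
import torch
import torch.nn.functional as F

def multi_scale_predict(model, query_img, support_img, support_mask,
                        scales=(0.7, 1.0, 1.3)):
    """Run the model at several query scales and average the upsampled outputs.

    The scale set and the model call signature are illustrative, not the repo's exact code.
    """
    H, W = query_img.shape[-2:]
    acc = 0.0
    for s in scales:
        img_s = F.interpolate(query_img, scale_factor=s, mode='bilinear',
                              align_corners=True)
        out = model(img_s, support_img, support_mask)
        acc = acc + F.interpolate(out, size=(H, W), mode='bilinear',
                                  align_corners=True)
    return (acc / len(scales)).argmax(dim=1)  # (B, H, W) label map at raw size
```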


SMHendryx commented 5 years ago

Thanks for the additional details @icoz69. One more question I had about the dataset that would be helpful for people to know is: how did you combine the examples from SBD and PASCAL to make the PASCAL-5^i dataset? Given that there is some amount of overlap, did you just use SBD? Or did you simply put the images and masks from both PASCAL and SBD into the same data directory?

I ask because if you simply put both datasets into the same directory, there may be some amount of overlap, meaning you will sometimes have the same images in the sampled few-shot training and validation sets.
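For example, a quick way to check the overlap would be to compare the image IDs in the two train lists (paths below are just the usual dataset layouts, adjust as needed):

```python
# Hypothetical/typical paths to the two training split files
with open('VOCdevkit/VOC2012/ImageSets/Segmentation/train.txt') as f:
    voc_train = set(f.read().split())
with open('benchmark_RELEASE/dataset/train.txt') as f:
    sbd_train = set(f.read().split())

print(f"{len(voc_train & sbd_train)} image IDs appear in both train lists")
```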

SMHendryx commented 5 years ago

Or perhaps, when an image-mask pair appears in both parent datasets, do you take it from only one of them?

I.e., if the mask for image1.jpg is in both SBD and PASCAL, do you just choose the image-mask pair from SBD?

icoz69 commented 5 years ago

There is a split of SBD that adopts the same val set as VOC 2012; that's the split we use. You can simply combine all training images from the two datasets.
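Concretely, something like this (paths are illustrative):

```python
# Combine the two training lists and hold out the shared VOC 2012 val set.
with open('benchmark_RELEASE/dataset/train.txt') as f:
    sbd_train = set(f.read().split())
with open('VOCdevkit/VOC2012/ImageSets/Segmentation/train.txt') as f:
    voc_train = set(f.read().split())
with open('VOCdevkit/VOC2012/ImageSets/Segmentation/val.txt') as f:
    voc_val = set(f.read().split())

train_ids = sorted((sbd_train | voc_train) - voc_val)  # union of train images, val excluded
```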

SMHendryx commented 5 years ago

Thanks for the clarification @icoz69. I am closing this issue, as it is now clear how to reproduce your work.

icoz69 commented 5 years ago

You are welcome. Good luck with your project!