It is not every reader's duty to verify your code while your paper is under review, especially because a completely contradictory finding has been reported in other papers and comments. It would be better if you could show results on COCO and PASCAL evaluated with the original aspect ratios, with the more recently used sizes such as 473, and with the original sizes, before editing and closing my issue.
I interpret your claim to mean that our experimental setup is NOT EXACTLY THE SAME as that of PFENet and RePRI, but this by itself does not make our comparisons unfair. In fact, the differences in experimental setup make the comparisons disadvantageous to our method and favorable to the others, for three main reasons:
First, ABSOLUTELY NO DATA AUGMENTATION METHODS are used in our method. Note that almost every method that we compare in Tables 1-3 of our paper adopts many different kinds of data augentation techniques such as RandomScale, RandomCrop, RandomRotate, RandomVerticalFlip, RandomHorizontal, and RandomGaussianBlur whereas our method use NONE of them. Even with such little training data, our method significantly outperforms with large margins: 1~6%p mIoU improvements (1-shot) on all the three datasets, clearly demonstrating the superiority of the method.
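For illustration only, the difference roughly looks like the sketch below. This is not the actual dataset code of any of the repositories; the transform names follow torchvision, whereas PFENet/RePRI implement joint image+mask versions of these operations in their own transform.py files.

```python
from torchvision import transforms

# Augmentation-heavy training pipeline used by many prior methods (illustrative only).
augmented = transforms.Compose([
    transforms.RandomResizedCrop(473),          # stands in for RandomScale + RandomCrop
    transforms.RandomRotation(10),              # RandomRotate
    transforms.RandomHorizontalFlip(),          # RandomHorizontalFlip
    transforms.GaussianBlur(kernel_size=5),     # RandomGaussianBlur
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# The setup described above: a fixed resize plus normalization, no augmentation.
plain = transforms.Compose([
    transforms.Resize((400, 400)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```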
Second, although we were aware that larger image sizes (> 400) typically yield better mIoU, we used an image size of 400 because it can be divided by 2 repeatedly without remainder. For hyperparameters, we chose simple numbers, in line with the design principle we pursue: 'simplicity'.
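The divisibility point can be seen with a quick, purely illustrative check:

```python
# 400 stays integral under repeated stride-2 downsampling,
# whereas 417 and 473 are odd from the start.
for size in (400, 417, 473):
    chain = [size]
    while chain[-1] % 2 == 0:
        chain.append(chain[-1] // 2)
    print(size, chain)
# 400 [400, 200, 100, 50, 25]
# 417 [417]
# 473 [473]
```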
Third, we barely engineered our model and code, and tried to keep the number of hyperparameters as small as possible. Note that many existing methods apply additional engineering to their models (beyond the main methodology they propose) to obtain extra performance gains. Such engineering includes auxiliary training objectives, learning-rate/weight-decay schedules, superpixels, pseudo labels, and so on. All of these additional modules and tricks inevitably introduce additional hyperparameters to tune (which is burdensome) for little mIoU improvement. We could have added such extra engineering for extra performance, but we eventually decided not to, because another design principle we pursue is 'minimal dependency'. We have also tried our best to provide easily readable and readily runnable code by meticulously refactoring it with a very small number of hyperparameters and arguments. (You can easily notice this when you compare our code with others'.)
See the reply below for more details on these points and for the results you requested.
Below we provide our evaluation results on PASCAL-5i and COCO-20i using the original image size (with a ResNet101 backbone). The superscript 'org' denotes our model evaluated with the original image size.
PASCAL-5i:
COCO-20i:
We updated our repository so you can reproduce the results above. To reproduce the results with the original image size, append the additional argument '--use_original_imgsize' as below:
```
python test.py '...other arguments...' --use_original_imgsize
```
The evaluation results are comparable to those from our original setup (evaluation with an image size of 400x400), with only a very slight mIoU drop (0.2%p on PASCAL-5i and 0.1%p on COCO-20i), and they still set a new state of the art by a large margin compared to the other methods [4, 35, 43, 67, 71]. We suspect this slight degradation occurs because the training and testing conditions do not match: we train our model with an image size of 400x400 but test it with the original image size and aspect ratio. We found that a similar train/test mismatch also caused performance drops when evaluating at image sizes of 417 and 473, which implies the issue can be alleviated by training with the same image size used in testing.
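For readers following along, a minimal sketch of what evaluating at the original image size entails; this is a guess at the mechanics rather than the repository's actual test code, and the function name is made up. The 400x400 prediction has to be interpolated back to the original label resolution before IoU is computed against the unresized ground truth.

```python
import torch.nn.functional as F

def predict_at_original_size(logits_400, orig_h, orig_w):
    """logits_400: (B, 2, 400, 400) network output.
    Returns a (B, orig_h, orig_w) hard mask to score against the unresized label."""
    logits = F.interpolate(logits_400, size=(orig_h, orig_w),
                           mode='bilinear', align_corners=True)
    return logits.argmax(dim=1)
```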
Answer: Indeed, we could have used the setting that gives the best results. However, given such large performance improvements on all three datasets (~6%p, ~3%p, and ~1%p mIoU improvements on PASCAL-5i, COCO-20i, and FSS-1000 respectively in the 1-shot setting, and ~5%p, ~7%p, and 0.4%p mIoU improvements on PASCAL-5i, COCO-20i, and FSS-1000 respectively in the 5-shot setting), we were satisfied with the current experimental setup, which does not even use any data augmentation techniques. Note that most previous methods such as PFENet (https://github.com/Jia-Research-Lab/PFENet/blob/master/util/transform.py) and RePRI (https://github.com/mboudiaf/RePRI-for-Few-Shot-Segmentation/blob/master/src/dataset/transform.py) adopt many different kinds of data augmentation such as RandomScale, RandomCrop, RandomRotate, RandomVerticalFlip, RandomHorizontalFlip, and RandomGaussianBlur. Our model, however, uses none of these, as can be seen in our code: https://github.com/juhongm999/hpnet-dev/blob/master/data/dataset.py.
Moreover, instead of using an image size of 417 or 473, our model takes an image size of 400 for two main reasons:
We hope our answers helped.
We've added & updated our comments. Do they address your concerns?
https://github.com/juhongm999/hsnet/issues/5
This is not actually the case, because you did not fairly compare your results with RePRI and PFENet in Table 1, where their results are copied directly from their papers. In https://github.com/mboudiaf/RePRI-for-Few-Shot-Segmentation/blob/master/src/dataset/transform.py#L80 they keep the aspect ratio of the resized images the same as the original image, but in your implementation https://github.com/juhongm999/hsnet/blob/e288916debe5290b3e9554fb61e13a474e00f885/data/dataset.py#L25 the images are simply resized to a 1:1 aspect ratio, and the labels are not kept at their original size either.
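To make the distinction concrete, here is a hypothetical sketch of the two resize policies being contrasted; the helper names are made up and this is not either repository's code.

```python
from PIL import Image

def resize_keep_aspect(img: Image.Image, target: int) -> Image.Image:
    """Scale the longer side to `target`, preserving the aspect ratio
    (RePRI-style pipelines then pad to a square, typically using an
    ignore value such as 255 for the label)."""
    w, h = img.size
    scale = target / max(w, h)
    return img.resize((round(w * scale), round(h * scale)), Image.BILINEAR)

def resize_square(img: Image.Image, target: int) -> Image.Image:
    """Resize directly to target x target, discarding the aspect ratio."""
    return img.resize((target, target), Image.BILINEAR)
```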
Regarding my second question: all previous methods use 417, 473, or the original sizes for evaluating on COCO and PASCAL. I do not understand why you used size 400 on COCO and PASCAL, creating a brand **new** setting that makes it hard for other people to follow and compare fairly, especially since size 400 does not even give the best performance according to your own words. Normally, we should report the setting that gives the best results. It is true that performance on PASCAL is only slightly higher when the training size grows, so those results are still comparable. Also, the models of RePRI and PFENet cannot be directly tested with a 1:1 aspect ratio because they were not trained with images whose non-255 regions have a 1:1 ratio.

However, on COCO I have tested PFENet: the results are much lower when it is evaluated with the original labels without resizing. This is consistent with the results reported in PFENet, and it is also mentioned in https://github.com/juhongm999/hsnet/issues/1#issuecomment-816485819. So I think it is unfair if you cannot show COCO results with the original aspect ratios and the original sizes (or 417, 473) to compare with related methods (RePRI, PFENet, ASGNet, and so on), because resizing the labels to a smaller size does bring much better performance on COCO.
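For context, a small synthetic sketch of why the two protocols can disagree (random masks and made-up sizes; this is not any repository's evaluation code): shrinking the label to a small square before scoring is not equivalent to scoring against the original-size label.

```python
import torch
import torch.nn.functional as F

def binary_iou(pred, gt):
    # Foreground IoU between two {0,1} masks of the same shape.
    inter = ((pred == 1) & (gt == 1)).sum().float()
    union = ((pred == 1) | (gt == 1)).sum().float()
    return (inter / union.clamp(min=1)).item()

# Synthetic original-size label and a prediction made at 400x400.
gt_original = (torch.rand(480, 640) > 0.7).long()
pred_400 = (torch.rand(400, 400) > 0.7).long()

# Protocol A: shrink the label to the prediction's square resolution.
gt_small = F.interpolate(gt_original[None, None].float(), size=(400, 400),
                         mode='nearest').squeeze().long()
iou_a = binary_iou(pred_400, gt_small)

# Protocol B: upsample the prediction to the original label size instead.
pred_big = F.interpolate(pred_400[None, None].float(), size=(480, 640),
                         mode='nearest').squeeze().long()
iou_b = binary_iou(pred_big, gt_original)

print(iou_a, iou_b)  # the two protocols generally give different numbers
```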