jiwoon-ahn / irn

Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations, CVPR 2019 (Oral)
MIT License

Performance Gap and Hyper-parameter Settings #4

Closed. XiaoyanLi1 closed this issue 4 years ago.

XiaoyanLi1 commented 5 years ago

Hi Jiwoon Ahn, your paper is very good and I'm really interested in it. I've already tried your code, but I cannot achieve the same performance as reported in the paper. Would you please help me figure out where the problem is?

In my experiments, the learning rates of both CAM and IRN are set to 0.1, while the other hyper-parameters follow the default settings in run_sample.py. My results are as follows:

| model | task | my exp. | reported |
| --- | --- | --- | --- |
| CAM | semantic segmentation | 48.1 | 48.3 |
| IRN | semantic segmentation | 64.9 | 66.5 |
| IRN | instance segmentation | 32.4 | 37.7 |

The CAM models have similar performance, but there are gaps between the IRN models on both tasks.

There may be two possible reasons for the gap.

  1. I notice that the hyper-parameter settings in the paper and the code are not exactly the same: exp_times is set to 8 in the code, while the paper reports 256 (which also does not work in my case).
  2. Another possible problem is that multi-scale testing is used only for CAM, not for IRN.

Would you please point out the differences between my experiments and yours that may result in the gap? Thank you!

djiajunustc commented 5 years ago

Hi @XiaoyanLi1,
I get 36.1 mAP at IoU=0.5, and my hyper-parameter settings are: cam_batch_size = 32, cam_learning_rate = 0.1, irn_batch_size = 16, irn_learning_rate = 0.1.

Maybe with a different random seed or different batch_size/learning_rate settings, we can approach the reported performance.
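
For reference, something like this should reproduce my setting; the flag names are assumed to match the argparse arguments in run_sample.py, so please double-check them against the script:

```python
# Sketch only: invoke run_sample.py with the hyper-parameters discussed above.
# The flag names are assumptions; verify them against run_sample.py's argparse setup.
import subprocess
import sys

subprocess.run([
    sys.executable, "run_sample.py",
    "--cam_batch_size", "32",
    "--cam_learning_rate", "0.1",
    "--irn_batch_size", "16",
    "--irn_learning_rate", "0.1",
], check=True)
```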

jiwoon-ahn commented 5 years ago

Hi @XiaoyanLi1,

You should adjust the learning rates according to the batch sizes. Thanks @djiajunustc for the comment! The number of random-walk iterations is computed as 2^{exp_times}, so setting exp_times=8 means 256 iterations. See to_transition_matrix in misc/indexing.py.
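
Roughly, the idea is the following (an illustrative sketch only, not the repository code; the actual implementation is in misc/indexing.py): squaring a row-normalized transition matrix exp_times times applies 2^{exp_times} random-walk steps.

```python
import torch

def propagate_by_squaring(trans_mat, cam, exp_times=8):
    """Illustrative sketch: apply 2**exp_times random-walk steps by repeatedly
    squaring a row-normalized (N, N) transition matrix, then propagate the
    (N, C) per-pixel class scores."""
    for _ in range(exp_times):
        trans_mat = torch.matmul(trans_mat, trans_mat)  # each squaring doubles the walk length
    return torch.matmul(trans_mat, cam)
```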

Here's what I've just got with

I also suspect that skimage.transform.rescale in voc12/dataloader.py has some glitches when used with the torchvision ResNet. I'm going to run a few tests on this and update the repository.
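
If you want to try a Pillow-based path before the repository is updated, a rough stand-in for skimage.transform.rescale could look like this (the function name and interpolation choices here are assumptions, not the repository's code):

```python
import numpy as np
from PIL import Image

def pil_rescale(img, scale, order=3):
    """Sketch: rescale an HxWx3 uint8 numpy image with Pillow instead of
    skimage.transform.rescale. order=3 -> bicubic, order=0 -> nearest
    (e.g. for label maps)."""
    height, width = img.shape[:2]
    target_size = (int(round(width * scale)), int(round(height * scale)))
    resample = Image.NEAREST if order == 0 else Image.BICUBIC
    return np.asarray(Image.fromarray(img).resize(target_size, resample))
```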

XiaoyanLi1 commented 5 years ago

@jiwoon-ahn @djiajunustc Thanks a lot for your comments. I'll try these settings!

djiajunustc commented 5 years ago

Hi, I tried to use the pseudo labels as ground truth and train Mask R-CNN end-to-end, but I failed to get results comparable to yours. Could you provide some more details, such as input size, learning rate, batch size, and max iterations?

Thanks!

djiajunustc commented 5 years ago

Besides, is the 'train' set or the 'train_aug' set supposed to be used?

jiwoon-ahn commented 5 years ago

@djiajunustc, I have just updated the code: a Pillow resizing function is now used when loading data. Please try again with the same settings. The performance may differ slightly (<1.5%) on each run, depending on the quality of the CAMs. Use 'train' for evaluating the quality of the pseudo labels, as there is no ground truth for 'train_aug'.
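
If it helps, here is a rough sketch of how the pseudo labels can be scored against the VOC 'train' ground truth; the directory layout and file naming are placeholders to adapt to your own setup, not the repository's evaluation code:

```python
import numpy as np
from PIL import Image

def miou(pred_dir, gt_dir, names, num_classes=21, ignore_label=255):
    """Sketch: mean IoU of pseudo labels against VOC 'train' ground-truth masks."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    for name in names:
        pred = np.asarray(Image.open(f"{pred_dir}/{name}.png"))
        gt = np.asarray(Image.open(f"{gt_dir}/{name}.png"))
        valid = gt != ignore_label
        # Accumulate a confusion matrix; cast to int to avoid uint8 overflow.
        conf += np.bincount(num_classes * gt[valid].astype(int) + pred[valid],
                            minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    iou = inter / np.maximum(union, 1)
    return iou[union > 0].mean()
```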

djiajunustc commented 5 years ago

@jiwoon-ahn Thanks! But can we also generate pseudo labels on 'train_aug' to train the Mask R-CNN?

jiwoon-ahn commented 5 years ago

@djiajunustc, Yes, I actually trained Mask R-CNN with the pseudo labels on 'train_aug' for the reported results. I'll add that to README.md later.