openseg-group / OCNet.pytorch

Please check out the openseg.pytorch project for the updated code, which achieves SOTA on 6 benchmarks!
MIT License

The learned object context is only interpretable when the OC module is placed last, right before the classifier #42

Closed: yaoqi-zd closed this issue 5 years ago

yaoqi-zd commented 5 years ago

Hi, I have run the base_oc network and visualized the learned object context; it indeed highlights the pixels of the same category as the query pixel. However, when I tried to place the base_oc_block in the middle of the network, the performance is close, but the learned object context is somewhat uninterpretable: for example, when I click a pixel on a car, it may highlight pixels on the road. Do you have any idea about this phenomenon?
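
For reference, a minimal sketch of this kind of per-pixel inspection, assuming the attention weights of the OC module have been captured as an (h*w, h*w) tensor; the names `sim_map`, `h`, `w`, and the query coordinates are illustrative, not from the repo:

    import torch

    def context_heatmap(sim_map: torch.Tensor, h: int, w: int, qy: int, qx: int) -> torch.Tensor:
        # sim_map holds the attention weights, shape (h*w, h*w), rows summing to 1.
        query_index = qy * w + qx      # flatten the 2-D query coordinate to a row index
        row = sim_map[query_index]     # similarity of the query pixel to every pixel
        return row.view(h, w)          # reshape to spatial layout for plotting

Upsampling the returned heatmap to the input resolution and overlaying it on the image gives the visualization described above.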

PkuRainBow commented 5 years ago

Very interesting phenomenon.

I am wondering whether the pixel is in the boundary region between the car and the road.

yaoqi-zd commented 5 years ago

Sorry, I used Cityscapes as an example, but I actually conduct my experiments on the VOC dataset due to a lack of computational resources. With base_oc_block placed in the middle of the network, even when I click a pixel in the middle of the road, the visualized object context may still highlight the car. The visualization results seem random and uninterpretable. You may conduct ablation experiments on Cityscapes if you have time; I'm not sure whether the dataset matters.

PkuRainBow commented 5 years ago

We have not tried to use the base-oc-block in the middle of the network due to its complexity.

I am wondering whether the visualization of the object context map looks fine when the module is used at the end of the network.

yaoqi-zd commented 5 years ago

Yes, the visualization results look fine when the base-oc-block is placed at the end. Also, you mentioned in your Zhihu post that using the gt-derived object context greatly boosts the performance; may I ask for the detailed steps? Did you downsample the gt and compute the object context (also called the similarity map) by outer product (same label leads to 1, otherwise 0), and then use this to aggregate features from the other pixels?

PkuRainBow commented 5 years ago

Yes, you can try to use the gt-derived object context within the base-oc-block.

Here we share the code fragment below:

        # Upsample the ground-truth label map to the feature resolution (h, w).
        # (F.upsample is deprecated in newer PyTorch; F.interpolate is the replacement.)
        label = F.upsample(input=label.unsqueeze(1).type(torch.cuda.FloatTensor), size=(h, w), mode='nearest')
        # Broadcast the flattened labels into an (h*w) x (h*w) pairwise comparison.
        label_row_vec = label.view(batch_size, 1, -1).expand(batch_size, h * w, h * w)
        label_col_vec = label_row_vec.permute(0, 2, 1)
        # pair_label[b, i, j] is True when pixels i and j share the same label.
        pair_label = label_col_vec.eq(label_row_vec)
        # L1-normalize each row so the gt-derived similarity map sums to 1 per query pixel.
        sim_map = F.normalize(pair_label.type(torch.cuda.FloatTensor), p=1, dim=2)
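
For completeness, a self-contained sketch of the aggregation step this similarity map feeds into; the shapes and the names `feats` and `context` are assumptions for illustration, with the aggregation expressed as a batched matrix product of the similarity map with the value features:

    import torch
    import torch.nn.functional as F

    batch_size, c, h, w = 2, 16, 8, 8
    label = torch.randint(0, 5, (batch_size, h, w))   # stand-in ground-truth labels at (h, w)
    feats = torch.randn(batch_size, c, h, w)          # stand-in value features

    # Build the gt-derived similarity map as in the fragment above (CPU tensors here).
    label_row_vec = label.view(batch_size, 1, -1).float().expand(batch_size, h * w, h * w)
    pair_label = label_row_vec.permute(0, 2, 1).eq(label_row_vec)
    sim_map = F.normalize(pair_label.float(), p=1, dim=2)   # each row sums to 1

    # Aggregate: every pixel becomes the mean of the features that share its label.
    feats_flat = feats.view(batch_size, c, h * w).permute(0, 2, 1)   # (B, h*w, C)
    context = torch.bmm(sim_map, feats_flat).permute(0, 2, 1).reshape(batch_size, c, h, w)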
yaoqi-zd commented 5 years ago

Thanks for sharing!