openseg-group / OCNet.pytorch

Please check out the openseg.pytorch project for the updated code, which achieves SOTA on 6 benchmarks!
MIT License
812 stars 128 forks

Visualization of the learned Object Context #15

Open kikyou123 opened 6 years ago

kikyou123 commented 6 years ago

The visualization results are very good! Can you describe how they are generated? If possible, could you release the visualization code?

PkuRainBow commented 6 years ago

@kikyou123 We will consider releasing the related code in the future. It should be easy to implement one yourself.

winwinJJiang commented 6 years ago

Click a pixel, run self-attention, and take the attention map corresponding to the pixel you clicked. Then you have it.
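The step above can be sketched as follows. This is a generic scaled dot-product self-attention example, not the repository's exact code: it uses the raw features as both query and key instead of the learned `f_query` / `f_key` projections in `base_oc_block.py`, and the function name `attention_map_for_pixel` is made up for illustration.

```python
import torch
import torch.nn.functional as F

def attention_map_for_pixel(feat, y, x):
    """Given a feature map `feat` of shape (C, H, W), compute a plain
    scaled dot-product self-attention map and return the attention row
    for the pixel at (y, x), reshaped back to (H, W).
    Sketch only; the repo projects features with f_query/f_key first."""
    c, h, w = feat.shape
    flat = feat.view(c, -1)            # (C, H*W)
    sim = flat.t() @ flat              # (H*W, H*W) pairwise similarities
    sim = sim * (c ** -0.5)            # scale by sqrt of channel dim
    sim = F.softmax(sim, dim=-1)       # each row sums to 1
    idx = y * w + x                    # flattened index of the clicked pixel
    return sim[idx].view(h, w)         # attention of that pixel to all others

# usage: "click" pixel (4, 7) on a random 64-channel 16x16 feature map
feat = torch.randn(64, 16, 16)
attn = attention_map_for_pixel(feat, 4, 7)
print(attn.shape)  # torch.Size([16, 16])
```

The returned map can then be min-max normalized and overlaid on the input image as a heatmap.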

KeyKy commented 6 years ago

Could you show me some code, or point to the variable holding the attention map (sim_map in base_oc_block.py)? @winwinJJiang

xiaosean commented 5 years ago

Same question! How do you pick a pixel and then generate its attention map? The self-attention module's input is a ResNet feature map, so how do you relate it back to an input pixel?

wangbin-ict commented 5 years ago

Hi, @PkuRainBow. Thanks for releasing the code. The visualization result I got is very different from Figure 1 in your paper, so I would like to know some details about Figure 1.

The following code comes from `base_oc_block.py`:

```python
def forward(self, x):
    batch_size, h, w = x.size(0), x.size(2), x.size(3)
    if self.scale > 1:
        x = self.pool(x)

    value = self.f_value(x).view(batch_size, self.value_channels, -1)
    value = value.permute(0, 2, 1)
    query = self.f_query(x).view(batch_size, self.key_channels, -1)
    query = query.permute(0, 2, 1)
    key = self.f_key(x).view(batch_size, self.key_channels, -1)

    sim_map = torch.matmul(query, key)
    sim_map = (self.key_channels**-.5) * sim_map
    sim_map = F.softmax(sim_map, dim=-1)

    context = torch.matmul(sim_map, value)
    context = context.permute(0, 2, 1).contiguous()
    context = context.view(batch_size, self.value_channels, *x.size()[2:])
    context = self.W(context)
    return context
```

(1) Does the third column in Fig. 1 correspond to the visualization of "sim_map" on L86 (this is the one I used), to the "context" on L91, or to the semantic feature map at a certain channel? (2) How do you normalize each object context feature [dim: H*W] before visualization? (x - min) / (max - min) for each pixel x?

PkuRainBow commented 5 years ago

In our implementation, we normalize both the query and key before visualizing the sim_map.

I think max-min normalization should also work.
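The two suggestions above (normalize query and key, then min-max normalize the clicked pixel's row of sim_map) can be sketched like this. This is my own illustration under those assumptions, not the released code: the function name `visualize_sim_row` is hypothetical, and I use L2 normalization for the query/key step since the exact normalization is not specified.

```python
import torch
import torch.nn.functional as F

def visualize_sim_row(query, key, pixel_idx, h, w):
    """Sketch: L2-normalize query and key before the dot product, then
    min-max normalize the clicked pixel's row of sim_map for display.

    query: (N, key_channels) with N = h * w, one row per spatial position
    key:   (key_channels, N), one column per spatial position
    """
    q = F.normalize(query, dim=1)                # unit-norm query vectors
    k = F.normalize(key, dim=0)                  # unit-norm key vectors
    sim = F.softmax(q @ k, dim=-1)               # (N, N) attention map
    row = sim[pixel_idx].view(h, w)              # row for the clicked pixel
    # (x - min) / (max - min), with eps guarding a constant map
    row = (row - row.min()) / (row.max() - row.min() + 1e-8)
    return row                                   # in [0, 1], ready as a heatmap

# usage: pixel (5, 3) on a 16x16 grid with 64 key channels
h, w = 16, 16
q = torch.randn(h * w, 64)
k = torch.randn(64, h * w)
heat = visualize_sim_row(q, k, 5 * w + 3, h, w)
print(heat.shape)  # torch.Size([16, 16])
```

The resulting map can be bilinearly upsampled to the input-image resolution and blended with the image for figures like Fig. 1.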

PangYunsheng8 commented 5 years ago

Could you share your visualization code? I want to learn from it, thank you!

1451154 commented 4 years ago

Could you share your visualization code? I want to learn from it, thank you!

dongzhang89 commented 3 years ago

Same question!