kaiyuyue / cgnl-network.pytorch

Compact Generalized Non-local Network (NIPS 2018)
https://arxiv.org/abs/1810.13125
MIT License

Heat-map Visualization #7

Closed hasirk closed 5 years ago

hasirk commented 5 years ago

How can I generate the heat-map visualization for video frames, as presented in Figure 6 of the paper?

Thank you for your support.

kaiyuyue commented 5 years ago

Hi @hasirk , apologies for the delayed answer. I tried to find that part of the code for visualization, but it's gone, so I will illustrate it as much as I can.

How to visualize the attention maps in the CGNL and NL network?

The attention tensor in the CGNL / NL module has dimension N x N, where N = CTHW for the CGNL network and N = THW for the NL network in the video-based task, and N = CHW for the CGNL network and N = HW for the NL network in the image-based task.

First, a query point must be chosen so we can find its highly related points in the other areas of the image or frames. Select a location O(x, y) that you think is important, like a pixel on the object of a ball or a hand. Then scale its coordinates down according to the size ratio between the input image and the feature map of the CGNL / NL module, for example O(x, y) -> P(x/16, y/16), as in the sketch below.
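A tiny sketch of that coordinate mapping (the stride of 16 and the variable names are illustrative, not from the repo):

```python
# Map a chosen point O(x, y) on the input image to its location P on the
# feature map of the CGNL / NL module, assuming a total feature stride of 16.
stride = 16
x, y = 120, 200                      # chosen point O(x, y) on the input image
px, py = x // stride, y // stride    # corresponding location P on the feature map
```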

In the image-based task, for the NL module N = HW, so the attention matrix is HW x HW and the heat map N.view(H, W, H, W)[x/16, y/16, :, :], of size torch.Size([H, W]), is what we would like to visualize. It is just one map, which is why there is only one column in Figure 5 for the NL case. For the CGNL module N = CHW, so the attention matrix is CHW x CHW and the heat maps N.view(C, H, W, C, H, W)[c1, x/16, y/16, c2, :, :], where c1 and c2 are selected from C by yourself, are what we want to visualize; this is why there are multiple columns in Figure 5 for the CGNL cases. A sketch of this slicing follows.
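A minimal sketch of that slicing with toy sizes, assuming `att` is the full N x N attention matrix dumped from inside the module's forward pass (note: for CGNL the compact implementation never materializes this matrix as-is, so it would have to be formed explicitly just for visualization):

```python
import torch

H, W = 14, 14
px, py = 3, 5                                  # scaled query location P(x/16, y/16)

# NL module: att has shape (HW, HW); one heat map per query point.
att_nl = torch.rand(H * W, H * W)              # stand-in for the dumped tensor
heat_nl = att_nl.view(H, W, H, W)[px, py]      # torch.Size([14, 14])

# CGNL module: att has shape (CHW, CHW); one heat map per chosen channel pair.
C = 4
att_cgnl = torch.rand(C * H * W, C * H * W)    # stand-in for the dumped tensor
c1, c2 = 1, 3                                  # channels selected for inspection
heat_cgnl = att_cgnl.view(C, H, W, C, H, W)[c1, px, py, c2]  # torch.Size([14, 14])
```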

In the video-based task, for the NL module N = THW, so the attention matrix is THW x THW and the heat maps N.view(T, H, W, T, H, W)[t1, x/16, y/16, t2, :, :] are to be visualized, where t1 and t2 are selected by yourself. For the CGNL module N = CTHW, so the attention matrix is CTHW x CTHW and the heat maps are N.view(C, T, H, W, C, T, H, W)[c1, t1, x/16, y/16, c2, t2, :, :], where c1, c2 and t1, t2 are likewise selected from C and T by yourself. See the sketch below.
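The same idea for the video case, again with toy sizes and an assumed, explicitly materialized `att`:

```python
import torch

T, H, W = 4, 14, 14
px, py = 3, 5                                   # scaled query location P(x/16, y/16)
t1, t2 = 0, 2                                   # query frame and target frame

# NL module: att has shape (THW, THW).
att_nl = torch.rand(T * H * W, T * H * W)
heat_nl = att_nl.view(T, H, W, T, H, W)[t1, px, py, t2]                    # (H, W)

# CGNL module: att has shape (CTHW, CTHW).
C = 4
att_cgnl = torch.rand(C * T * H * W, C * T * H * W)
c1, c2 = 1, 3                                   # channels selected for inspection
heat_cgnl = att_cgnl.view(C, T, H, W, C, T, H, W)[c1, t1, px, py, c2, t2]  # (H, W)
```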

In the end, resize the heat maps back to the input size and apply a threshold (= 0.7) to visualize what the highly related points look like, for example as follows.
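A hedged sketch of that last step (the helper name is ours; only the 0.7 threshold comes from the thread, and normalizing to [0, 1] before thresholding is an assumption):

```python
import torch
import torch.nn.functional as F

def upsample_and_threshold(heat, input_h, input_w, thresh=0.7):
    """Resize a feature-map-sized heat map to the input size and threshold it."""
    heat = heat[None, None]                      # (1, 1, H, W) for interpolate
    heat = F.interpolate(heat, size=(input_h, input_w),
                         mode='bilinear', align_corners=False)[0, 0]
    heat = (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)  # map to [0, 1]
    return heat * (heat > thresh).float()        # keep only highly related points
```

The result can then be overlaid on the input image or frame, e.g. with matplotlib's `imshow` and alpha blending.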

hasirk commented 5 years ago

@KaiyuYue Thank you very much for the detailed explanation. You have made it very clear.

I will try to implement this. Thanks again.

I have just one additional question: in this repo you have only given the code for the single-image case. Do you plan to release the video classification code as well?

kaiyuyue commented 5 years ago

Not this year, no. I'm too busy and have no time to reproduce the results for video tasks. But some works that cite the CGNL paper contain reproduced results used for comparison on benchmarks like UCF101. So I think that, if the code is written well, the CGNL module should work well for the video task. I have listed the important tips for revising the code above; I hope they serve you well.

hasirk commented 5 years ago

Thank you very much for the tips. I will check the non-local repo as well, because they only have the code for the video case.

Thanks.

LiuLifu13 commented 1 year ago

For the CGNL module you say the attention is CHW x CHW, but att.shape is [b, 1, 1]. How did you calculate N?