pengzhiliang / Conformer

Official code for Conformer: Local Features Coupling Global Representations for Visual Recognition

feature maps in Figure 1 #10

Closed · eeric closed this issue 3 years ago

eeric commented 3 years ago

Hi, author, which tool did you use to draw the feature maps? Was it this one: https://github.com/utkuozbulak/pytorch-cnn-visualizations?

pengzhiliang commented 3 years ago

Emm, matplotlib is the GOAT (yyds).
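
A minimal sketch of how such feature-map figures can be drawn with matplotlib (a hypothetical example, not the exact script behind Figure 1; it assumes a torchvision ResNet-50 and a placeholder image file "input.jpg"):

# Hypothetical sketch: plot intermediate CNN feature maps with matplotlib.
import torch
import matplotlib.pyplot as plt
from PIL import Image
from torchvision import models, transforms

model = models.resnet50(pretrained=True).eval()

# Grab the output of an intermediate stage with a forward hook.
feats = {}
model.layer1.register_forward_hook(
    lambda module, inputs, output: feats.update(layer1=output.detach()))

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
im = Image.open("input.jpg").convert("RGB")  # placeholder path
with torch.no_grad():
    model(preprocess(im).unsqueeze(0))

# Plot the first 16 channels of the feature map as a 4x4 grid.
fmap = feats["layer1"][0]
fig, axes = plt.subplots(4, 4, figsize=(8, 8))
for i, ax in enumerate(axes.flat):
    ax.imshow(fmap[i].numpy(), cmap="viridis")
    ax.axis("off")
plt.tight_layout()
plt.show()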

eeric commented 3 years ago

Good work. Could you provide a drawing example for the attention maps in Figure 4?

eeric commented 3 years ago

Your algorithm is similar to Mobile-Former.

pengzhiliang commented 3 years ago

I need to look for the visualization code; I may not be able to find it. In fact, Mobile-Former came half a year later than ours.

pengzhiliang commented 3 years ago

We submitted to ICCV21 in March and posted to arXiv in May. Mobile-Former was put on arXiv in August, so it is hard to believe that they did not learn from our work.

eeric commented 3 years ago

oh, I appreciate your work.

eeric commented 3 years ago

But the transformer was hard to train, e.g., the loss fell very slowly.

pengzhiliang commented 3 years ago

Part of the visualization code is shown below:

# Attention-rollout visualization. Assumed context: `att_mat` is the list of
# per-layer attention tensors collected from the model, and `im` is the
# input PIL image.
import numpy as np
import torch
import cv2

# First, return the attention matrices from every self-attention layer
# (12 stages) and stack them, then average over the attention heads.
att_mat = torch.stack(att_mat).squeeze(1)
att_mat = torch.mean(att_mat, dim=1)

# To account for residual connections, add an identity matrix to the
# attention matrix and re-normalize the weights.
residual_att = torch.eye(att_mat.size(1))
aug_att_mat = att_mat + residual_att
aug_att_mat = aug_att_mat / aug_att_mat.sum(dim=-1).unsqueeze(-1)

# Recursively multiply the weight matrices across layers.
joint_attentions = torch.zeros(aug_att_mat.size())
joint_attentions[0] = aug_att_mat[0]

for n in range(1, aug_att_mat.size(0)):
    joint_attentions[n] = torch.matmul(aug_att_mat[n], joint_attentions[n-1])

# Attention from the output (class) token to the input space.
v = joint_attentions[-1]
grid_size = int(np.sqrt(aug_att_mat.size(-1)))
mask = v[0, 1:].reshape(grid_size, grid_size).detach().numpy()

# Upsample the mask to the image size and overlay it on the input image.
mask = cv2.resize(mask / mask.max(), im.size)[..., np.newaxis]
result = (mask * np.asarray(im)).astype("uint8")
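
For completeness, a minimal sketch of one way to collect the per-layer attention matrices that the snippet above stacks, using forward hooks (hypothetical module names in the style of timm's ViT, i.e. `model.blocks` and `block.attn.attn_drop`; adapt the hook target to the actual model code):

# Hypothetical sketch: capture softmaxed attention maps with forward hooks.
# In eval mode the attention-dropout layer is an identity, so hooking it
# yields tensors of shape (batch, heads, tokens, tokens). `model` and
# `input_tensor` are assumed to exist.
import torch

att_mat = []

def save_attention(module, inputs, output):
    att_mat.append(output.detach().cpu())

hooks = [block.attn.attn_drop.register_forward_hook(save_attention)
         for block in model.blocks]
with torch.no_grad():
    model(input_tensor)
for h in hooks:
    h.remove()
# `att_mat` now holds one attention tensor per layer, ready for the
# rollout code above.
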
pengzhiliang commented 3 years ago

Maybe something went wrong.

"But the transformer was hard to train, e.g., the loss fell very slowly."

eeric commented 3 years ago

Nice, exciting work, good person!

xingshulicc commented 3 years ago

Hi, I agree with @eeric: your algorithm is very similar to Mobile-Former. The structure of the branches is the same as yours; the only difference is that they used MobileNet-V3 as the CNN backbone, while you adopted ResNet.

pengzhiliang commented 3 years ago

Hi @xingshulicc, it shows that our research is valuable, haha.

xingshulicc commented 3 years ago

Yes, it is really good work.

pengzhiliang commented 3 years ago

Thank you for your kind words, @xingshulicc.