stevehuanghe / image_captioning

Image captioning models in PyTorch
Apache License 2.0

Problems with SCA-CNN #6

Open BrunoQin opened 5 years ago

BrunoQin commented 5 years ago

Hi, this is excellent work and I have read the core code of the SCA-CNN model. Thank you for providing it in PyTorch. I have a question about the code below: when I train a model with seq_max_len = 20, for example, this for loop runs 20 times. Do the iterations share the same weights and feature maps? And if they share the same weights and feature maps, will they produce the same output? Or do the h and c of the LSTM change between iterations, which makes the outputs different? Can you give me some advice on this? Thank you very much!

for t in range(seq_max_len):
    if self.att_mode == 'cs':
        # channel attention first, then spatial attention
        beta = self.channel_attention(features, hidden)
        features = beta * features
        alpha = self.spatial_attention(features, hidden)
        feats = alpha * features
    elif self.att_mode == 'c':
        # channel attention only
        beta = self.channel_attention(features, hidden)
        feats = beta * features
    elif self.att_mode == 's':
        # spatial attention only
        alpha = self.spatial_attention(features, hidden)
        feats = alpha * features
    else:
        # spatial attention first, then channel attention
        alpha = self.spatial_attention(features, hidden)
        features = alpha * features
        beta = self.channel_attention(features, hidden)
        feats = beta * features
    feats = feats.view(1, batch_size, -1)
    embed = embeddings[t]
    inputs = torch.cat([embed, feats], dim=2)
    hidden, states = self.lstm(inputs, states)
    hidden = self.dropout(hidden)
    output = self.lstm_output(hidden)
    logits.append(output)
stevehuanghe commented 5 years ago

Yes, they share the same LSTM, so the weights are also shared. But as you mentioned, the h and c of the LSTM change as the input changes, so the output at each t is different.
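
A minimal, self-contained sketch of this point (using nn.LSTMCell rather than the repo's nn.LSTM, purely for illustration): even with shared weights and an identical input at every step, the outputs differ because h and c carry state across iterations.

import torch
import torch.nn as nn

torch.manual_seed(0)

# one shared LSTM cell: its weights are reused at every time step
cell = nn.LSTMCell(input_size=8, hidden_size=8)
h = torch.zeros(1, 8)
c = torch.zeros(1, 8)

x = torch.randn(1, 8)          # feed the *same* input at every step
for t in range(3):
    h, c = cell(x, (h, c))     # same weights each step...
    print(t, h.sum().item())   # ...but h and c evolve, so the output differs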

N-Kingsley commented 5 years ago

Hi,

I have a similar question. If I add attention after every convolutional layer, as the original paper does, are the attention parameters of each layer shared?

BrunoQin commented 5 years ago

@N-Kingsley Hi, two options: 1) If you reuse a single attention module, its parameters are shared across layers, but the data you pass into it (the features and the hidden state) is different and changes during the forward pass, so alpha and beta still change. Or 2) you can create a separate attention layer for each convolutional layer, so they do not share parameters.
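
A minimal sketch of the two options, assuming a hypothetical SpatialAttention module (not the repo's exact class): option 1 reuses one module (shared parameters, different inputs per call), option 2 builds one module per convolutional layer (no sharing).

import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Hypothetical stand-in for a spatial attention module."""
    def __init__(self, feat_dim, hid_dim):
        super().__init__()
        self.proj = nn.Linear(feat_dim + hid_dim, 1)

    def forward(self, features, hidden):
        # features: (batch, num_regions, feat_dim); hidden: (batch, hid_dim)
        h = hidden.unsqueeze(1).expand(-1, features.size(1), -1)
        scores = self.proj(torch.cat([features, h], dim=-1))
        return scores.softmax(dim=1)  # alpha over regions

# Option 1: reuse one module everywhere -> parameters shared,
# but alpha still differs because features and hidden differ per call.
shared_att = SpatialAttention(512, 512)

# Option 2: one module per convolutional layer -> no parameter sharing.
per_layer_att = nn.ModuleList(SpatialAttention(512, 512) for _ in range(3))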

N-Kingsley commented 5 years ago

If the number of channels differs from layer to layer, can the parameters still be shared?

BrunoQin commented 5 years ago

@N-Kingsley Hi, I see what you mean. The code in this repo only adds one attention module on top of the CNN features, and it uses the CNN, the LSTM, and the for loop to predict words; inside the loop, the attention layer's parameters are shared, so they are the same at every time step. But when you add more attention layers in your own code, it is different: those layers do not share parameters.
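
A minimal sketch of why sharing is impossible once the channel count changes, assuming a hypothetical ChannelAttention module and made-up channel sizes (256/512/512): the weight shapes depend on the number of channels, so each convolutional layer needs its own attention module.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Hypothetical channel attention; one instance per conv block."""
    def __init__(self, num_channels, hid_dim):
        super().__init__()
        # the weight shape depends on num_channels, so this module cannot
        # be reused for a layer with a different number of channels
        self.proj = nn.Linear(num_channels + hid_dim, num_channels)

    def forward(self, pooled_features, hidden):
        # pooled_features: (batch, num_channels); hidden: (batch, hid_dim)
        return torch.sigmoid(self.proj(torch.cat([pooled_features, hidden], dim=-1)))

# one attention module per conv layer, sized to that layer's channel count
channel_atts = nn.ModuleList(
    ChannelAttention(c, hid_dim=512) for c in (256, 512, 512)
)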