jacobgil / keras-cam

Keras implementation of class activation mapping

Problem about the dense layer #13

mammadjv opened this issue 5 years ago

mammadjv commented 5 years ago

I have a question about the approach at this line: https://github.com/jacobgil/keras-cam/blob/2b7ada2c5c819808a174444e82622315a07fa11e/model.py#L59 As mentioned in the paper, the trained weights in this layer are used for a weighted sum over the last set of activation maps. To model a non-linear function and predict class scores, an MLP should have at least two layers (a hidden layer and an output layer such as Softmax). But here, right after the GAP layer, only a single FC layer with two units is added for classification. Can anyone explain why? And why are there two units?

AngusMaiden commented 1 year ago

@mammadjv,

This is quite an old question now, but I'll answer it anyway in the hope it helps you or someone else reading.

The Keras Dense layer takes an activation parameter, which is a shortcut for appending a separate Activation layer. The implementation above is therefore effectively two layers: a Dense (linear) layer followed by a Softmax activation layer. Keras provides this convenience because you rarely need to change anything about the activation layer other than which activation it applies. In practice we don't usually think of the activation as a separate layer; it is considered part of the layer before it, so this Dense layer is the output layer, with a softmax activation. Keras accepts a number of different activations ('relu', 'tanh', 'sigmoid', 'softmax', etc.) in the activation argument of Dense, so when implementing the final layer you can simply choose softmax as the activation function and you have your output layer.
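To make the equivalence concrete, here is a minimal sketch (standalone Keras, not code from this repo; the 512-d input is an assumption matching VGG16's last conv block after GAP):

```python
from keras.models import Sequential
from keras.layers import Dense, Activation

# These two heads compute exactly the same function:

# 1) activation passed as an argument to Dense
head_a = Sequential([Dense(2, activation='softmax', input_shape=(512,))])

# 2) the same Dense followed by an explicit Activation layer
head_b = Sequential([
    Dense(2, input_shape=(512,)),  # linear projection only: weights + bias
    Activation('softmax'),         # the nonlinearity, as its own layer
])
```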

The number of outputs is 2 because the model classifies two classes, person and non-person, as described in the README. This architecture is used with one-hot-encoded labels, i.e. the label {1,0} = non-person and {0,1} = person; the label vector has length 2, matching the number of output nodes. Note that the developer could instead have used a single output node with a sigmoid activation on the final layer to achieve the same thing, where outputs in [0, 0.5) = non-person and outputs in [0.5, 1] = person. The corresponding labels would then be a binary scalar, i.e. 0 = non-person and 1 = person. The math works out the same either way, so this is more or less an arbitrary choice.
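A sketch of the two equivalent head choices described above (the loss pairings are illustrative conventions, not taken from this repo):

```python
from keras.layers import Dense

# Option 1: two-unit softmax head with one-hot labels
# [1, 0] = non-person, [0, 1] = person
# pair with loss='categorical_crossentropy'
softmax_head = Dense(2, activation='softmax')

# Option 2: one-unit sigmoid head with scalar binary labels
# 0 = non-person, 1 = person; predict person if the output >= 0.5
# pair with loss='binary_crossentropy'
sigmoid_head = Dense(1, activation='sigmoid')
```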

I'm not sure from your question whether you were also wondering about this, but in addition, there is no need for any other hidden layers after the GAP: GAP can replace the FC layers that would normally follow the convolutional layers, connecting straight from the GAP to the softmax output layer. See this page for more details: https://paperswithcode.com/method/global-average-pooling.
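As a sketch of that pattern (illustrative layer sizes, not the repo's exact VGG16 model): the convolutional features go straight through GAP to the softmax output with no intermediate FC layers, and the Dense weights are exactly the per-class weights CAM uses for its weighted sum over the feature maps.

```python
from keras.models import Sequential
from keras.layers import Conv2D, GlobalAveragePooling2D, Dense

model = Sequential([
    Conv2D(512, (3, 3), activation='relu', padding='same',
           input_shape=(224, 224, 3)),
    # GAP averages each of the 512 feature maps down to one number,
    # replacing the flatten + FC stack of a classic classifier head
    GlobalAveragePooling2D(),
    # straight to the output: this layer's weight matrix (512 x 2)
    # holds the per-class weights used in the CAM weighted sum
    Dense(2, activation='softmax'),
])
model.summary()
```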