Open srwi opened 2 years ago
Hi @stnkl, thank you so much for your great PR! I'm going to approve and merge it because I believe your idea basically makes a lot of sense.
However, before that, please improve a few points.
First, tf-keras-vis supports N-dim image inputs, so GradCAM should support Conv1D, Conv2D, Conv3D or more dimensions.
And the target feature (cam) shape doesn't have to be square.
So I think the condition code could be improved, for example, like below:
def _penultimate_layer_condition(layer):
    return len(layer.output_shape) > 2 and any(d > 1 for d in layer.output.shape[1:-1])
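As a rough check of how such a condition behaves, here is a minimal sketch that applies the same test to plain shape tuples instead of real Keras layers. The helper name has_spatial_extent and the example shapes are assumptions made for illustration; None stands for the batch dimension. Unlike the one-liner above, this sketch also skips unknown (None) spatial sizes, which is an added assumption.

```python
def has_spatial_extent(output_shape):
    """Shape-only version of the proposed condition (hypothetical helper).

    True when the shape has at least one axis between batch and channels
    and any of those axes is larger than 1.
    """
    spatial = output_shape[1:-1]
    # Assumption: unknown (None) spatial sizes do not count as "> 1".
    return len(output_shape) > 2 and any(d is not None and d > 1 for d in spatial)


# Illustrative shapes only; they are not taken from a specific model.
examples = {
    "conv1d": (None, 128, 64),
    "conv2d_non_square": (None, 7, 14, 512),
    "conv3d": (None, 4, 8, 8, 32),
    "pooled_1x1": (None, 1, 1, 1024),
    "dense": (None, 1000),
}
results = {name: has_spatial_extent(shape) for name, shape in examples.items()}
```

The 1-D, non-square 2-D and 3-D shapes all pass, while the 1x1 pooled map and the dense output are rejected.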
In addition, is it possible to also keep the condition of Conv (i.e., lambda l: isinstance(l, Conv))?
If not, then in VGG16, for example, the CAM will be generated from the feature map output by the max-pooling layer (block5_pool, whose output shape is (None, 7, 7, 512)) instead of the expected output of the layer before it (block5_conv3, whose output shape is (None, 14, 14, 512)). So I assume this will be a compatibility issue with lower versions.
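The VGG16 concern can be illustrated with a small sketch that searches a stand-in layer list from the end. The Layer tuple, the kind field and the find_last helper are hypothetical and only mimic what a search over model.layers would do; the shapes are those of the VGG16 tail with a 224x224 input and the classifier head included.

```python
from collections import namedtuple

# Stand-in for a Keras layer: a name, a coarse kind, and an output shape.
Layer = namedtuple("Layer", ["name", "kind", "output_shape"])

VGG16_TAIL = [
    Layer("block5_conv3", "conv", (None, 14, 14, 512)),
    Layer("block5_pool", "pool", (None, 7, 7, 512)),
    Layer("flatten", "flatten", (None, 25088)),
    Layer("fc1", "dense", (None, 4096)),
    Layer("fc2", "dense", (None, 4096)),
    Layer("predictions", "dense", (None, 1000)),
]


def shape_condition(layer):
    """Shape-only condition: some axis between batch and channels is > 1."""
    shape = layer.output_shape
    return len(shape) > 2 and any(d is not None and d > 1 for d in shape[1:-1])


def find_last(layers, condition):
    """Return the last layer satisfying `condition`, or None."""
    for layer in reversed(layers):
        if condition(layer):
            return layer
    return None


# Shape condition alone stops at the pooling layer.
shape_only = find_last(VGG16_TAIL, shape_condition)
# Requiring a conv layer as well recovers the previous behaviour.
conv_and_shape = find_last(
    VGG16_TAIL, lambda layer: layer.kind == "conv" and shape_condition(layer)
)
```

Here shape_only is block5_pool and conv_and_shape is block5_conv3, which matches the difference described above.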
Thanks!
Thanks for having a look at the PR! You brought up some very valid points.
First of all, I totally agree with your updated search condition. This would make it compatible with Conv1D, Conv2D and Conv3D. I don't see any issue there.
Secondly, I see the problem with VGG16, and you are correct that it should select the Conv layer here. However, if we keep this Conv layer condition, it would again break the behaviour for MobileNetV3, as the subsequent layers with matching dimensions are actually necessary to get correct GradCAM images.
Thus, I could imagine one further method of selecting the target layer:
- [...] Conv layer l that matches _penultimate_layer_condition
- [...] l and use it as the target layer.
This would, however, make the target layer search a little more implicit. Again, let me know what you think and I would be happy to implement it!
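One possible reading of this two-step selection is sketched below. It assumes that the second step means walking forward from the Conv layer l over the directly following layers whose output shape is unchanged, based on the remark about subsequent layers with matching dimensions. The Layer tuple, the function names and the MobileNetV3-like toy stack are hypothetical stand-ins, not real model definitions.

```python
from collections import namedtuple

Layer = namedtuple("Layer", ["name", "kind", "output_shape"])


def shape_condition(layer):
    shape = layer.output_shape
    return len(shape) > 2 and any(d is not None and d > 1 for d in shape[1:-1])


def select_target_layer(layers):
    # Step 1: last conv layer that still has a spatial extent.
    conv_idx = None
    for i in range(len(layers) - 1, -1, -1):
        if layers[i].kind == "conv" and shape_condition(layers[i]):
            conv_idx = i
            break
    if conv_idx is None:
        return None
    # Step 2 (assumed): move forward over the directly following layers
    # that keep the same output shape, e.g. normalisation and activation.
    target = layers[conv_idx]
    for layer in layers[conv_idx + 1:]:
        if layer.output_shape != target.output_shape:
            break
        target = layer
    return target


# Toy stack loosely shaped like a network with a 1x1 conv head (illustrative).
TOY_STACK = [
    Layer("conv_last", "conv", (None, 7, 7, 96)),
    Layer("bn_last", "norm", (None, 7, 7, 96)),
    Layer("act_last", "activation", (None, 7, 7, 96)),
    Layer("global_pool", "pool", (None, 1, 1, 96)),
    Layer("conv_head", "conv", (None, 1, 1, 1024)),
    Layer("flatten", "flatten", (None, 1024)),
    Layer("logits", "dense", (None, 1000)),
]

VGG16_TAIL = [
    Layer("block5_conv3", "conv", (None, 14, 14, 512)),
    Layer("block5_pool", "pool", (None, 7, 7, 512)),
    Layer("flatten", "flatten", (None, 25088)),
    Layer("predictions", "dense", (None, 1000)),
]

toy_target = select_target_layer(TOY_STACK)
vgg_target = select_target_layer(VGG16_TAIL)
```

Under these assumptions the toy stack resolves to act_last (the activation after the last spatial Conv), while the VGG16 tail still resolves to block5_conv3, because block5_pool changes the shape and ends the forward walk.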
Hi there!
Currently the last Conv layer is automatically used for GradCAM unless specified otherwise. We noticed that for some models like MobileNetV3 this results in the wrong layer being used (as already mentioned in #61). Specifically for MobileNetV3, this causes two problems:
- [...] Conv2D layer, causing this layer with shape (None, 1, 1, 1024) to be selected as the penultimate layer. Obviously this will result in incorrect/useless GradCAM images.
- [...] Conv layer manually, however, does not include some important activations, which occasionally causes inverted GradCAM images (see #61).
We propose to use a different search condition which searches for the last layer with four dimensions and a width and height of more than 1. This will help with both problems mentioned above:
- [...] Conv layer anymore, resulting in non-inverted GradCAM images.
A similar implementation is being used in sicara/tf-explain. Their implementation would, however, still be affected by the first problem.
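The proposed search condition (four dimensions, width and height greater than 1) can be sketched on plain shape tuples as follows. The function name is_cam_candidate, the channels-last layout and the toy layer stack are assumptions for illustration; only the (None, 1, 1, 1024) shape comes from the description above.

```python
def is_cam_candidate(output_shape):
    """True for 4-D outputs whose height and width are both known and > 1.

    Assumes a channels-last layout: (batch, height, width, channels).
    """
    if len(output_shape) != 4:
        return False
    _, height, width, _ = output_shape
    return height is not None and width is not None and height > 1 and width > 1


# Hypothetical (name, kind, output_shape) entries ending in a 1x1 conv head.
STACK = [
    ("conv_last", "conv", (None, 7, 7, 96)),
    ("act_last", "activation", (None, 7, 7, 96)),
    ("global_pool", "pool", (None, 1, 1, 96)),
    ("conv_head", "conv", (None, 1, 1, 1024)),
    ("logits", "dense", (None, 1000)),
]

# Old behaviour: take the last conv layer, whatever its shape.
last_conv = next(name for name, kind, _ in reversed(STACK) if kind == "conv")
# Proposed behaviour: take the last layer that still has a spatial map.
by_shape = next(name for name, _, shape in reversed(STACK) if is_cam_candidate(shape))
```

In this toy stack the old rule lands on conv_head with its 1x1 map, while the shape-based rule lands on act_last, so the activation after the last spatial Conv is included.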
I added some quick tests which could be extended in the future.
Let me know what you think!