Implementing GradCam in FCN Segmentation Model

varungupta31 commented 2 years ago

Hi @ismailuddin, can you help me out with implementing GradCam in an FCN segmentation model?

Any help is appreciated, Thanks.

ismailuddin commented 2 years ago

Hi @varungupta31, I could try to help you out but it appears your implementation is using TensorFlow 1.X. Have you considered using TensorFlow 2.X? Otherwise, what issues are you encountering?

varungupta31 commented 2 years ago

Thank you for replying.

> it appears your implementation is using TensorFlow 1.X

It was actually written for TF 0.11; however, with some changes, I got it working on TF 1.x.

> Have you considered using TensorFlow 2.X?

No, I did not. I was trying to reproduce the results of a previous study by running its code on my custom dataset. I have the trained model (.meta, .index and .data checkpoint files). My results are promising, but for explainability I turned to GradCam.

> Otherwise, what issues are you encountering?

After much searching, I see that GradCam is mostly implemented for Keras models (which return Keras functional objects and Keras tensors). I'm actually stuck on how to even begin writing the logic with my non-Keras code. Having read your (very well explained :) ) notebook, I understand the logic flow: we begin by extracting the last conv layer, build one model up to that layer and another that does the calculations after it, and then compute the gradients.

How can I do these (create a model up to the last layer, find out the gradients) in TensorFlow?
What changes are to be made to get this working on a segmentation model (where every pixel is being classified)?

Could you please give me some pointers on how I can proceed and get this implemented? I hope this doesn't take too much of your time.

Thank you.

ismailuddin commented 2 years ago

Hi @varungupta31,

So before delving into the technical details, I think it's worth considering what's the best way to move forward. Considering your model is already semantically segmenting your image, the simpler approach of GradCam (such as cell #13 in my notebook) would probably not add much value. That approach is simply localising the object within the image (or more correctly identifying the strongest pixel contributors to the final classification), which is fairly similar in principle to semantic segmentation. So I think what you really want is the guided Grad-CAM that produces a high resolution saliency map (described in the very last cells of my notebook). I think this makes sense because the saliency map will then (hopefully!) show you which pixels within your segmented image contributed to the classification of this segmentation (as opposed to the simpler Grad-CAM which shows which regions contributed to the classification, which in the case of a segmentation model is precisely the output of the model anyways!).
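To make the distinction concrete: guided Grad-CAM is usually formed by upsampling the coarse Grad-CAM map to the input resolution and multiplying it elementwise with a guided backpropagation saliency map. The sketch below shows only that fusion step on toy single-channel data. The function names are hypothetical, and nearest-neighbour upsampling is swapped in for the bilinear interpolation normally used, purely to keep the example dependency-free.

```python
def upsample_nearest(cam, out_h, out_w):
    """Nearest-neighbour upsampling of a 2D map (assumption: stands in for bilinear)."""
    in_h, in_w = len(cam), len(cam[0])
    return [
        [cam[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


def guided_grad_cam(guided_backprop, cam):
    """Elementwise product of a guided-backprop map and the upsampled Grad-CAM map."""
    out_h, out_w = len(guided_backprop), len(guided_backprop[0])
    cam_up = upsample_nearest(cam, out_h, out_w)
    return [
        [g * c for g, c in zip(g_row, c_row)]
        for g_row, c_row in zip(guided_backprop, cam_up)
    ]


# Toy inputs: a 1x2 coarse Grad-CAM map and a 2x4 pixel-level saliency map.
coarse_cam = [[1.0, 0.0]]
saliency = [
    [0.2, 0.4, 0.6, 0.8],
    [1.0, 1.0, 1.0, 1.0],
]

fused = guided_grad_cam(saliency, coarse_cam)
for row in fused:
    print(row)
```

Only the pixels that are both fine-grained contributors (non-zero saliency) and inside a region Grad-CAM marks as relevant survive the product, which is why the result is high resolution and class-discriminative at once.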

Coming back to how to implement the logic with non-Keras code:

Since you have hand-coded the model with TF 0.X/1.X operations, all you need is another function that builds the model from the same operations but omits the last layer(s), so that it returns one of the last few convolutional layers rather than the final output. The gradients part will be a little tricky (especially since I believe you want the guided Grad-CAM high-res saliency map). In the simpler Grad-CAM implementation, those gradients are essentially what you are computing in line 135 of your FCN.py file. The problem is that the saliency map needs a slightly modified gradient function, which is what I'm doing in cell #35 of my notebook. I'm afraid I'm not too sure how you could do this in TF 0.X/1.X :/
As for the changes needed for a segmentation model, they aren't really all that different tbh: an FCN is very similar to an ordinary image classification CNN, as you may have noticed (at least in the initial / bottom layers). I guess the challenge is selecting the appropriate convolutional layer in your model. In an image classification model, it's easy to know which layer to pick: the one just before everything is reduced to a Dense layer. In this FCN, I imagine you might want to try a few of the different convolutional layers near the end.
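One common way to adapt plain Grad-CAM to a segmentation network is to replace the single class score with the sum of that class's logits over a chosen set of pixels, and then proceed as usual: average the gradient of that scalar over each feature map to get one weight per channel, and take the ReLU of the weighted sum of the maps. The sketch below is framework-free and uses plain Python lists. The function names, the toy 1x1 convolution head and the numbers are all assumptions for illustration, not code from the notebook or from FCN.py. In TF 1.x graph mode the gradient would normally come from `tf.gradients(score, conv_output)` rather than being written out by hand as it is here.

```python
def roi_score_gradients(head_weights, class_idx, roi_mask, num_channels):
    """Gradient of sum over ROI pixels of logit[class_idx] w.r.t. the feature maps.

    Assumes a toy 1x1-conv head: logit[c][i][j] = sum_k head_weights[c][k] * A[k][i][j],
    so the gradient is head_weights[c][k] inside the ROI and 0 outside it.
    """
    return [
        [[head_weights[class_idx][k] if inside else 0.0 for inside in row]
         for row in roi_mask]
        for k in range(num_channels)
    ]


def grad_cam(activations, gradients):
    """Grad-CAM map from K feature maps (K x H x W) and gradients of the same shape."""
    height, width = len(activations[0]), len(activations[0][0])
    # One weight per channel: the spatial mean of that channel's gradient.
    weights = [sum(map(sum, g)) / (height * width) for g in gradients]
    # ReLU of the weighted sum of feature maps, evaluated per pixel.
    return [
        [max(0.0, sum(w * a[i][j] for w, a in zip(weights, activations)))
         for j in range(width)]
        for i in range(height)
    ]


# Toy data: 2 feature maps of size 2x2 and a 2-class head.
activations = [
    [[1.0, 2.0], [0.0, 1.0]],  # channel 0
    [[0.0, 1.0], [3.0, 1.0]],  # channel 1
]
head_weights = [
    [1.0, -1.0],  # class 0
    [0.5, 2.0],   # class 1
]
roi_mask = [[True, True], [False, False]]  # explain class 0 on the top row only

gradients = roi_score_gradients(head_weights, 0, roi_mask, num_channels=2)
cam = grad_cam(activations, gradients)
for row in cam:
    print(row)
```

Choosing the pixel set is the segmentation-specific design decision: a single pixel, one predicted object, or every pixel of the class each answer a slightly different question about the model.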

I hope this sort of helps you! I'd also suggest trying to implement this in TF2, as the syntax is a lot simpler to understand, allowing you to focus more on the logic and less on the specifics of the platform to get given operations to run...

varungupta31 commented 2 years ago

THANK YOU SO MUCH.

I got your intuition to use guided grad cam, and will definitely learn more about it and prioritize it.

> The problem is you need a slightly modified gradient function for the saliency map, which is what I'm doing in cell #35 of my notebook.

Is there anything you'd recommend (maybe some equation) that I should look at to get an idea of how to write the function for such a custom gradient?
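For reference, the modified gradient usually meant in this context is the guided backpropagation rule for ReLU: the backward signal passes through a unit only where the forward input was positive and the incoming gradient is positive, i.e. R_in = [x > 0] * [R_out > 0] * R_out. That this is exactly what the notebook's custom gradient does is an assumption. The sketch below compares it with the ordinary ReLU gradient on toy values; the function names are hypothetical. In TF 2.x such a rule is typically attached with `tf.custom_gradient`; in TF 1.x graph mode one commonly used route is `tf.RegisterGradient` together with `Graph.gradient_override_map`, which is offered here as a pointer to check against the documentation rather than a tested recipe for this model.

```python
def relu_backward(forward_input, upstream_grad):
    """Ordinary ReLU gradient: pass the upstream gradient where the input was positive."""
    return [g if x > 0 else 0.0 for x, g in zip(forward_input, upstream_grad)]


def guided_relu_backward(forward_input, upstream_grad):
    """Guided backprop rule: additionally drop negative upstream gradients."""
    return [
        g if (x > 0 and g > 0) else 0.0
        for x, g in zip(forward_input, upstream_grad)
    ]


# Toy values for four units in one ReLU layer.
forward_input = [1.0, -2.0, 3.0, 0.5]
upstream_grad = [0.7, 0.3, -0.4, 0.0]

plain = relu_backward(forward_input, upstream_grad)
guided = guided_relu_backward(forward_input, upstream_grad)
print("plain :", plain)
print("guided:", guided)
```

The third unit shows the difference: its input is positive, so the ordinary rule passes the negative gradient through, while the guided rule zeroes it.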

> I'd also suggest trying to implement this in TF2, as the syntax is a lot simpler to understand, allowing you to focus more on the logic and less on the specifics of the platform to get given operations to run...

What I'm planning at the moment is to recreate the model in Keras. (There are implementations of FCN in Keras available, but after training they yield unexpected results, probably because of some architectural differences.) I'm new to TF/Keras, so it will be a bit of a challenge for me. :)

If you're interested, some interesting insights regarding Grad-Cam in Semantic Segmentation are discussed in this notebook.

> I hope this sort of helps you!

It definitely does. Thank you for taking the time to write this :)

ismailuddin / gradcam-tensorflow-2
