Hey, thanks for your kind words. This is a good question; it made me realise one extra thing about this project. I shall add an update for it and let you know more soon. For starters, you can also try using print(model.model) to understand the structure.
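As a rough illustration (assuming the standard ultralytics API and a checkpoint such as yolov8n.pt, which may differ from your setup), inspecting the module tree could look like this:

from ultralytics import YOLO

model = YOLO("yolov8n.pt")    # any YOLOv8 checkpoint; yolov8n.pt is just an example
print(model.model)            # prints the full nn.Module tree of the underlying model
print(model.model.model[-1])  # the last block in the model's Sequential container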
To extract a specific layer from the model object, you will have to iterate over its Sequential block and pull out the exact layer you need. You can then use it to generate the heatmap.
For example, to extract the last Conv layer, you can do it like this:
from torch.nn import Conv2d

c2f_module = model.model.model[-1]
tgt_layer = None
# Iterate through the sub-modules of the block and pick out the exact Conv2d layer
for sub_module in c2f_module.modules():
    if isinstance(sub_module, Conv2d):
        if sub_module.in_channels == 80 and sub_module.out_channels == 80 and sub_module.kernel_size == (1, 1) and sub_module.stride == (1, 1):
            tgt_layer = sub_module
print(tgt_layer)
You can then pass this extracted layer stored in tgt_layer as a list:
target_layers = [tgt_layer]
cam = EigenCAM(model, target_layers, use_cuda=False)
grayscale_cam = cam(rgb_img)[0, :, :]
cam_image = show_cam_on_image(img, grayscale_cam, use_rgb=True)
Image.fromarray(cam_image)
The rest of the steps remain the same. The generated heatmaps may not look the way you expect, though. You can experiment with different layers and compare the outputs.
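For instance, a quick way to compare a few candidate layers (a sketch only, assuming the model, rgb_img, img, EigenCAM, show_cam_on_image and Image objects from the steps above are already defined) would be:

# Generate and save a heatmap for several candidate layer indices to compare them side by side
for i in (-2, -3, -4):
    cam = EigenCAM(model, [model.model.model[i]], use_cuda=False)
    grayscale_cam = cam(rgb_img)[0, :, :]
    cam_image = show_cam_on_image(img, grayscale_cam, use_rgb=True)
    Image.fromarray(cam_image).save(f"eigencam_layer_{i}.png")  # filename is arbitrary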
Hello, thank you for the great work and implementation!
Edit: I am using a classification model.
I am not really experienced myself, but I noticed that there was a lot of unclear information about which layer to use for EigenCAM in YOLOv8. As I understood the EigenCAM paper, the method is meant to be applied to the very last convolutional layer in the network. That would not be -2 or -4, as many pointed out, but model.model.model[-1].conv, I think.
Please correct me if I am mistaken. I might very well be mistaken, as YOLO as a model architecture is not straightforward.
Story: I was really frustrated trying to find the right layer, as -2, -3 and -4 all performed differently on each image and for each model size, so I looked into the YOLO architecture and found out that YOLO actually has a conv network in the Classify layer.
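To make it concrete, here is a minimal sketch of what I mean (assuming an ultralytics classification checkpoint such as yolov8n-cls.pt; your own weights may differ):

from ultralytics import YOLO

model = YOLO("yolov8n-cls.pt")         # classification variant; the exact checkpoint is just an example
classify_head = model.model.model[-1]  # the Classify head at the end of the network
print(classify_head)                   # should show a Conv block followed by pooling and a Linear layer
target_layers = [classify_head.conv]   # i.e. model.model.model[-1].conv, the last conv block before the classifier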
Thank you all and I am very interested in some different views on this.