YOLOV8 GRAD-CAM - GitHub Issues

ElfMerve commented 3 months ago

In the Grad-CAM code for YOLOv8, why are five layers specified? In the code: 'layer': [10, 12, 14, 16, 18]. Normally Grad-CAM is computed from the last convolutional layer. What do these layers mean, and why is more than one layer used?

z1069614715 commented 3 months ago

Using multiple layers gives a better result.
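For readers wondering what "multiple layers" can mean in practice, here is a minimal NumPy sketch of one common approach: compute a plain Grad-CAM map per layer, resize each map to a common size, and average them. It is an illustration under assumptions, not necessarily what the script in this repository does. The function names (`grad_cam_single`, `upsample_nearest`, `grad_cam_multi`) are hypothetical, and random arrays stand in for the activations and gradients that hooks on the chosen layers would normally capture.

```python
import numpy as np

def grad_cam_single(activations, gradients):
    """Plain Grad-CAM for one layer; both inputs have shape (C, H, W)."""
    weights = gradients.mean(axis=(1, 2))             # one weight per channel
    cam = np.tensordot(weights, activations, axes=1)  # weighted channel sum -> (H, W)
    return np.maximum(cam, 0.0)                       # ReLU keeps positive evidence only

def normalize(cam, eps=1e-7):
    """Shift and scale a map into the range [0, 1)."""
    cam = cam - cam.min()
    return cam / (cam.max() + eps)

def upsample_nearest(cam, out_hw):
    """Nearest-neighbour resize; assumes out_hw is an integer multiple of cam.shape."""
    fy = out_hw[0] // cam.shape[0]
    fx = out_hw[1] // cam.shape[1]
    return np.repeat(np.repeat(cam, fy, axis=0), fx, axis=1)

def grad_cam_multi(layers, out_hw):
    """Average the per-layer maps after bringing them to a common size."""
    maps = [
        normalize(upsample_nearest(grad_cam_single(act, grad), out_hw))
        for act, grad in layers
    ]
    return normalize(np.mean(maps, axis=0))

# Stand-in data: three layers with different channel counts and resolutions.
rng = np.random.default_rng(0)
shapes = [(8, 8, 8), (16, 4, 4), (16, 2, 2)]
layers = [(rng.standard_normal(s), rng.standard_normal(s)) for s in shapes]

heatmap = grad_cam_multi(layers, out_hw=(16, 16))
print(heatmap.shape)
```

Averaging layers taken at different resolutions lets the heatmap reflect both fine and coarse features, which may matter for a detector that predicts at several scales.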

ElfMerve commented 3 months ago

I have a few more questions and would appreciate your answers.

1) For our YOLOv8 best.pt, the layers you specified ([10, 12, 14, 16, 18]) are listed below. Are they the same as layers [10, 12, 14, 16, 18] in the model your code was written for? Could you share the model architecture of your best.pt file with us?

Here is the link to our complete best.pt yolov8 model architecture https://drive.google.com/file/d/1A4kYVWL_ziLb9GtFn6fSMpbllM9NaT51/view?usp=sharing

(10): Upsample(scale_factor=2.0, mode='nearest')

(12): C2f(
  (cv1): Conv(
    (conv): Conv2d(1280, 640, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
    (act): SiLU(inplace=True)
  )
  (cv2): Conv(
    (conv): Conv2d(1600, 640, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
    (act): SiLU(inplace=True)
  )
  (m): ModuleList(
    (0-2): 3 x Bottleneck(
      (cv1): Conv(
        (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
        (act): SiLU(inplace=True)
      )
      (cv2): Conv(
        (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
        (act): SiLU(inplace=True)
      )
    )
  )
)

(14): Concat()

(16): Conv(
  (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
  (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
  (act): SiLU(inplace=True)
)

(18): C2f(
  (cv1): Conv(
    (conv): Conv2d(960, 640, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
    (act): SiLU(inplace=True)
  )
  (cv2): Conv(
    (conv): Conv2d(1600, 640, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
    (act): SiLU(inplace=True)
  )
  (m): ModuleList(
    (0-2): 3 x Bottleneck(
      (cv1): Conv(
        (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
        (act): SiLU(inplace=True)
      )
      (cv2): Conv(
        (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
        (act): SiLU(inplace=True)
      )
    )
  )
)

2) Why did you not use layer 21, the last C2f block before the detection layer of YOLOv8 and the final layer from which we obtain 2D feature maps?

(21): C2f(
  (cv1): Conv(
    (conv): Conv2d(1280, 640, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
    (act): SiLU(inplace=True)
  )
  (cv2): Conv(
    (conv): Conv2d(1600, 640, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
    (act): SiLU(inplace=True)
  )
  (m): ModuleList(
    (0-2): 3 x Bottleneck(
      (cv1): Conv(
        (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
        (act): SiLU(inplace=True)
      )
      (cv2): Conv(
        (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
        (act): SiLU(inplace=True)
      )
    )
  )
)
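When comparing layer printouts between two checkpoints, it can help to check the channel numbers against the structure of a C2f block. The sketch below assumes the C2f layout as defined in Ultralytics: with hidden width c = int(c2 * e), cv1 maps c1 to 2c channels, each of the n bottlenecks maps c to c, and cv2 fuses (2 + n) * c channels into c2. The helper name `c2f_channels` is hypothetical, and the inputs are read off the printouts of layers 12, 18 and 21 above.

```python
def c2f_channels(c1, c2, n, e=0.5):
    """Expected (in, out) channels of the convolutions inside a C2f block.

    Assumes the Ultralytics C2f layout; c1/c2 are the block's input/output
    channels, n is the number of bottlenecks, e is the expansion ratio.
    """
    c = int(c2 * e)  # hidden width shared by the bottlenecks
    return {
        "cv1": (c1, 2 * c),           # 1x1 conv, output is split into two halves of c
        "bottleneck": (c, c),         # each bottleneck keeps the width at c
        "cv2": ((2 + n) * c, c2),     # fuses both halves plus n bottleneck outputs
    }

# Input channels taken from the printouts of layers 12, 18 and 21.
for index, c1 in [(12, 1280), (18, 960), (21, 1280)]:
    print(index, c2f_channels(c1, c2=640, n=3))
```

The computed sizes match the printouts (for example, cv2 takes (2 + 3) * 320 = 1600 channels). If another best.pt prints different numbers at the same index, it is probably a different model scale, while the layer types at each index would be expected to stay the same.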

atultiwari commented 3 months ago

I am also interested in understanding this.

z1069614715 / objectdetection_script

YOLOV8 GRAD-CAM #49