Open ElfMerve opened 3 months ago
Using multiple layers gives better results.
I have a few more questions and would appreciate your answers.
1) Below are layers [10, 12, 14, 16, 18] from our YOLOv8 best.pt. Are they the same as layers [10, 12, 14, 16, 18] in your code? Could you share the model architecture of your best.pt file with us?
Here is the link to our complete best.pt yolov8 model architecture https://drive.google.com/file/d/1A4kYVWL_ziLb9GtFn6fSMpbllM9NaT51/view?usp=sharing
(10): Upsample(scale_factor=2.0, mode='nearest')
(12): C2f(
  (cv1): Conv(
    (conv): Conv2d(1280, 640, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
    (act): SiLU(inplace=True)
  )
  (cv2): Conv(
    (conv): Conv2d(1600, 640, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
    (act): SiLU(inplace=True)
  )
  (m): ModuleList(
    (0-2): 3 x Bottleneck(
      (cv1): Conv(
        (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
        (act): SiLU(inplace=True)
      )
      (cv2): Conv(
        (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
        (act): SiLU(inplace=True)
      )
    )
  )
)
(14): Concat()
(16): Conv(
  (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
  (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
  (act): SiLU(inplace=True)
)
(18): C2f(
  (cv1): Conv(
    (conv): Conv2d(960, 640, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
    (act): SiLU(inplace=True)
  )
  (cv2): Conv(
    (conv): Conv2d(1600, 640, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
    (act): SiLU(inplace=True)
  )
  (m): ModuleList(
    (0-2): 3 x Bottleneck(
      (cv1): Conv(
        (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
        (act): SiLU(inplace=True)
      )
      (cv2): Conv(
        (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
        (act): SiLU(inplace=True)
      )
    )
  )
)
2) Why did you not use layer 21, the topmost layer of YOLOv8's backbone classification layers, from which we obtain the 2D features?
(21): C2f(
  (cv1): Conv(
    (conv): Conv2d(1280, 640, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
    (act): SiLU(inplace=True)
  )
  (cv2): Conv(
    (conv): Conv2d(1600, 640, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
    (act): SiLU(inplace=True)
  )
  (m): ModuleList(
    (0-2): 3 x Bottleneck(
      (cv1): Conv(
        (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
        (act): SiLU(inplace=True)
      )
      (cv2): Conv(
        (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
        (act): SiLU(inplace=True)
      )
    )
  )
)
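For question 1, one way to check whether two models agree at the listed indices is to compare their printed architectures layer by layer. The following is only a minimal sketch using the Python standard library. It assumes each dump is plain text in which every top-level layer starts at column 0 with "(index):", as in the listings pasted in this thread. The helper names parse_layers and differing_layers are hypothetical, and the comparison covers the printed structure only, not the trained weights.

```python
import re

# A top-level layer line looks like "(12): C2f(" and starts at column 0.
TOP_LEVEL = re.compile(r"^\((\d+)\):\s*(.*)$")


def parse_layers(dump):
    """Map layer index -> whitespace-normalized module text."""
    layers = {}
    current = None
    for line in dump.splitlines():
        match = TOP_LEVEL.match(line)
        if match:
            current = int(match.group(1))
            layers[current] = match.group(2).split()
        elif current is not None and line.strip():
            # Continuation of the current layer (nested modules, closing parens).
            layers[current].extend(line.split())
    return {index: " ".join(tokens) for index, tokens in layers.items()}


def differing_layers(dump_a, dump_b, indices):
    """Return the indices whose printed layer text differs between two dumps."""
    a, b = parse_layers(dump_a), parse_layers(dump_b)
    return [i for i in indices if a.get(i) != b.get(i)]
```

Calling differing_layers(ours, theirs, [10, 12, 14, 16, 18]) on two such text dumps returns the indices that do not match. An empty list means the printed layers are identical at those positions, whether the dumps are flattened onto one line or indented.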
I am also interested in understanding this.
In the Grad-CAM code for YOLOv8, why are five layers specified? The code has 'layer': [10, 12, 14, 16, 18]. Normally Grad-CAM is computed from the last convolutional layer. What do these layers mean, and why is more than one layer used?
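To make the question concrete: some Grad-CAM implementations accept a list of target layers, compute one map per layer, resize the maps to a common size, and average them. Whether this repository does exactly that is not confirmed here. The NumPy sketch below only illustrates that kind of aggregation. It assumes the activations and gradients of each hooked layer have already been captured as arrays of shape (C, H, W). All function names are hypothetical, and random arrays stand in for real layer outputs.

```python
import numpy as np


def layer_cam(activations, gradients):
    """Grad-CAM for one layer; both inputs have shape (C, H, W)."""
    weights = gradients.mean(axis=(1, 2))             # one weight per channel
    cam = np.tensordot(weights, activations, axes=1)  # weighted sum -> (H, W)
    return np.maximum(cam, 0.0)                       # keep positive evidence


def resize_nearest(cam, size):
    """Nearest-neighbour resize of a 2D map to size=(height, width)."""
    h, w = cam.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return cam[np.ix_(rows, cols)]


def normalize(cam):
    """Scale a map into the range [0, 1]."""
    cam = cam - cam.min()
    return cam / (cam.max() + 1e-7)


def multi_layer_cam(per_layer, size):
    """Average the normalized per-layer maps at a common resolution."""
    maps = [normalize(resize_nearest(layer_cam(a, g), size)) for a, g in per_layer]
    return normalize(np.mean(maps, axis=0))


# Placeholder data: two layers with different channel counts and resolutions.
rng = np.random.default_rng(0)
per_layer = [
    (rng.normal(size=(4, 8, 8)), rng.normal(size=(4, 8, 8))),
    (rng.normal(size=(6, 4, 4)), rng.normal(size=(6, 4, 4))),
]
heatmap = multi_layer_cam(per_layer, size=(16, 16))
print(heatmap.shape)
```

Under this scheme, layers at different resolutions each contribute to the final heatmap, which is one possible reason to list several layers instead of only the last convolutional one.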