Closed ZwNSW closed 3 years ago
@glenn-jocher Dear author, I am running ablation experiments on your yolov5: I want to remove the Focus module and then analyze how the training results change without it. I have not been able to make this work, so I kindly ask for your help in pointing out how I should modify the code to carry out this idea. Your professional answer is my motivation to keep studying yolov5. Thank you!
@ZwNSW architecture modifications can be attempted on the model yaml files in models/.
@glenn-jocher Following your suggestion, I commented out the Focus module in the backbone of yolov5l.yaml, as shown below:
# YOLOv5 backbone
backbone:
  [[-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, BottleneckCSP, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 9, BottleneckCSP, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, BottleneckCSP, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 3, BottleneckCSP, [1024, False]],  # 9
  ]
But an error occurred when running yolo.py. The output is as follows:
Using CUDA device0 _CudaDeviceProperties(name='GeForce RTX 2070 SUPER', total_memory=8192MB)
from n params module arguments
0 -1 1 3712 models.common.Conv [3, 128, 3, 2]
1 -1 1 161152 models.common.BottleneckCSP [128, 128, 3]
2 -1 1 295424 models.common.Conv [128, 256, 3, 2]
3 -1 1 1627904 models.common.BottleneckCSP [256, 256, 9]
4 -1 1 1180672 models.common.Conv [256, 512, 3, 2]
5 -1 1 6499840 models.common.BottleneckCSP [512, 512, 9]
6 -1 1 4720640 models.common.Conv [512, 1024, 3, 2]
7 -1 1 2624512 models.common.SPP [1024, 1024, [5, 9, 13]]
8 -1 1 10234880 models.common.BottleneckCSP [1024, 1024, 3, False]
9 -1 1 525312 models.common.Conv [1024, 512, 1, 1]
10 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
11 [-1, 6] 1 0 models.common.Concat [1]
12 -1 1 3085824 models.common.BottleneckCSP [1536, 512, 3, False]
13 -1 1 131584 models.common.Conv [512, 256, 1, 1]
14 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
15 [-1, 4] 1 0 models.common.Concat [1]
16 -1 1 772864 models.common.BottleneckCSP [768, 256, 3, False]
17 -1 1 590336 models.common.Conv [256, 256, 3, 2]
18 [-1, 14] 1 0 models.common.Concat [1]
19 -1 1 2561536 models.common.BottleneckCSP [512, 512, 3, False]
20 -1 1 2360320 models.common.Conv [512, 512, 3, 2]
21 [-1, 10] 1 0 models.common.Concat [1]
22 -1 1 10234880 models.common.BottleneckCSP [1024, 1024, 3, False]
Traceback (most recent call last):
File "D:/Myself_Program/yolov5_master_ablation/models/yolo.py", line 271, in
@glenn-jocher Can you guide me and help me finish this? I want to remove the Focus module entirely. Thank you very much for your careful guidance; I look forward to your reply!
@ZwNSW if you modify the architecture you must also adapt the rest of the architecture to compensate. The absolute `from` indices will no longer point to the same layers as before once layers are removed from the model.
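To illustrate the bookkeeping, here is a rough sketch with a hypothetical helper (assuming the stock v3.x yolov5l head, where the P4 Concat reads from backbone layer 6 and Detect reads from layers 17, 20 and 23): deleting layer 0 shifts every subsequent layer up by one, so every absolute `from` index must drop by one, while relative indices such as -1 are unaffected.
def shift_from(f):
    # Hypothetical helper: after deleting layer 0 (Focus), absolute layer
    # indices move down by one; relative (negative) indices stay the same.
    fix = lambda i: i if i < 0 else i - 1
    return [fix(i) for i in f]

print(shift_from([-1, 6]))       # [-1, 5]      : Concat that previously read backbone P4
print(shift_from([17, 20, 23]))  # [16, 19, 22] : Detect inputs
The channel and stride arguments of the remaining layers also need to be checked so that the concatenated feature maps still line up.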
@glenn-jocher Thanks for your reply. I have found a way to solve this problem.
You might notice that the Focus module and the following 3x3 convolution, taken together, are exactly equivalent to a single 6x6 2D convolution with stride 2. In fact it's unclear why such a Focus module is defined at all. The pre-trained weights, biases, etc. of the 3x3 convolution can be converted to a 6x6 stride-2 convolution and the exact same forward-inference results are obtained.
In other words, if you're doing your own training, you can remove Focus, change the first convolution to kernel size 6 with stride 2, and get the exact same network as with Focus. (This doesn't directly answer your question, which you appear to have solved.)
@egbit I think you are confusing the number of parameters in a module with what the module does. Equal parameter counts do not imply identical functionality. Your proposed operation will not produce the results produced by Focus().
@glenn-jocher This was not a guess. It was actually done, coded, and demonstrated to be completely equivalent. It's easier to see with a diagram how they're identical; I'm not sure how to include images here.
Perhaps one way to describe it in prose: the Focus layer takes every 4 adjacent pixels in a square and produces a single pixel with 4 times the number of channels (12 instead of 3). A 6x6 Conv2D with stride 2 (and pad 2) does the same thing: it sums over 3 x 3 x (the same 4 adjacent pixels in a square), that is, over the exact same pixels of the original image. Converting a pre-trained model entails carefully re-laying out the weights of the 3x3 convolution into the 6x6 convolution; it's a sort of transpose. Once that is done properly, it's easy to verify that the output of the 6x6 Conv2D is exactly the same as that of Focus + 3x3 Conv2D. If you are training a model from scratch, none of that matters; everything simply works.
@egbit this is incorrect. If you've swapped a Focus() layer for your proposed alternative your performance will suffer significantly.
We are always looking for ways to improve, so we are open to new ideas, but this one is not a good candidate.
Apologies for not being quite clear. The 6x6 Conv replaces the combination of Focus and the following 3x3 Conv, not Focus alone. The difference in performance is probably not very large. The 6x6 version was used for edge inference, where Focus was not efficient.
Oops, the Focus module does contain the 3x3 Conv. That's really odd; effectively it looks like the GPU you're using hasn't optimized the 6x6 Conv very well, and Focus results in a memory pattern that makes it run much faster. There are environments where Conv is properly optimized and the Focus layer is an extra complexity that slows things down, possibly significantly.
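For reference, here is a minimal plain-PyTorch sketch of the weight re-layout described above. It deliberately avoids the repository's Conv and Focus classes and omits the BatchNorm and activation that follow the convolution in both variants (they would apply identically to identical inputs); the slicing order is assumed to mirror the yolov5 Focus implementation.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 3, 64, 64)    # dummy image
w3 = torch.randn(64, 12, 3, 3)   # 3x3 kernel applied after the Focus slicing

# Focus path: space-to-depth slicing, then 3x3 conv with stride 1, pad 1
z = torch.cat([x[..., ::2, ::2],         # even rows, even cols -> channels 0..2
               x[..., 1::2, ::2],        # odd rows,  even cols -> channels 3..5
               x[..., ::2, 1::2],        # even rows, odd cols  -> channels 6..8
               x[..., 1::2, 1::2]], 1)   # odd rows,  odd cols  -> channels 9..11
y_focus = F.conv2d(z, w3, stride=1, padding=1)

# Single-conv path: scatter the 3x3 weights into a 6x6 stride-2 kernel so that
# every tap lands on the same original pixel it saw through the slicing
w6 = torch.zeros(64, 3, 6, 6)
for idx, (r, c) in enumerate([(0, 0), (1, 0), (0, 1), (1, 1)]):  # same order as the cat above
    w6[:, :, r::2, c::2] = w3[:, 3 * idx:3 * idx + 3]
y_conv = F.conv2d(x, w6, stride=2, padding=2)

print((y_focus - y_conv).abs().max())  # tiny (float rounding only): the two paths match
Under these assumptions the two outputs differ only by floating-point rounding, consistent with the equivalence described above.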
@egbit I've added official support for profiling operations in https://github.com/ultralytics/yolov5/pull/1673; perhaps you can use this to test out design alternatives. If I run my previous comparison again using the new function, I see identical parameter counts and FLOPS, with varying speedups during training, and either varying speedups (FP32) or a slight slowdown (FP16) for Focus at inference. You can reproduce these in a Colab notebook (a Profile example has been added to the Appendix of tutorial.ipynb), though the results may vary by GPU and other factors:
Input:
# Profile
import torch
from utils.torch_utils import profile
from models.common import Conv, Focus

m1 = Conv(3, 64, k=6, s=2, p=2)  # plain 6x6 stride-2 convolution
m2 = Focus(3, 64, k=3)           # Focus slicing + 3x3 convolution
x = torch.randn(16, 3, 640, 640)

profile(x=x.float(), ops=[m1.float(), m2.float()], n=100)  # FP32
profile(x=x.half(), ops=[m1.half(), m2.half()], n=100)     # FP16
Output:
1.7.0+cu101 cuda _CudaDeviceProperties(name='Tesla T4', major=7, minor=5, total_memory=15079MB, multi_processor_count=40)
Params FLOPS forward (ms) backward (ms) input output
7040 23.07 62.89 87.79 (16, 3, 640, 640) (16, 64, 320, 320)
7040 23.07 15.52 48.69 (16, 3, 640, 640) (16, 64, 320, 320)
1.7.0+cu101 cuda _CudaDeviceProperties(name='Tesla T4', major=7, minor=5, total_memory=15079MB, multi_processor_count=40)
Params FLOPS forward (ms) backward (ms) input output
7040 23.07 11.61 79.72 (16, 3, 640, 640) (16, 64, 320, 320)
7040 23.07 12.54 42.94 (16, 3, 640, 640) (16, 64, 320, 320)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
❔Question
Dear author, I have been following your yolov5 and have no problems running it. Recently, however, I have wanted to remove all functionality of the Focus module so that yolov5 does not use it at all; that is, to do an ablation experiment on yolov5. I found that no matter how I debug or change the code, I cannot achieve this. So I ask the author for guidance: how should I modify the code to carry out this ablation experiment? Looking forward to your reply! Thank you