Closed LukeAI closed 4 years ago
@clw5180 I'm not sure what the cause of the discrepancy is. It could be differences in the group convolutions as @WongKinYiu mentioned. Note that yolov3-spp.cfg trains to much higher mAP with this repo than with darknet, so actual technical problems are very unlikely. See https://github.com/ultralytics/yolov3#map
@isgursoy @WongKinYiu @clw5180 @hwijune @Spectra456 the current status as far as I know is that there is a slight difference in implementing some operations in the csresnext50-panet-spp.cfg file in this repo compared to darknet, such that simply running the training command below fails:
python3 train.py --cfg csresnext50-panet-spp.cfg
The fix is essentially described here: https://github.com/ultralytics/yolov3/issues/698#issuecomment-570441779, I just need to implement and push it. I'll try to get this done in the next couple days, and then the next step would be to verify the cfg functionality by comparing mAP here using test.py.
Once that's done we can try to train from scratch and perhaps look at balancing the 3 losses or evolving the hyperparameters for this particular cfg. But yes it's a bit frustrating and a mystery why the cfg trains so much higher on darknet at the moment.
Darknet implements grouped convolutions the same way the NVIDIA cuDNN library does, so they should behave the same as in PyTorch.
hi @WongKinYiu
original yolov3 mask order:
[yolo] 6,7,8
[yolo] 3,4,5
[yolo] 0,1,2
cspnet mask order:
[yolo] 0,1,2
[yolo] 3,4,5
[yolo] 6,7,8
Is there any difference?
No, there is no difference. It is because the order of the pyramid scales in FPN and PANet is different.
So the order can't be changed, right?
[yolo] 0,1,2 [yolo] 3,4,5 [yolo] 6,7,8 >>>>> [yolo] 6,7,8 [yolo] 3,4,5 [yolo] 0,1,2
Yes, because the anchor size should match the grid size.
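To make the anchor/grid matching concrete, here is a minimal sketch. It assumes the default yolov3 COCO anchors, and that the three [yolo] heads run at strides 32, 16, 8 in the FPN-style cfg and 8, 16, 32 in the PANet-style cfg. The helper name `heads` is hypothetical and not part of either repo.

```python
# Default yolov3 COCO anchors (width, height) in pixels, smallest to largest.
ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45), (59, 119),
           (116, 90), (156, 198), (373, 326)]

# Assumed stride of each [yolo] head, in the order the heads appear in the cfg.
FPN_STRIDES = [32, 16, 8]    # yolov3: coarsest grid first
PANET_STRIDES = [8, 16, 32]  # cspnet: finest grid first


def heads(strides, masks, img_size=416):
    """Pair each [yolo] head's grid size with the anchors its mask selects."""
    return [(img_size // s, [ANCHORS[i] for i in m]) for s, m in zip(strides, masks)]


fpn = heads(FPN_STRIDES, [(6, 7, 8), (3, 4, 5), (0, 1, 2)])
panet = heads(PANET_STRIDES, [(0, 1, 2), (3, 4, 5), (6, 7, 8)])

# In both layouts the finest grid gets the small anchors and the coarsest grid the large ones.
for grid, anchors in panet:
    print(f"{grid}x{grid} grid -> anchors {anchors}")
```

Under these assumptions both cfgs describe the same grid-to-anchor pairing and only the order of the heads differs, which is why swapping the masks without swapping the heads would mismatch anchor size and grid size.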
@WongKinYiu I see in the https://github.com/ultralytics/yolov3/issues/698#issuecomment-585209887 image YOLOv3 corresponds to the FPN architecture (with 4 output layers), with the last output for the smallest objects. There are basically two steps: downsample, then upsample (with crosslinks).
In the PANet example, are there 3 steps? downsample, upsample, downsample (with crosslinks from step 2 to 3)? Does this improve the mAP typically at the expense of more weights/computation?
@glenn-jocher Hello,
Typically, yes.
But many different methods can be used to avoid that, for example BiFPN.
@WongKinYiu ah very interesting! Figure 2 shows a good summary of the differences. Have you tried to create a *.cfg for efficientnet, or for a BiFPN type network? The results on COCO seem to show substantial improvement over what we are doing.
@glenn-jocher Hello,
I have not built such a cfg file, but someone else has: https://github.com/AlexeyAB/darknet/issues/4662
@WongKinYiu I see. Have you tried the 'Simplified PANet' that they show with CSPResNeXt50-PANet-SPP?
I did a brief search online for EfficientDet implementations but I could not find any good ones. The paper does not supply code, and 3rd party implementations don't show very good or reliable mAPs.
Would you be interested in trying to implement a BiFPN network?
@WongKinYiu ah I had another question. Why are the group convolutions necessary in CSPResNeXt50-PANet-SPP?
Have you tried using the basic Conv2d() instead, and were you able to determine performance improvements when moving from the basic convolutions to the group convolutions?
CSPResNeXt50 has too many filters (outputs), so without groups it takes a very large amount of memory and you would have to decrease the mini_batch size significantly. It is therefore better to use groups=4...16: https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/6#issuecomment-584406057
If there is no group convolution, it is a CSPResNet50-PANet-SPP.
@AlexeyAB @glenn-jocher
Hello, I think the BiFPN implemented in darknet is good enough: csdarknet53-panet-spp-bifpn.txt
model | size | ap | ap50 | ap75 |
---|---|---|---|---|
CSPDarknet53-BiFPN | 512x512 | 38.4 | 62.3 | 41.3 |
@WongKinYiu Hi,
But even BiFPN-optimal is worse than PANet-not-optimal, while the optimal settings should give about +4.4% extra AP: https://github.com/WongKinYiu/CrossStagePartialNetworks#gpu-real-time-models
model | size | ap | ap50 | ap75 |
---|---|---|---|---|
CSPDarknet53-BiFPN (optimal) | 512x512 | 38.4 | 62.3 | 41.3 |
CSPDarknet53-PANet-SPP (not optimal) | 512x512 | 38.7 | 61.3 | 41.7 |
@AlexeyAB Hello,
The anchor size of CSPDarknet53-BiFPN is not optimized, because my GPU RAM is insufficient to train with the same settings as CSPResNeXt50-PANet-SPP (optimal).
@WongKinYiu
What do you mean? Memory consumption doesn't depend on anchor size.
Do you mean that you trained:
CSPResNeXt50-PANet-SPP (optimal) - with width=512 height=512 subdivisions=8 mosaic=1 learning_rate=0.00261
CSPDarknet53-BiFPN (optimal) - with width=416 height=416 subdivisions=16 mosaic=1 learning_rate=0.001
Or did you train CSPResNeXt50-PANet-SPP (optimal) with width=416 height=416?
the anchor size of CSPResNeXt50-PANet-SPP is designed for 416x416 (trained with width=416 height=416).
the anchor size of CSPResNeXt50-PANet-SPP (optimal) is optimized for 512x512 (trained with width=512 height=512).
https://github.com/ultralytics/yolov3/issues/698#issuecomment-586271292 (trained with width=416 height=416, because there is not enough memory to train with width=512 height=512).
@WongKinYiu Thanks! So you trained CSPDarknet53 with lower network resolution than CSPResNext50.
But the table compares two CSPDarknet53 models, not CSPResNext50:
model | size | ap | ap50 | ap75 |
---|---|---|---|---|
CSPDarknet53-BiFPN (optimal) | 512x512 | 38.4 | 62.3 | 41.3 |
CSPDarknet53-PANet-SPP (not optimal) | 512x512 | 38.7 | 61.3 | 41.7 |
Are both these models trained with width=416 height=416 subdivisions=16?
Or as I see:
CSPDarknet53-BiFPN (optimal) - width=416 height=416 subdivisions=16 mosaic=1
https://github.com/ultralytics/yolov3/files/4204443/csdarknet53-panet-spp-bifpn.txt
CSPDarknet53-PANet-SPP (not optimal) - width=416 height=416 subdivisions=4 mosaic=0
https://github.com/WongKinYiu/CrossStagePartialNetworks/blob/master/cfg/csdarknet53-panet-spp.cfg
both of these two models are trained with width=416 height=416.
the setting of CSPDarknet53-BiFPN (optimal) is as you see.
i am not sure about the subdivisions of CSPDarknet53-PANet-SPP (not optimal), but yes, mosaic=0.
in https://github.com/WongKinYiu/CrossStagePartialNetworks#gpu-real-time-models, CSPDarknet53-PANet-SPP (not optimal) and CSPResNet50-PANet-SPP (not optimal) were not trained by me.
@WongKinYiu
both of these two models are trained with width=416 height=416.
So from this table we can't say which is better, BiFPN or PAN?
model | size | ap | ap50 | ap75 |
---|---|---|---|---|
CSPDarknet53 BiFPN (optimal) trained 416x416 subdivisions=16 | 512x512 | 38.4 | 62.3 | 41.3 |
CSPDarknet53 PANet-SPP (not optimal) trained 416x416 subdivisions=4 or 8 or 16 | 512x512 | 38.7 | 61.3 | 41.7 |
So we can say that BiFPN at least works.
What is the current avg-loss of ASFF, can we say that it at least works?
currently 245k epoch, 10.5 loss.
@glenn-jocher @AlexeyAB update
Model | Size | AP | AP50 | AP75 |
---|---|---|---|---|
CSPDarknet53 BiFPN (optimal) trained 416x416 subdivisions=16 | 512x512 | 38.4 | 62.3 | 41.3 |
CSPDarknet53 PANet-SPP (optimal) trained 416x416 subdivisions=16 | 512x512 | 41.6 | 64.1 | 45.0 |
@WongKinYiu @glenn-jocher So the previous version of BiFPN is bad. Try the new BiFPN version: https://github.com/AlexeyAB/darknet/issues/4662#issuecomment-587490873
@WongKinYiu wow great! What's the difference between the not-optimal and optimal versions of CSPDarknet53 PANet-SPP? The optimal version shows +3 mAP improvement, what differences did you make to get this?
not-optimal: all hyper-parameters are the same as default yolov3.
optimal: with ciou and your genetic algorithm, mosaic augmentation, scale sensitivity, iou threshold (see [net] and [yolo] in the cfg file https://github.com/ultralytics/yolov3/issues/698#issuecomment-586271292).
@glenn-jocher @WongKinYiu
Why does CSPDarknet53s-PANet-SPP Ultralytics have lower AP than CSPDarknet53 PANet-SPP Darknet?
Model | Size | AP | AP50 | AP75 | URL | cfg |
---|---|---|---|---|---|---|
YOLOv3-SPP (baseline) Ultralytics (optimal) trained 416x416 -batch=16 | 512x512 | 39.7 | 60.5 | 42.2 | url | cfg |
CSPDarknet53s-PANet-SPP Ultralytics (optimal) trained 416x416 -batch=16 | 512x512 | 40.0 | 60.4 | 42.9 | url | cfg |
CSPDarknet53 PANet-SPP Darknet (optimal) trained 416x416 subdivisions=16 | 512x512 | 41.6 | 64.1 | 45.0 | url | cfg |
Both use:
The difference is only -
What am I missing?
@AlexeyAB I don't know, this is a very good question. The gap is very large in mAP. I think what I should do is try to test mAP with CSPDarknet53 PANet-SPP Darknet first, to establish that the cfg loads the model correctly. I'll do that today.
Yes it is true I don't use any pretrained weights (I saw slightly worse results with darknet53.conv.74). I tried CIoU loss and did not see any added benefit compared to GIoU.
I used the linked urls and weights, and tested at 512 on my own with the following commands. Results are slightly higher than the earlier table. I was not able to test the last one, as there were new cfg entries it did not recognize. I will comment these out and try again.
git clone https://github.com/ultralytics/yolov3
cd yolov3
python3 test.py --img 512 --weights ... --cfg ...
Model | Size | AP | AP50 | AP75 | URL | cfg |
---|---|---|---|---|---|---|
YOLOv3-SPP (baseline) Ultralytics (optimal) trained 416x416 -batch=16 | 512x512 | 40.2 | 61.3 | - | url | cfg |
CSPDarknet53s-PANet-SPP Ultralytics (optimal) trained 416x416 -batch=16 | 512x512 | 40.7 | 60.7 | - | url | cfg |
CSPDarknet53 PANet-SPP Darknet (optimal) trained 416x416 subdivisions=16 | 512x512 | - | - | - | url | cfg |
i am on a business trip, and will provide some training info for YOLOv3-SPP (baseline) Ultralytics and CSPDarknet53s-PANet-SPP Ultralytics after i am back in the office.
@WongKinYiu ok great! I got the last darknet model to run, but mAPs came back as 0.0. Note that I modified my default test nms --iou-thres from 0.5 to 0.6, as this produces a better balance of mAP@0.5:0.95 (best at --iou-thres 0.7) and mAP@0.5 (best at --iou-thres 0.5).
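For readers unfamiliar with how the NMS IoU threshold changes what survives, here is a minimal pure-Python sketch of greedy NMS. It is not the repo's implementation (which runs on tensors and is multi-label), and the boxes and scores are made up for illustration.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets, iou_thres):
    """Greedy NMS over (box, score) pairs; returns the kept pairs, best score first."""
    kept = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(box, k[0]) <= iou_thres for k in kept):
            kept.append((box, score))
    return kept


# Hypothetical detections: the first two boxes overlap with IoU of about 0.56.
dets = [((0, 0, 100, 100), 0.9), ((0, 0, 100, 180), 0.8), ((200, 200, 300, 300), 0.7)]
for thres in (0.5, 0.6, 0.7):
    print(thres, len(nms(dets, thres)))
```

A higher threshold keeps more overlapping boxes, which is the lever being adjusted when moving --iou-thres from 0.5 to 0.6.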
Also note the latest yolov3-spp.cfg baseline trains to 41.9/61.8 at 608 with the default settings. The training commands to reproduce this are here. The two separate --img-size values are the train img-size and the test img-size. Multi-scale train img sizes using this command will be 288 - 640.
python3 train.py --data coco2014.data --img-size 416 608 --epochs 273 --batch 16 --accum 4 --weights '' --device 0 --cfg yolov3-spp.cfg --multi
@glenn-jocher
Note that I modified my default test nms --iou-thres from 0.5 to 0.6, as this produces a better balance of mAP@0.5:0.95 (best at --iou-thres 0.7) and mAP@0.5 (best at --iou-thres 0.5).
Yes, I know. However, for the competition, we should use the same IoU threshold for both mAP@0.5:0.95 and mAP@0.5.
Also note the latest yolov3-spp.cfg baseline trains to 41.9/61.8 with the default settings. The training commands to reproduce this are here. The two separate --img-size values are the train img-size and the test img-size. Multi-scale train img sizes using this command will be 288 - 640.
Thanks, I just used the default settings of the repo version I trained the model with. As I remember, that version gets about 40.9 mAP@0.5:0.95 in your report. By the way, all of my results are obtained on the test-dev set and your results are obtained on the min-val set.
@WongKinYiu ah test-dev set could be a difference too then!
Well it seems some differences remain as the ultralytics repo can't load the best performing darknet CSPDarknet53s-PANet-SPP model then. These differences must be the source of the problem I think.
@glenn-jocher
Also note the latest yolov3-spp.cfg baseline trains to 41.9/61.8 at 608 with the default settings.
What is the difference between your training and this yolov3-spp.cfg https://github.com/WongKinYiu/CrossStagePartialNetworks/tree/pytorch#ms-coco ? Why is there such a difference?
@AlexeyAB
I used this version of the repo to train: https://github.com/ultralytics/yolov3/tree/a6f87a28e7595e71752583fb41340f9d1105d75f There have been many improvements to ultralytics since then.
@WongKinYiu @glenn-jocher So, I want to know what improvements have been made?
Hmmm well lots of small day to day changes. If I use the github /compare it doesn't show the date of that commit, but it shows that there are 400 commits since then, with many modifications: https://github.com/ultralytics/yolov3/compare/a6f87a28e7595e71752583fb41340f9d1105d75f...master#diff-04c6e90faac2675aa89e2176d2eec7d8
The README from then was showing 40.0/60.9 mAP, which is similar to what @WongKinYiu was seeing, vs today's README which shows 41.9/61.8.
The improvements are over many different parts, such as the NMS, which now uses multi-label, the augmentation, which has been set to zero, the loss function reduction, which I returned to mean() instead of sum(), the cosine scheduler implementation, the increase in the LR to 0.01 after cos was implemented, and maybe a few other tiny things. The architecture itself is the same (yolov3-spp.cfg).
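As a rough illustration of the cosine scheduler mentioned above, here is a generic cosine decay. The thread only gives the initial LR (0.01) and the epoch count (273, from the training command); the exact formula used in the repo is not shown here, so decaying all the way to zero is an assumption, and `cosine_lr` is a hypothetical name.

```python
import math


def cosine_lr(epoch, epochs=273, lr0=0.01):
    """Cosine-annealed learning rate: lr0 at epoch 0, decaying to 0 at the last epoch."""
    return lr0 * 0.5 * (1.0 + math.cos(math.pi * epoch / epochs))


# Print the schedule at a few points across training.
for e in (0, 68, 136, 204, 273):
    print(e, round(cosine_lr(e), 5))
```

Compared with a step schedule, the LR falls smoothly, slowly at the start and end and fastest in the middle of training.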
Actually this is an important point. A lot of papers today are showing very outdated comparisons to YOLOv3, i.e. showing 33 mAP@0.5:0.95 like the EfficientDet paper, with a GPU latency of 51ms. The reality is the most recent YOLOv3-SPP model I trained is at 42.1 mAP@0.5:0.95, with a GPU latency of 12.8ms https://github.com/ultralytics/yolov3/issues/679#issuecomment-597219021, which puts it far better than their own D0-D2 models in both speed and mAP. I'm not sure how best to get that message out.
@glenn-jocher So the main difference:
new_loss = sum_for_i(loss_obj, loss_cls, loss_bbox) / count?
@AlexeyAB
Yes NMS uses multi-label now, which bumped up mAP about +0.3. Yes spatial augmentation seemed to hurt training, so I set it to zero, but left HSV augmentation on:
'hsv_h': 0.0138, # image HSV-Hue augmentation (fraction)
'hsv_s': 0.678, # image HSV-Saturation augmentation (fraction)
'hsv_v': 0.36, # image HSV-Value augmentation (fraction)
'degrees': 1.98 * 0, # image rotation (+/- deg)
'translate': 0.05 * 0, # image translation (+/- fraction)
'scale': 0.05 * 0, # image scale (+/- gain)
'shear': 0.641 * 0} # image shear (+/- deg)
loss_giou = (giou_1.mean() + giou_2.mean() + giou_3.mean()).sum()
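To show what the choice of reduction does, here is a small pure-Python sketch with made-up per-target GIoU loss values for the three yolo layers. It is not the repo's code (which operates on tensors); it only compares the arithmetic of the three options.

```python
from statistics import mean

# Hypothetical per-target GIoU loss values for the three yolo layers.
giou_1 = [0.2, 0.4]            # 2 targets
giou_2 = [0.3, 0.3, 0.3, 0.3]  # 4 targets
giou_3 = [0.1] * 10            # 10 targets
layers = [giou_1, giou_2, giou_3]

# mean() per layer, then add the layers: every layer contributes equally.
loss_mean_per_layer = sum(mean(layer) for layer in layers)

# One mean over all targets: layers with more targets dominate.
all_terms = [v for layer in layers for v in layer]
loss_global_mean = sum(all_terms) / len(all_terms)

# sum() reduction: the loss grows with the number of targets.
loss_sum = sum(all_terms)

print(loss_mean_per_layer, loss_global_mean, loss_sum)
```

With per-layer mean(), a layer holding 2 targets weighs as much as one holding 10, which is one source of the uncertainty about how to combine the layer losses.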
I'm really hoping we might be able to merge the YOLO outputs some day so I can do away with this uncertainty in how to combine the losses from the different layers. ASFF seems to be an interesting step in that direction.
@AlexeyAB ah also another change I forgot to mention was I changed multi-scale to change the resolution every batch now, instead of every 10 batches before. This seemed to smooth the results a bit, epoch to epoch.
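A minimal sketch of per-batch multi-scale sampling, assuming square inputs whose side is a multiple of the coarsest stride (32) and the 288 - 640 range quoted earlier in the thread. The function name and the use of `random.Random` are illustrative, not the repo's code.

```python
import random


def sample_img_size(min_size=288, max_size=640, stride=32, rng=random):
    """Pick a random square training resolution that is a multiple of the stride."""
    return rng.randrange(min_size // stride, max_size // stride + 1) * stride


rng = random.Random(0)  # seeded only to make the example repeatable
sizes = [sample_img_size(rng=rng) for _ in range(8)]  # one new size per batch
print(sizes)
```

Drawing a new size every batch instead of every 10 batches means each epoch sees a more even mix of resolutions.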
@WongKinYiu yes they look super similar to each other unfortunately. I'm not sure why we aren't seeing the same gains as the darknet training. It must have to do with the grouped convolutions I think.
@glenn-jocher
Yes NMS uses multi-label now, which bumped up mAP about +0.3.
Does it currently work in such a way?
if there are 2 bboxes with IoU > iou_nms
Then it will remove class1_prob = 0.5 and class2_prob = 0.5, and will leave:
The loss is back to its original form, using the PyTorch defaults, which is for example for the 3 yolo layers: loss_giou = (giou_1.mean() + giou_2.mean() + giou_3.mean()).sum()
Do you know how this changes the Delta during auto-differentiation in Pytorch?
Do you apply it only for x,y,w,h and not for probs and obj?
Yes spatial augmentation seemed to hurt training, so I set it to zero, but left HSV augmentation on:
Yes, it may help to win the competition, but it may hurt cross-domain accuracy when the test images/videos are not similar to MS COCO.
It seems to work well because Ultralytics uses letter_box image resizing by default, so it keeps the aspect ratio and doesn't require large spatial image transformations.
In Darknet we can try to use jitter=0.1 letter_box=1 instead of jitter=0.3 letter_box=0.
I think the higher the network resolution, the more preferable it is to use jitter=0.1 letter_box=1.
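For reference, letterbox resizing can be sketched as below: scale the image so its longer side fits the network size, then pad the shorter side. This is a simplified illustration that pads to a square; the actual implementations may pad differently (for example to a multiple of the stride), and `letterbox_params` is a hypothetical helper.

```python
def letterbox_params(w, h, new_size=416):
    """Return (scale, new_w, new_h, pad_x, pad_y) for an aspect-preserving resize."""
    scale = new_size / max(w, h)  # fit the longer side
    new_w, new_h = round(w * scale), round(h * scale)
    # Padding is split evenly between the two sides of each axis.
    pad_x, pad_y = (new_size - new_w) / 2, (new_size - new_h) / 2
    return scale, new_w, new_h, pad_x, pad_y


print(letterbox_params(640, 480))
```

Because the aspect ratio is preserved, objects are not stretched, which is the property the comment above ties to needing less spatial augmentation.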
I'm really hoping we might be able to merge the YOLO outputs some day so I can do away with this uncertainty in how to combine the losses from the different layers.
What do you mean?
I changed multi-scale to change the resolution every batch now, instead of every 10 batches before. This seemed to smooth the results a bit, epoch to epoch.
Does it decrease training speed, since changing the network size takes time?
If we use dynamic_minibatch=1 in Darknet, where width, height and mini_batch change dynamically and the GPU arrays for each layer must be reallocated, it can decrease training speed by 2x-3x if we do it after every iteration.
@WongKinYiu
Have you checked whether scale_x_y=1.1 increases AP95 accuracy, while it decreases AP50 and AP75 but keeps the same AP50...95? https://github.com/WongKinYiu/CrossStagePartialNetworks/blob/master/coco/results.md#mscoco
EfficientNetB0-Yolo was added to the OpenCV-dnn module.
So it only remains to implement scale_x_y=1.1 in order to use csresnext50-panet-spp-original-optimal.cfg with OpenCV-dnn.
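For anyone porting this, my understanding of scale_x_y (an assumption based on how Darknet decodes box centers, not something stated in this thread) is that the sigmoid output is stretched around 0.5 so the predicted offset can reach the cell borders. A minimal sketch with a hypothetical `decode_xy` helper:

```python
import math


def decode_xy(t, scale_x_y=1.0):
    """Map a raw output t to a cell-relative offset, stretched around 0.5."""
    s = 1.0 / (1.0 + math.exp(-t))  # plain sigmoid, in (0, 1)
    return s * scale_x_y - 0.5 * (scale_x_y - 1.0)


# Show the reachable offset range for each scale value discussed in the thread.
for scale in (1.0, 1.05, 1.1, 1.2):
    low, high = decode_xy(-20.0, scale), decode_xy(20.0, scale)
    print(scale, round(low, 3), round(high, 3))
```

With scale_x_y=1.0 this reduces to the plain yolov3 sigmoid; with larger values the offset range widens symmetrically beyond [0, 1].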
i have only done experiments for scale_x_y=1.05, scale_x_y=1.1, and scale_x_y=1.2 of different feature pyramids.
have u tested the inference speed of enetb0-yolo using opencv-dnn?
Not yet. I will test it on an Intel CPU and the Intel Myriad X neurochip.
@AlexeyAB @WongKinYiu I made a simple Colab notebook to see the time effects of group/mix convolutions.
It times a tensor passing forward and backward (to mimic training) through a Conv2d() op. The speeds stay about the same even as the parameter count drops by >10X. So similar sized models using these ops may be much slower.
b=m(x), x=[16, 128, 38, 38], b=[16, 256, 38, 38]
groups | time (ms) | params | shape of m |
---|---|---|---|
1 | 5.1 | 294912 | [256, 128, 3, 3] |
2 | 4.2 | 147456 | [256, 64, 3, 3] |
4 | 4.2 | 73728 | [256, 32, 3, 3] |
8 | 4.9 | 36864 | [256, 16, 3, 3] |
16 | 6.9 | 18432 | [256, 8, 3, 3] |
32 | 6.1 | 9216 | [256, 4, 3, 3] |
64 | 2.6 | 4608 | [256, 2, 3, 3] |
128 | 2.0 | 2304 | [256, 1, 3, 3] |
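The params column follows directly from the grouped-convolution weight shape. Here is a small pure-Python check, assuming a bias-free 3x3 convolution from 128 to 256 channels as in the table; the helper names are hypothetical.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a bias-free k x k grouped convolution."""
    assert c_in % groups == 0 and c_out % groups == 0
    # Each output filter only sees c_in / groups input channels.
    return c_out * (c_in // groups) * k * k


def conv_macs(c_in, c_out, k, h, w, groups=1):
    """Multiply-accumulates for one image at output size h x w."""
    return conv_params(c_in, c_out, k, groups) * h * w


for g in (1, 2, 4, 8, 16, 32, 64, 128):
    print(g, conv_params(128, 256, 3, g), [256, 128 // g, 3, 3])
```

Parameters and theoretical multiply-accumulates both shrink by the factor `groups`, so the roughly flat timings in the table point at the kernel implementation rather than the arithmetic.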
@glenn-jocher Yes, NVIDIA cuDNN works in the same way. Also, the Google Coral Edge TPU neurochip doesn't use grouped convolutions, despite the fact that they advertise EfficientDet/EfficientNet with grouped convolutions. https://ai.googleblog.com/2019/08/efficientnet-edgetpu-creating.html
Does this repo support CSPResNeXt50-PANet-SPP? (https://github.com/WongKinYiu/CrossStagePartialNetworks/)
AlexeyAB's support: https://github.com/AlexeyAB/darknet/issues/4406
My tests have found it to be a clear winner over yolov3-spp in terms of mAP and speed.