pjreddie / darknet

Convolutional Neural Networks
http://pjreddie.com/darknet/

Yolo V3 performance much lower than Yolo Tiny V3 #901

Open ssanzr opened 6 years ago

ssanzr commented 6 years ago

Hi everyone, I have been training different YOLO networks on my custom dataset using the repository from @AlexeyAB, and I am quite puzzled by the performance obtained for each network.

I am using exactly the same testing and training .png dataset for every network. I have 700 training images and 300 for testing.

Performance summary for the different networks:

| Neural Network | Input Resolution | Iterations | Avg loss | Average IoU (%) | mAP (%) |
| --- | --- | --- | --- | --- | --- |
| Tiny Yolo V3 | 416x416 | 42000 | 0.1983 | 45.59 | 61.18 |
| Tiny Yolo V3 | 608x608 | 21700 | 0.3469 | 46.39 | 61.29 |
| Tiny Yolo V3 | 832x832 | 55200 | 0.2311 | 48.69 | 56.77 |
| Yolo V3 | 416x416 | 19800 | 0.1945 | 0.00 | 0.00 |
| Yolo V3 | 608x608 | 2900 | 0.71 | 42.63 | 23.46 |
| Yolo V3 | 832x832 | 5600 | 0.3324 | 38.77 | 41.20 |

Anyone any idea?

Thanks

AlexeyAB commented 6 years ago

@ssanzr Yolo v3 should have much higher accuracy than Yolo v3 tiny, and higher resolution should increase accuracy further.

  1. What params did you use in the Makefile?
  2. Did you check your dataset by Yolo_mark?
  3. Can you show content of file bad.list and bad_label.list if it will be created after training?
  4. Did you use detector map command to get mAP?
  5. What command did you use to get anchors?
  6. Show your anchors.
  7. Why did you train only 2900 iterations for Yolo V3 608x608 ?
  8. Attach your yolo v3 cfg-file.
ssanzr commented 6 years ago

> 1. What params did you use in the Makefile?

I used MSVS2017, with CUDA and CUDNN enabled. Do you mean other parameters?

> 2. Did you check your dataset by Yolo_mark?

I labeled it with Yolo_mark.

> 3. Can you show content of file bad.list and bad_label.list if it will be created after training?

Contents of bad.list:

```
"C:\darknet-master\data\dog.jpg" -c0 "C:\darknet-master\data\giraffe.jpg"
C:\VMData\train.txt
C:\VMData\train.txt
C:\VMData\train.txt
```

bad_label.list was not found. How can I enable its creation?

> 4. Did you use detector map command to get mAP?

Yes.

> 5. What command did you use to get anchors?

I did not modify anything from the default .cfg file.

> 6. Show your anchors.

```
[yolo] mask = 6,7,8 anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
[yolo] mask = 3,4,5 anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
[yolo] mask = 0,1,2 anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
```

> 7. Why did you train only 2900 iterations for Yolo V3 608x608?

Avg loss was already quite OK. I will train it further.

> 8. Attach your yolo v3 cfg-file.

yolov3-S.txt
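As a side note, the `mask` lines above can be read as index lists into the shared anchor array. A minimal Python sketch of that selection (standard YOLOv3 cfg semantics as I understand them, not darknet's actual code):

```python
# Each [yolo] layer picks its anchors from the shared list via `mask` indices
# (a sketch of standard YOLOv3 cfg semantics, not darknet's actual code).
anchors = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
           (59, 119), (116, 90), (156, 198), (373, 326)]
masks = [
    [6, 7, 8],  # first [yolo] layer: coarsest grid, largest objects
    [3, 4, 5],  # second [yolo] layer: medium objects
    [0, 1, 2],  # third [yolo] layer: finest grid, smallest objects
]

layer_anchors = [[anchors[i] for i in mask] for mask in masks]
print(layer_anchors[0])  # [(116, 90), (156, 198), (373, 326)]
```

So the first detection layer only ever matches the three largest anchors, which is why badly clustered anchors can starve one layer of positive examples.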

AlexeyAB commented 6 years ago

@ssanzr

bad_label.list not found. How can i enable its creation?

The bad_label.list file is created automatically only if your dataset has errors, so your dataset appears to be correct.

Also, Yolo v3 requires more iterations to reach high accuracy (mAP), so in general you should train it for more iterations than Yolo v3 tiny.

ssanzr commented 6 years ago

@AlexeyAB thanks a lot for your support here.

With Yolo Tiny v3 I am able to get very good results with 416x416 images, so I will try this resolution for Yolo V3; getting to 50000 iterations with large images would take a couple of weeks on my system. I will change the anchors and the steps as you suggested. Let me know if this does not sound OK.

Anyway, I am very curious to understand why Yolo Tiny works quite well "out of the box" while Yolo V3 does not for the same input. Let's see if it makes sense after this round of testing.

```
Saving anchors to the file: anchors.txt
anchors = 15.6920,17.3646, 16.6959,23.1417, 21.8907,19.4287, 17.8471,29.0398, 22.1554,25.0664, 17.6984,37.4543, 29.0695,23.4917, 24.8681,30.5106, 31.4007,36.9213
```


| Neural Network | Input Resolution | Iterations | Avg loss | Average IoU (%) | mAP (%) |
| --- | --- | --- | --- | --- | --- |
| Tiny Yolo V3 | 416x416 | 42000 | 0.1983 | 45.59 | 61.18 |
| Tiny Yolo V3 | 608x608 | 21700 | 0.3469 | 46.39 | 61.29 |
| Tiny Yolo V3 | 832x832 | 55200 | 0.2311 | 48.69 | 56.77 |
| Yolo V3 | 416x416 | 19800 | 0.1945 | 0.00 | 0.00 |
| Yolo V3 | 608x608 | 2900 | 0.71 | 42.63 | 23.46 |
| **Yolo V3** | **608x608** | **5400** | **0.71** | **39.02** | **52.88** |
| Yolo V3 | 832x832 | 5600 | 0.3324 | 38.77 | 41.20 |
AlexeyAB commented 6 years ago

@ssanzr

> **Yolo V3 608x608 5400 0.71 39.02% 52.88%**

```
Anchors: num_of_clusters = 9, width = 416, height = 416
...
anchors = 15.6920,17.3646, 16.6959,23.1417, 21.8907,19.4287, 17.8471,29.0398, 22.1554,25.0664, 17.6984,37.4543, 29.0695,23.4917, 24.8681,30.5106, 31.4007,36.9213
```

For your dataset the Yolo v3 with high resolution should give much higher accuracy.

Try to calculate anchors for 832x832:

```
darknet.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 832 -height 832 -show
```

And train Yolo v3 with width=832 height=832 and random=1 for at least 15,000-20,000 iterations; you will get much higher accuracy.
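Since calc_anchors reports anchors in input-pixel units, anchors computed for one network size scale linearly with the input size. A hedged sketch of that relationship (the helper name is made up for illustration):

```python
# Hypothetical helper: rescale anchors from one square input size to another.
# calc_anchors reports anchors in input pixels, so they scale linearly.
def rescale_anchors(anchors, old_size, new_size):
    s = new_size / old_size
    return [(round(w * s, 4), round(h * s, 4)) for w, h in anchors]

anchors_416 = [(15.6920, 17.3646), (16.6959, 23.1417), (21.8907, 19.4287)]
print(rescale_anchors(anchors_416, 416, 832))  # every dimension doubles
```

Recomputing with calc_anchors at the target resolution (as suggested above) is still preferable, since random scaling during training interacts with the anchor sizes.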

ssanzr commented 6 years ago

@AlexeyAB

> Did you train with width=608 height=608, or did you train with 416x416 but used 608x608 only for detection?

I trained with 608x608 and I detected with 608x608.

> Did you use random=1 for training?

Yes.

> Did you use anchors that were calculated for 416x416 for training on 608x608?

This trial was done with the default anchors. I will try now with the calculated anchors:

```
num_of_clusters = 9, width = 608, height = 608
read labels from 646 images
loaded image: 646 box: 1232
all loaded.

calculating k-means++ ...
avg IoU = 85.78 %

Saving anchors to the file: anchors.txt
anchors = 22.9345,25.3790, 24.4016,33.8224, 31.9940,28.3958, 26.0843,42.4427, 32.3810,36.6355, 25.8669,54.7409, 42.4861,34.3341, 36.3457,44.5924, 45.8934,53.9618
```
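The clustering behind calc_anchors can be approximated in a few lines. This is a deliberately simplified sketch using plain Euclidean k-means with a fixed initialization; darknet's actual calc_anchors uses k-means++ with an IoU-based distance, so real anchors will differ:

```python
# Simplified anchor clustering on (width, height) pairs. Assumptions: plain
# Euclidean k-means and a deterministic init; darknet uses k-means++ with an
# IoU distance instead.
def kmeans_anchors(boxes, k, iters=50):
    centers = boxes[::max(1, len(boxes) // k)][:k]  # spread-out initial centers
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w, h in boxes:
            i = min(range(k),
                    key=lambda c: (w - centers[c][0]) ** 2 + (h - centers[c][1]) ** 2)
            clusters[i].append((w, h))
        centers = [(sum(w for w, _ in cl) / len(cl), sum(h for _, h in cl) / len(cl))
                   if cl else centers[i] for i, cl in enumerate(clusters)]
    return sorted(centers, key=lambda wh: wh[0] * wh[1])  # smallest area first

boxes = [(20, 25), (22, 30), (40, 50), (42, 55), (90, 110), (95, 120)]
print(kmeans_anchors(boxes, 3))  # [(21.0, 27.5), (41.0, 52.5), (92.5, 115.0)]
```

With only a few hundred labeled boxes, the 9 cluster centers can land very close together, which is one hypothesis raised later in this thread for the poor full-YOLOv3 results.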

194 MB VOC-model, saving the result to the file res.avi:

```
darknet.exe detector demo data/voc.data yolo-voc.cfg yolo-voc.weights test.mp4 -i 0 -out_filename res.avi
```

AlexeyAB commented 6 years ago

@ssanzr

> Yolo V3 416x416 19800 0.1945 0.00% 0.00%

It seems that something went wrong.

> Why does the Yolo Tiny work quite good "out of the box" for 416x416 and not YoloV3 for the same input and resolution? Any hypothesis?

You should train Yolo v3 for many more iterations.

> Using the command below I am able to create the results .avi file with tiny Yolo, but for some reason, in each frame (n), the bounding boxes for frame (n-1) are plotted. Do you know how to fix this?

For video Yolo averages detections for (n-2), (n-1), n frames, and shows these boxes on (n-1) frame. Just set #define FRAMES 3 to disable it: https://github.com/AlexeyAB/darknet/blob/99c92f98e08c007b23b21d2e0887a59f14045efb/src/demo.c#L18
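The averaging described above can be pictured as a rolling mean over the last FRAMES box estimates. A toy sketch of the effect (not the demo.c code, which averages raw network outputs rather than final boxes):

```python
from collections import deque

FRAMES = 3  # the constant discussed above; 1 disables smoothing

# Toy model of the demo's temporal smoothing: the displayed box is the mean
# of the last FRAMES estimates for an object. Illustration only; demo.c
# actually averages the raw network outputs across frames.
class BoxSmoother:
    def __init__(self, frames=FRAMES):
        self.history = deque(maxlen=frames)

    def update(self, box):  # box = (x, y, w, h)
        self.history.append(box)
        n = len(self.history)
        return tuple(sum(vals) / n for vals in zip(*self.history))

s = BoxSmoother()
print(s.update((10, 10, 50, 50)))  # (10.0, 10.0, 50.0, 50.0)
print(s.update((16, 10, 50, 50)))  # x lags the true position: (13.0, ...)
```

The lag visible in the second call is exactly the "boxes from the previous frame" effect reported above.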

ssanzr commented 6 years ago

@AlexeyAB Your support is really great, and really appreciated!!

> Do you use CUDNN_HALF=1 in the Makefile?

I am using MSVS 2017. Where can I change this?

> What GPU do you use?

NVIDIA GeForce GTX 1060 with Max-Q Design

> What is the date of your code from this repository? https://github.com/AlexeyAB/darknet

June 12

> For video Yolo averages detections for (n-2), (n-1), n frames, and shows these boxes on (n-1) frame. Just set #define FRAMES 3 to disable it: https://github.com/AlexeyAB/darknet/blob/99c92f98e08c007b23b21d2e0887a59f14045efb/src/demo.c#L18

Sorry, I am quite a beginner. Do you mean replacing "#define FRAMES 3" with "#define FRAMES 1"?

I believe the video is not averaging the position at n-1; I have just checked, and it really seems that frame n is showing the bounding box label for n+1. I think I explained it wrongly before.

AlexeyAB commented 6 years ago

@ssanzr

> Do you mean replacing "#define FRAMES 3" by "#define FRAMES 1"?

Yes.

> > Do you use CUDNN_HALF=1 in the Makefile?
>
> Using MSVS 2017. Where can I change this?

> > What GPU do you use?
>
> NVIDIA GeForce GTX 1060 with Max-Q Design

This is normal. For your GPU you shouldn't use CUDNN_HALF.

ssanzr commented 6 years ago

Hi again @AlexeyAB

It seems that setting new anchors calculated for -width 608 -height 608 and setting steps=40000,45000 made the performance worse.

| Neural Network | Input Resolution | Weights | Iterations | Avg loss | Average IoU (%) | mAP (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Yolo V3 | 608x608 | yolov3-VM_608.cfg | 2900 | 0.71 | 42.63 | 23.46 |
| Yolo V3 | 608x608 | yolov3-VM_608-steps-anchors.cfg | 7000 | 0.5074 | 9.24 | 3.69 |

I will keep training and i will let you know the progress

ssanzr commented 6 years ago

@AlexeyAB

I continued training, but the results do not seem to be improving. The avg loss is slightly decreasing, but the avg IoU and mAP are not improving, or are even getting worse.

Any other idea that might help here?

| Neural Network | Input Resolution | Weights | Iterations | Avg loss | Average IoU (%) | mAP (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Yolo V3 | 608x608 | yolov3-VM_608-steps-anchors.cfg | 7000 | 0.5074 | 9.24 | 3.69 |
| Yolo V3 | 608x608 | yolov3-VM_608-steps-anchors.cfg | 8700 | 0.4312 | 55.05 | 14.26 |
| Yolo V3 | 608x608 | yolov3-VM_608-steps-anchors.cfg | 10200 | 0.3289 | 43.50 | 52.41 |
| Yolo V3 | 608x608 | yolov3-VM_608-steps-anchors.cfg | 11900 | 0.3306 | 46.67 | 30.93 |
| Yolo V3 | 608x608 | yolov3-VM_608-steps-anchors.cfg | 16000 | 0.1763 | 33.80 | 11.12 |
AlexeyAB commented 6 years ago

@ssanzr This is very strange.

If the batch was the same but subdivisions was smaller for 416x416, then minibatch = batch/subdivisions will be larger, and a larger minibatch can train the network better at 416x416. So that might be the reason that 608x608 has lower mAP than 416x416. But for your small objects, the higher resolution should give more advantage than the higher minibatch.
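The relationship described here is simple arithmetic over the cfg values; a quick sketch (standard darknet cfg semantics):

```python
# minibatch = batch / subdivisions: the number of images pushed through the
# GPU at once. Lower subdivisions => larger minibatch => usually more stable
# gradient estimates per step.
def minibatch(batch, subdivisions):
    return batch // subdivisions

print(minibatch(64, 32))  # 2
print(minibatch(64, 64))  # 1
```

A minibatch of 1 at 608x608 versus 2 at 416x416 is one plausible confound in the comparison above.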

> I have 700 training images and 300 for testing.

```
num_of_clusters = 9, width = 608, height = 608
read labels from 646 images
loaded image: 646 box: 1232
all loaded.

calculating k-means++ ...
avg IoU = 85.78 %

Saving anchors to the file: anchors.txt
anchors = 22.9345,25.3790, 24.4016,33.8224, 31.9940,28.3958, 26.0843,42.4427, 32.3810,36.6355, 25.8669,54.7409, 42.4861,34.3341, 36.3457,44.5924, 45.8934,53.9618
```

If anchors in the test dataset are very different from anchors in the training dataset, that might be the reason that 608x608 has lower mAP than 416x416.

ssanzr commented 6 years ago

@AlexeyAB

> What batch= and subdivision= did you use for training 416x416 and 608x608?

416x416: batch=64, subdivisions=32; 608x608: batch=64, subdivisions=64.

These values were chosen as the maximum I could achieve without a CUDA error.

> But for your small objects - the higher resolution should give more advantages than higher minibatch.

It makes sense to me. In tiny Yolo there is no big difference between the different resolutions:

| Neural Network | Input Resolution | Weights | Iterations | Avg loss | Average IoU (%) | mAP (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Tiny Yolo V3 | 416x416 | yolov3-tiny-VM.cfg | 42000 | 0.1983 | 45.59 | 61.18 |
| Tiny Yolo V3 | 608x608 | yolov3-tiny-VM_608.cfg | 21700 | 0.3469 | 46.39 | 61.29 |
| Tiny Yolo V3 | 832x832 | yolov3-tiny-VM_832.cfg | 55200 | 0.2311 | 48.69 | 56.77 |

> Did you train on the training dataset (700 images) and did you calculate mAP on the testing dataset (another 300 images)? It is recommended to use at least 2000 images per class, so perhaps you met overfitting, therefore you see mAP decreasing.

Yes.

Training set:

```
num_of_clusters = 9, width = 608, height = 608
read labels from 646 images
loaded image: 646 box: 1232
all loaded.

calculating k-means++ ...
avg IoU = 85.78 %

Saving anchors to the file: anchors.txt
anchors = 22.9345,25.3790, 24.4016,33.8224, 31.9940,28.3958, 26.0843,42.4427, 32.3810,36.6355, 25.8669,54.7409, 42.4861,34.3341, 36.3457,44.5924, 45.8934,53.9618
```

Test set:

```
num_of_clusters = 9, width = 608, height = 608
read labels from 301 images
loaded image: 289 box: 557
all loaded.

calculating k-means++ ...
avg IoU = 86.85 %

Saving anchors to the file: anchors.txt
anchors = 4.2752,3.3777, 22.6959,33.0490, 23.9492,41.1689, 27.6245,48.9778, 34.1149,40.1174, 26.7869,57.7545, 35.4523,50.2061, 35.8496,58.8041, 32.4750,70.3333
```
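One way to quantify the train/test anchor mismatch is width/height IoU with boxes aligned at a corner (the usual convention when clustering only dimensions). A hedged sketch using anchors from the two sets above:

```python
# IoU of two boxes compared by width/height only (corner-aligned), a common
# convention for anchor clustering. Illustrative check, not darknet code.
def wh_iou(a, b):
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union

train = [(22.9345, 25.3790), (24.4016, 33.8224), (31.9940, 28.3958)]
test_outlier = (4.2752, 3.3777)    # tiny anchor seen only in the test split
test_typical = (22.6959, 33.0490)

print(max(wh_iou(test_outlier, a) for a in train))  # very low: poorly covered
print(max(wh_iou(test_typical, a) for a in train))  # high: well covered
```

The 4.3x3.4 test-set anchor has almost no overlap with any training anchor, which supports the mismatch hypothesis.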


AlexeyAB commented 6 years ago

@ssanzr

> 416x416: batch=64 subdivisions=32; 608x608: batch=64 subdivisions=64

ZHANGKEON commented 6 years ago

@AlexeyAB @ssanzr I have also encountered the same problem with a custom dataset (the raccoon dataset from experiencor/keras-yolo3). Using tiny yolov3 gives much higher accuracy than yolov3.

Actually, I found that yolov3 is very sensitive to the anchors from dimension clustering. When using 9 anchors (yolov3) instead of 6 anchors (tiny yolov3), some problems arise. Maybe this is the reason for your low accuracy.

Also, I think dimension clustering has some problems on small datasets. The generated anchors are usually very close to each other, and this leads to low accuracy.

ssanzr commented 6 years ago

That is a good point, @ZHANGKEON. I will try YoloV3 with the 6 anchors from Yolo Tiny V3 and see what happens.

ZHANGKEON commented 6 years ago

@ssanzr Maybe you can just try with original yolov3 anchors on your dataset to see how the accuracy changes.

ghost commented 5 years ago

Hi @ssanzr, as you have said, you are using grayscale images:

  • I am using grayscale images, so I disabled color data augmentation.
  • I am training with .png images; do you see any issue with this?

I'm trying to run Yolov3 on grayscale by setting channels=1 in the yolov3.cfg file, but I'm getting a segmentation error. I tried reducing subdivisions and setting random=0, and I also tried changing a line in data.c's detection loading function. I have tried different methods to solve this problem, but I'm still unable to perform detection on grayscale images.

Would you please guide me on what I should do to use grayscale images with Yolov3?

Thank You

ssanzr commented 5 years ago

@samgithub11

I have never seen any segmentation error with my grayscale images. Actually, my issue was that Yolo Tiny V3 worked and Yolo V3 did not.

Sorry for not being of more help...

BarryLYH commented 5 years ago

A likely reason is overfitting. The dataset is too small, and yolov3 is deep while yolov3_tiny is small; this can cause the problem. When you use the model in a real environment, the well-trained yolov3 has better performance.

Mahibro commented 5 years ago

@AlexeyAB Hey, I'm using the repository from https://github.com/AlexeyAB/darknet and am trying to train on my own dataset of 200 images.

Here are my questions:

  1. How do I increase my training speed? In the Makefile I enabled GPU=1 and CUDNN=1.
  2. How do I use more GPU in GCP? The available GPU memory is 15 GB.

AlexeyAB commented 5 years ago

@Mahibro Hi,

  1. Set lower subdivisions= in cfg-file and use modern GPU (Volta/Turing)
  2. If you use several GPUs, for example 4 GPUs, then you can train with flag -gpus 0,1,2,3 https://github.com/AlexeyAB/darknet#how-to-train-with-multi-gpu
Mahibro commented 5 years ago

@AlexeyAB If I set batch and subdivisions to 1, training will not progress; it asks me to set batch=64. Should I make any changes to use the GPU? (Please elaborate; I don't know much.)

AlexeyAB commented 5 years ago

@Mahibro You must set batch=64.

Read this: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects

aaobscure commented 5 years ago

@AlexeyAB Hi, thanks for your help.

I have a small dataset of about 1000 images, and I am training it on Yolov3. During training the avg loss is decreasing, but when I check against a new dataset, the result from the 1000th-iteration weights is better than from the 4000th. Am I facing overfitting, or should I run it for more iterations?

How many iterations is good for Yolov3, for a small dataset with 8 classes? Should I use tiny yolov3 instead?

Thanks again!

AlexeyAB commented 5 years ago

@aaobscure What mAP do you get for weights of 1000 and 4000 iterations?

aaobscure commented 5 years ago

@AlexeyAB For 1000 I get 1.2; for 4000 I get 0.72.

Another question: the loss is not decreasing much after 3000 iterations. What should I do?

AlexeyAB commented 5 years ago

@aaobscure Do you get only 0.72%? That is very low.

> How many iterations is good for Yolov3, for a small dataset with 8 classes?

You should have ~16 000 images and you should train ~16 000 iterations with batch=64.

> Should I use tiny yolov3 instead?

You can try
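For context on the numbers suggested above: one darknet iteration consumes `batch` images, so the implied epoch count follows directly. A quick sketch:

```python
# epochs = iterations * batch / dataset size (one iteration = one batch of images).
def epochs(iterations, batch, num_images):
    return iterations * batch / num_images

print(epochs(16000, 64, 16000))  # 64.0 passes over the suggested ~16000 images
print(epochs(16000, 64, 1000))   # 1024.0 passes over a 1000-image set: heavy overfitting risk
```

This is why a small dataset trained for many iterations tends to peak early and then degrade on held-out data, matching the 1000-vs-4000 observation above.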

aaobscure commented 5 years ago

> You should have ~16 000 images and you should train ~16 000 iterations with batch=64.

I have just 1000 images. What should I do?

Another question: how often should I change the learning rate, and how do I do it?
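In darknet the learning rate is normally changed via the cfg rather than by hand: with policy=steps, learning_rate is multiplied by each entry of scales once the matching steps iteration is passed. A sketch of that policy (the steps/scales values here mirror the steps=40000,45000 cfg mentioned earlier in this thread and are illustrative; the burn_in warm-up phase is ignored):

```python
# Sketch of darknet's `policy=steps` schedule: the base learning_rate is
# multiplied by scales[i] once the iteration passes steps[i].
# (Illustrative values; darknet's burn_in warm-up is ignored here.)
def lr_at(iteration, base_lr=0.001, steps=(40000, 45000), scales=(0.1, 0.1)):
    lr = base_lr
    for step, scale in zip(steps, scales):
        if iteration >= step:
            lr *= scale
    return lr

print(lr_at(10000))  # 0.001 (before the first step)
print(lr_at(42000))  # reduced by 10x
print(lr_at(46000))  # reduced by 100x
```

So "changing the learning rate" usually just means editing steps= and scales= in the cfg to match max_batches.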

fashlaa commented 5 years ago

@ssanzr @AlexeyAB Hi, I previously used yolov3, tiny yolov3, and tiny yolov3-xnor for my detection system with color image data, but this time I want to try grayscale image data. What I want to ask is:

  1. How and what config do I need to change to train on my grayscale images?

  2. How do I run the grayscale training results in a webcam demo, with the display changed to grayscale?

I'm sorry for my bad English

AlexeyAB commented 5 years ago

@fashlaa Hi, Just set channels=1 in the [net] section in cfg-file.
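For reference, that is a one-line change in the [net] section of the cfg. A small sketch that applies it programmatically (the helper name is made up for illustration):

```python
import re

# Hypothetical helper: force channels=1 in a darknet cfg so the network
# reads single-channel (grayscale) input.
def set_grayscale(cfg_text):
    return re.sub(r"(?m)^channels\s*=\s*\d+", "channels=1", cfg_text, count=1)

cfg = "[net]\nbatch=64\nsubdivisions=16\nwidth=416\nheight=416\nchannels=3\n"
print(set_grayscale(cfg))  # same cfg with channels=1
```

Editing the file by hand works just as well; the point is that only the channels= key in [net] needs to change.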

fashlaa commented 5 years ago

@AlexeyAB Previously I used an older version of your Yolo Darknet repository. Does the old version support this? And when I run the webcam demo, will it open directly with grayscale video streams?

YongLAGCC commented 5 years ago

Hey @AlexeyAB, please help. I used Tiny Yolov3 with 6 anchors, batch=64, subdivisions=8, and 200 images, on Windows without a GPU. I have no idea why the avg loss becomes -nan after the 30th or 40th iteration. I relabeled the images and re-downloaded the repo, but it still has the same issue. Thank you in advance, sir. Here are some details:

```
num_of_clusters = 6, width = 416, height = 416
read labels from 200 images
loaded image: 200 box: 212
all loaded.
calculating k-means++ ...
iterations = 13
avg IoU = 78.85 %
Saving anchors to the file: anchors.txt
anchors = 101,167, 158,198, 149,307, 215,229, 327,254, 249,341
```

Eyshika commented 5 years ago

@AlexeyAB which repository should we use for tiny yolov3 ?

william91108 commented 5 years ago

I am training custom data with yolov3.cfg using the command `darknet.exe detector train data/obj.data cfg/yolo-obj.cfg darknet53.conv.74 -mjpeg_port 8090 -map`, but the mAP is very low. Is there any problem I need to fix? cfg: batch=64, subdivisions=64, width=416, height=416.

intelltech commented 5 years ago

How do you calculate the anchors for Tiny YoloV3? Your help, please.

joelmatt commented 5 years ago

@intelltech Change the anchors, since tiny yolo has only 6 anchors:

```
darknet detector calc_anchors data/obj.data -num_of_clusters 6 -width 640 -height 640
```

intelltech commented 5 years ago

@joelmatt Okay, thank you. But my YoloV3-Tiny.cfg is configured as 416x416 (which I also use to train); why 640x640?

adrianosantospb commented 5 years ago

@AlexeyAB

> (quoting the earlier exchange between @ssanzr and @AlexeyAB about batch/subdivision settings, anchors, and dataset size)

Hello, @ssanzr. How many classes are you using in this test? I'm asking because I'm facing a similar question in my experiments, but in my case I'm using just one class. I get similar results for full YOLO and Tiny YOLO, and in some cases Tiny YOLO has better results.

mike-briggs commented 4 years ago

Weights only save every 100 iterations until 900, then saves every 10,000. Read more here: https://github.com/pjreddie/darknet/issues/190
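That schedule is easy to express as a predicate. A sketch of the behavior described above (assuming it matches pjreddie/darknet's default; this is not the actual C code):

```python
# Predicate for the checkpoint schedule described above: every 100 iterations
# until 900, then every 10000. (Sketch of pjreddie/darknet's default behavior.)
def saves_weights(iteration):
    if iteration < 1000:
        return iteration % 100 == 0
    return iteration % 10000 == 0

checkpoints = [i for i in range(100, 20001, 100) if saves_weights(i)]
print(checkpoints)  # 100..900, then 10000 and 20000
```

Note the long gap between 900 and 10000: if training diverges in that window, no numbered checkpoint captures it.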

siatpat commented 4 years ago

@william91108

Did you manage to solve your issue? I am facing the same issue and don't know how to go about solving it.

siatpat commented 4 years ago

Hello @AlexeyAB, I am facing the following issue: for my custom dataset the avg loss is going down but the mAP is still 0. I have ~12000 training images, of which 7000 have the defects I want to identify. I am training for 4 classes of defects, and I have around 5000 images in the test set.

I am training on EC2 with 4 GPUs. I have first trained for 1000 iterations as you suggested on one GPU and now I am training for 8000 iterations on all 4 GPUs. The avg loss is going down but mAP is still 0, and I don't know what I should check.


All my images are 1600x256, and I have kept them that way. I have modified the anchors to use calculated anchors based on my dataset: anchors = 20,254, 31,254, 70,253, 27,103, 25,39, 16,18, 56,56, 155,89, 434,182

In the cfg file I have batch=64 and subdivisions=16, and apart from the suggested modifications for training on your own dataset, I have only changed max_batches.

I have checked with -show_imgs that the bounding boxes were showing properly.

Do you have any suggestions on why the mAP is showing 0?

Regards