thtrieu / darkflow

Translate darknet to tensorflow. Load trained weights, retrain/fine-tune using tensorflow, export constant graph def to mobile devices
GNU General Public License v3.0

No output boxes after training!! #80

Open khorchefB opened 7 years ago

khorchefB commented 7 years ago

Hello,

Currently I am trying to train yolo.cfg (version 2) with 2 labels (I want to recognise Darth Vader and Yoda in my test images). I changed the number of classes in yolo.cfg and renamed it yolo-5C.cfg. I then put the 2 labels in labels.txt, created the annotation files, and finally started training on CPU with this command:

./flow --model/yolo-5C.cfg --load bin/yolo.weights --dataset pascal/VOCdevkit/IMG --annotation pascal/VOCdevkit/ANN --train --trainer adam

I changed the following parameters in the file flow.py:

  • epochs = 100
  • batch = 16
  • learning rate = 1e-5

There are 120 images (40 with only Darth Vader, 40 with only Yoda, and 40 with both of them) and 120 annotations.

My problem is that after 12 hours of training on CPU, when I run the test with the --test argument, it displays NO BOXES in the output images. But when I decrease the threshold to 0.00001, it displays many boxes. I want to understand how I can improve my training to get correct object detections. Can you please give me some advice?

Thanks.

youyuge34 commented 6 years ago

@sharoseali Just follow the instructions on the main Darkflow page, at the root directory of the repository.

sharoseali commented 6 years ago

youyuge34, I have checked the weights file and its corresponding cfg file; they give the same error. Even the yolov2-tiny-voc weights do not work with their cfg. Joseph Redmon should be informed about these issues. Anyhow, I am going to start training again on tiny-voc, which I trained previously. Let's see how it behaves this time.

youyuge34, have you worked with darknet on Linux and the COCO dataset? If yes, what was your experience?

I have 2000 XML files in VOC format. I am thinking of converting them to COCO format, but I don't know how to train on COCO data in Windows.

mohamedabdallah1996 commented 6 years ago

I face the same problem, but when I reduced the threshold to 0.0001 I saw many bounding boxes. So try reducing the threshold and check your confidence values.

mohamedabdallah1996 commented 6 years ago

@thtrieu I reached a loss of ~1.6 training on 32 classes, but the confidence for all objects is still 0.0, which means the model didn't learn anything. How can I reduce the loss further to get higher confidence? I changed the batch size and learning rate, but the loss stays in the same range!

I need your help please! thanks in advance

Dhagash4 commented 6 years ago

How can I change the number of iterations? I am training with 1500 images divided into six classes. Is there any way to change the number of iterations?

Dhagash4 commented 6 years ago

@denisli Can you show me how to increase the number of iterations? At step 554 I only got a loss of 5.34. I am training on 1500 images for 6 classes; is that enough, or should I increase my dataset?

sharoseali commented 6 years ago

Increase the number of epochs. If you don't have a large dataset, you can increase the number of epochs. However, if you need a better-trained model, you should have at least 300 to 500 images per class. For a dataset of that size, you can set the number of epochs to 1000.

Dhagash4 commented 6 years ago

@sharoseali Now I will try class by class. I have 1000 images for that class; let's see if I can get bounding boxes with 1000 epochs. Thank you for guiding me. I will let you know the result.

sharoseali commented 6 years ago

Okay, that is also a way to do it. Which weights are you using, tiny YOLO or another? Let me know.

Dhagash4 commented 6 years ago

I am training two classes, with 945 images for one class and 405 for the other. I am currently using the tiny-yolo-voc weights; should I change the weights?

sharoseali commented 6 years ago

No, I was only asking so that I know which weights you use. I used tiny-yolo-voc for training, but I got an error when testing my model. I tried other weights, such as the yolo-voc weights, but they did not match their corresponding cfg file.

So now I am looking for a cfg file and a weights file that match each other so that I can train my model. Anyhow, continue training your model with more epochs and share your results. Best of luck.

Dhagash4 commented 6 years ago

@sharoseali I got bounding boxes after 5000 steps, but when I downloaded an image from Google and tested it, nothing was detected. How can I solve that? Is it an overfitting problem? Also, the boxes are not labelled (for example as a stop sign); I just get bounding boxes with nothing written on them. It also does not detect anything in video. What should I do? I am training with the LISA extension dataset from the VIVA website.

sharoseali commented 6 years ago

Dhagash4, I left this work for some time after I got the error and was busy with other work. In the coming days I will start again. Have you succeeded?

fogonthedowns commented 5 years ago

Start Command:

HDF5_DISABLE_VERSION_CHECK=2 nohup ./flow --model cfg/tiny-yolo-v2-aviator.cfg --load bin/tiny-yolo-v2.weights --train --annotation /home/ubuntu/model/labels --dataset /home/ubuntu/model/aviators --epoch 10 --batch 8 --savepb True --load 18250 --gpu 0.9 &

Dataset:

~/model/labels$  ls -1 | wc -l
187
~/model/aviators$ ls | wc -l
187

Loss

Finish 996 epoch(es)
step 22909 - loss 0.5680124759674072 - moving ave loss 0.582944897404057
step 22910 - loss 1.782407283782959 - moving ave loss 0.7028911360419472
step 22911 - loss 0.20126786828041077 - moving ave loss 0.6527288092657936
step 22912 - loss 0.4742392301559448 - moving ave loss 0.6348798513548087
step 22913 - loss 0.3661291003227234 - moving ave loss 0.6080047762516002
step 22914 - loss 0.6089756488800049 - moving ave loss 0.6081018635144406
step 22915 - loss 0.4250970184803009 - moving ave loss 0.5898013790110266
step 22916 - loss 0.6636741161346436 - moving ave loss 0.5971886527233883
step 22917 - loss 0.3915417194366455 - moving ave loss 0.576623959394714
step 22918 - loss 0.17965593934059143 - moving ave loss 0.5369271573893017
step 22919 - loss 0.31156492233276367 - moving ave loss 0.514390933883648
step 22920 - loss 0.6093173623085022 - moving ave loss 0.5238835767261334
step 22921 - loss 0.49582234025001526 - moving ave loss 0.5210774530785216
step 22922 - loss 0.6295650601387024 - moving ave loss 0.5319262137845396
step 22923 - loss 0.39114269614219666 - moving ave loss 0.5178478620203054
step 22924 - loss 0.5364546775817871 - moving ave loss 0.5197085435764536
step 22925 - loss 0.46883073449134827 - moving ave loss 0.514620762667943
step 22926 - loss 0.6072037220001221 - moving ave loss 0.5238790586011609
step 22927 - loss 0.3584549129009247 - moving ave loss 0.5073366440311373
step 22928 - loss 0.7908065319061279 - moving ave loss 0.5356836328186364
step 22929 - loss 0.48035216331481934 - moving ave loss 0.5301504858682546
step 22930 - loss 0.3851150870323181 - moving ave loss 0.515646945984661
step 22931 - loss 1.296918511390686 - moving ave loss 0.5937741025252635
Finish 997 epoch(es)
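
The "moving ave loss" column in this log is consistent with an exponential moving average that weights the previous average by 0.9 and the new loss by 0.1. That factor is inferred from the numbers in the log, not taken from the darkflow source, so treat it as an assumption. A minimal Python sketch that reproduces two rows of the log:

```python
def update_moving_average(previous_avg, new_loss, decay=0.9):
    """One exponential-moving-average update.

    decay=0.9 is an assumption inferred from the log values,
    not a value read from the darkflow source.
    """
    return decay * previous_avg + (1.0 - decay) * new_loss


# Values copied from steps 22909 to 22911 of the log above.
avg_22909 = 0.582944897404057
avg_22910 = update_moving_average(avg_22909, 1.782407283782959)
avg_22911 = update_moving_average(avg_22910, 0.20126786828041077)

print("step 22910 moving average:", avg_22910)
print("step 22911 moving average:", avg_22911)
```

Under this reading, a single noisy batch (such as the 1.78 loss at step 22910) shifts the average only slightly, so the moving average is the steadier number to watch when judging progress.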

Config:

(more above this line)
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=30
activation=linear

[region]
anchors =  0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828
bias_match=1
classes=1
coords=4
num=5
softmax=1
jitter=.2
rescore=0

object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1

absolute=1
thresh = .6
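
One consistency check that applies to a YOLOv2 config like the one above: the last convolutional layer before [region] needs filters = num * (classes + coords + 1). The Python sketch below only does that arithmetic; the function name is made up for illustration, and the dictionary simply copies the values posted above.

```python
def expected_region_filters(num, classes, coords=4):
    """Filters required in the conv layer feeding a YOLOv2 [region] layer.

    Each of the `num` anchors predicts `coords` box values,
    one objectness score, and `classes` class scores.
    """
    return num * (classes + coords + 1)


# Values copied from the config posted above.
cfg = {"num": 5, "classes": 1, "coords": 4, "filters": 30}

expected = expected_region_filters(cfg["num"], cfg["classes"], cfg["coords"])
print("expected filters:", expected)
print("config is consistent:", expected == cfg["filters"])
```

For this config the check passes (5 * (1 + 4 + 1) = 30). With the same five anchors, a 2-class model such as the one in the opening post would need 35 filters.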

What loss is typical of "convergence"? I ran 1000 epochs (22k+ steps!), which resulted in a very low loss (~0.1%). However, bounding boxes were only drawn on images the model had previously seen (i.e. images that are part of the training set). I suspect that either my training set isn't large enough or the model is heavily overfit and will only match images it has already seen.
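
The epoch and step counts quoted here fit a simple relationship: one step is one batch, and one epoch is one pass over the dataset. The Python sketch below assumes that a partial final batch is dropped (floor division). That assumption is not confirmed from the darkflow source, but it matches the log above: 187 images with --batch 8 gives 23 steps per epoch, and epoch 997 ends at step 22931.

```python
def steps_per_epoch(num_images, batch_size):
    """Number of batches in one pass over the data.

    Floor division (dropping a partial final batch) is an assumption
    chosen because it matches the posted log.
    """
    return num_images // batch_size


def total_steps(num_images, batch_size, epochs):
    """Total optimizer steps after the given number of full epochs."""
    return steps_per_epoch(num_images, batch_size) * epochs


# Numbers taken from the start command, dataset listing, and log above.
per_epoch = steps_per_epoch(187, 8)
after_997_epochs = total_steps(187, 8, 997)

print("steps per epoch:", per_epoch)
print("steps after 997 epochs:", after_997_epochs)
```

This is why step counts from different posts in this thread are not directly comparable: with a small dataset, 1000 epochs can amount to only a few thousand weight updates.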

  1. What is the difference between an epoch and a step? I notice many people reference steps (and their relationship to checkpoints, see @denisli above).
  2. What's an acceptable number of training images to use? I believe I have 200.
  3. At what loss does "convergence" typically take place? Are you talking about epochs or steps?
  4. Does this library divide my training images into test, train, and verify directories? How can I fight overfitting?
  5. How are you determining overfitting? Is it simply a loss < 0.1?
  6. How long should training take? This trains for a day on AWS with GPUs and it's getting expensive!

aaronhan92 commented 5 years ago

It worked for me. In YOLOv2 it is relatively easy to change the config file to incorporate your own data (no additional changes are needed). You need to train for more iterations. Initially I wasn't detecting any bounding boxes, but after training for 40k iterations I could finally see detections, though the results were poor (you need to tune the anchors). I used pre-trained ImageNet weights.

Can you share your code?

AzadeAlizade commented 5 years ago

Hey everyone, I have the same issue. After following every step on this page and training on data in Pascal VOC format, no object is detected. I changed the threshold to 0 and some objects were detected, but they are not really useful. What should I do?

ManasaNadimpalli commented 5 years ago

Hello,

Currently I am trying to train yolo.cfg (version 2) with 2 labels (I want to recognise Darth Vader and Yoda in my test images). I changed the number of classes in yolo.cfg and renamed it yolo-5C.cfg. I then put the 2 labels in labels.txt, created the annotation files, and finally started training on CPU with this command:

./flow --model/yolo-5C.cfg --load bin/yolo.weights --dataset pascal/VOCdevkit/IMG --annotation pascal/VOCdevkit/ANN --train --trainer adam

I changed the following parameters in the file flow.py:

  • epochs = 100
  • batch = 16
  • learning rate = 1e-5

There are 120 images (40 with only Darth Vader, 40 with only Yoda, and 40 with both of them) and 120 annotations.

My problem is that after 12 hours of training on CPU, when I run the test with the --test argument, it displays NO BOXES in the output images. But when I decrease the threshold to 0.00001, it displays many boxes. I want to understand how I can improve my training to get correct object detections. Can you please give me some advice?

Thanks.

Hi sir, I am training darknet using YOLOv3. I have trained on 200 images and I can see the label but no bounding boxes around the objects. What could be the reason?

RamShankarKumar commented 5 years ago

I am testing an image using the method "Using darkflow from another python application" in the Spyder IDE. My program runs fine, but at the end I get an empty array with no predictions. What should I do now?
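
An empty result may simply mean that no box cleared the configured threshold, which would match the reports above of boxes appearing only at thresholds such as 0.00001. The Python sketch below looks at the confidence values directly. It assumes predictions shaped like the list of dictionaries shown in the darkflow README (keys label, confidence, topleft, bottomright); the sample values and helper names are hypothetical.

```python
def filter_by_confidence(predictions, threshold):
    """Keep only predictions whose confidence is at least `threshold`."""
    return [p for p in predictions if p["confidence"] >= threshold]


def max_confidence(predictions):
    """Highest confidence in the list, or 0.0 if the list is empty."""
    return max((p["confidence"] for p in predictions), default=0.0)


# Hypothetical predictions in the assumed darkflow result format.
sample_predictions = [
    {"label": "yoda", "confidence": 0.00004,
     "topleft": {"x": 10, "y": 20}, "bottomright": {"x": 110, "y": 220}},
    {"label": "yoda", "confidence": 0.21,
     "topleft": {"x": 40, "y": 35}, "bottomright": {"x": 150, "y": 240}},
    {"label": "vader", "confidence": 0.0003,
     "topleft": {"x": 200, "y": 15}, "bottomright": {"x": 310, "y": 260}},
]

for threshold in (0.00001, 0.1, 0.5):
    kept = filter_by_confidence(sample_predictions, threshold)
    print("threshold", threshold, "keeps", len(kept), "boxes")

print("highest confidence:", max_confidence(sample_predictions))
```

If the highest confidence across many test images stays near zero, lowering the threshold only reveals noise, and the model most likely needs more training or a check of its cfg and labels.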

ridhimagarg commented 5 years ago

Hi,

I am also facing the same issue. My model is not able to detect any bounding boxes. When I set the threshold to 0.00001, it shows too many boxes.

@ManasaNadimpalli Were you able to find a solution?

Please give some suggestions. I modified the .cfg file according to my classes (# classes = 1).

Alex0795 commented 5 years ago

@kamelbouyacoub How do you lower the threshold so that it shows you many boxes? Please help me with that.

aseembh2001 commented 5 years ago

I had the same problem with not getting bounding boxes. I trained on 87 images for one class. I decreased the learning rate to 1e-5 and was able to get correct bounding boxes, although without very high confidence (~20%). Hope this helps!

absognety commented 4 years ago

I see you are doing YOLOv2. How much is the loss? I suspect yours has not converged.

I am also facing the same issue as @kamelbouyacoub.
My loss after 1000 epochs is at 61.5860000:

Finish 986 epoch(es)
step 1973 - loss 62.078346252441406 - moving ave loss 62.335486664291444
step 1974 - loss 62.121891021728516 - moving ave loss 62.31412710003515
Finish 987 epoch(es)
step 1975 - loss 62.219764709472656 - moving ave loss 62.30469086097891
step 1976 - loss 61.881935119628906 - moving ave loss 62.26241528684391
Finish 988 epoch(es)
step 1977 - loss 62.222434997558594 - moving ave loss 62.25841725791538
step 1978 - loss 61.85980224609375 - moving ave loss 62.21855575673322
Finish 989 epoch(es)
step 1979 - loss 62.035133361816406 - moving ave loss 62.20021351724154
step 1980 - loss 61.879722595214844 - moving ave loss 62.168164425038874
Finish 990 epoch(es)
step 1981 - loss 61.71182632446289 - moving ave loss 62.12253061498128
step 1982 - loss 61.67131042480469 - moving ave loss 62.077408595963625
Finish 991 epoch(es)
step 1983 - loss 61.771820068359375 - moving ave loss 62.0468497432032
step 1984 - loss 61.894561767578125 - moving ave loss 62.0316209456407
Finish 992 epoch(es)
step 1985 - loss 61.739654541015625 - moving ave loss 62.00242430517819
step 1986 - loss 61.7847900390625 - moving ave loss 61.980660878566624
Finish 993 epoch(es)
step 1987 - loss 61.47736740112305 - moving ave loss 61.93033153082227
step 1988 - loss 61.691654205322266 - moving ave loss 61.90646379827227
Finish 994 epoch(es)
step 1989 - loss 61.599735260009766 - moving ave loss 61.87579094444602
step 1990 - loss 61.71918487548828 - moving ave loss 61.860130337550245
Finish 995 epoch(es)
step 1991 - loss 61.71525573730469 - moving ave loss 61.84564287752569
step 1992 - loss 61.526390075683594 - moving ave loss 61.81371759734149
Finish 996 epoch(es)
step 1993 - loss 61.45462417602539 - moving ave loss 61.77780825520988
step 1994 - loss 61.457122802734375 - moving ave loss 61.74573970996233
Finish 997 epoch(es)
step 1995 - loss 61.439453125 - moving ave loss 61.715111051466096
step 1996 - loss 61.43961715698242 - moving ave loss 61.68756166201773
Finish 998 epoch(es)
step 1997 - loss 61.436065673828125 - moving ave loss 61.662412063198765
step 1998 - loss 61.47761535644531 - moving ave loss 61.643932392523425
Finish 999 epoch(es)
step 1999 - loss 61.33710479736328 - moving ave loss 61.61324963300741
step 2000 - loss 61.340763092041016 - moving ave loss 61.586000978910775
Checkpoint at step 2000
Finish 1000 epoch(es)
Training finished, exit.

Does this mean it is not converging?
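
One way to look at that question is to measure how fast the moving average is still changing. The Python sketch below parses lines in the log format shown above and computes the average change per step between the first and last parsed rows. The regular expression and helper names are illustrative, and the sample text copies four lines from the log.

```python
import re

# Matches lines such as:
# "step 1973 - loss 62.07 - moving ave loss 62.33"
LOG_LINE = re.compile(
    r"step (\d+) - loss ([0-9.]+) - moving ave loss ([0-9.]+)"
)


def parse_log(text):
    """Return (step, loss, moving_average) tuples for each matching line."""
    rows = []
    for line in text.splitlines():
        match = LOG_LINE.search(line)
        if match:
            rows.append((int(match.group(1)),
                         float(match.group(2)),
                         float(match.group(3))))
    return rows


def moving_average_slope(rows):
    """Average change in the moving-average loss per step."""
    first_step, _, first_avg = rows[0]
    last_step, _, last_avg = rows[-1]
    return (last_avg - first_avg) / (last_step - first_step)


# Four lines copied from the log above; other lines are skipped by the parser.
sample_log = """\
Finish 986 epoch(es)
step 1973 - loss 62.078346252441406 - moving ave loss 62.335486664291444
step 1974 - loss 62.121891021728516 - moving ave loss 62.31412710003515
step 1999 - loss 61.33710479736328 - moving ave loss 61.61324963300741
step 2000 - loss 61.340763092041016 - moving ave loss 61.586000978910775
Checkpoint at step 2000
"""

rows = parse_log(sample_log)
slope = moving_average_slope(rows)
print("parsed rows:", len(rows))
print("average change per step:", slope)
```

On these lines the moving average is still falling, by roughly 0.03 per step, so the run has not plateaued. Note also that the log shows only 2 steps per epoch, so 1000 epochs here is just 2000 weight updates, far fewer than the 22k+ steps and 40k iterations mentioned earlier in this thread.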

slntopp commented 4 years ago

I have the same issue: loss < 0.45 after 15k steps, with 1000+ images for each class. I tried overfitting on 20 images and that worked fine. I am using the tiny-yolo-voc cfg and weights. Are there any solutions?

xinyee1997 commented 4 years ago

I faced the problem too. No bounding box at all. Any solution?

ozanpkr commented 4 years ago

@thtrieu When I trained YOLOv2 on only the car-labelled data from Pascal VOC 2012, I got a loss of 0.000007. However, I cannot see any bounding boxes when I test on an image. Does that mean overfitting? How can I solve it?

ludwikbukowski commented 4 years ago

the same here