thtrieu / darkflow

Translate darknet to tensorflow. Load trained weights, retrain/fine-tune using tensorflow, export constant graph def to mobile devices
GNU General Public License v3.0

Minimum number of training steps #355

Open Heidisnaps opened 7 years ago

Heidisnaps commented 7 years ago

Hi everyone.

I trained on just 5 classes from the VOC2012 dataset.

The training result is:

step: more than 14000
loss: fluctuates continuously between 2 and 3

I then created my-yolo.pb and my-yolo.meta and ran prediction like this: flow --pbLoad built_graph/my-yolo.pb --metaLoad built_graph/my-yolo.meta --imgdir sample_img/

But it doesn't detect anything.

What is the minimum number of steps needed before it detects anything?

jubjamie commented 7 years ago

Usually until your loss has converged. You can visualise the training from the summary folder using TensorBoard.

It might not be the number of steps that is the problem. It could be that your training was never set up correctly in the first place, your threshold is too high, the category names differ, etc. Have you managed to do any further troubleshooting?
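To illustrate the "threshold is too high" possibility, here is a minimal sketch of how a confidence cut-off can hide every detection from an under-trained model. The dictionaries are loosely modeled on the fields darkflow writes with `--json` (`label`, `confidence`); the values and the `filter_detections` helper are made up for illustration and are not part of darkflow.

```python
def filter_detections(detections, threshold):
    """Keep only detections whose confidence is at least `threshold`."""
    return [d for d in detections if d["confidence"] >= threshold]


# Hypothetical raw predictions from a model that has not trained for long:
# the boxes exist, but their confidences are low.
raw = [
    {"label": "person", "confidence": 0.32},
    {"label": "bicycle", "confidence": 0.08},
]

for threshold in (0.6, 0.3, 0.05):
    kept = filter_detections(raw, threshold)
    print(threshold, [d["label"] for d in kept])
```

With the highest cut-off nothing survives, which looks the same as "it can't detect anything". Lowering the threshold at prediction time (darkflow's `flow` has a `--threshold` option) is a cheap way to tell the two cases apart.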

Heidisnaps commented 7 years ago

@jubjamie

My classes:

bicycle
bird
boat
person

There were four, not five. However, the number of labels and the number of classes in the cfg are the same.

My cfg:

`[net]
batch=32
subdivisions=8
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
max_batches = 20100
policy=steps
steps=-1,100,20000,30000
scales=10,.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=45
activation=linear

[region]
anchors = 1.08,1.19, 3.42,4.41, 6.63,11.38, 9.42,5.11, 16.62,10.52
bias_match=1
classes=4
coords=4
num=5
softmax=1
jitter=.2
rescore=1

object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1

absolute=1
thresh=.6
random=1`

I chose four classes, but in fact, I just want to find people. Other classes are not important. How do I change the initial value?
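One thing worth checking when the class count changes is that `filters` in the last `[convolutional]` layer stays consistent with the `[region]` layer. For YOLOv2-style cfgs the relation is `num * (classes + coords + 1)`. A small sketch of that arithmetic, using the values from the cfg above (`region_filters` is a hypothetical helper name, not a darkflow function):

```python
def region_filters(num, classes, coords=4):
    """Output channels needed by a YOLOv2 region layer.

    Each of the `num` anchor boxes predicts `coords` box values,
    one objectness score, and one score per class.
    """
    return num * (classes + coords + 1)


print(region_filters(num=5, classes=4))   # the 4-class cfg above
print(region_filters(num=5, classes=20))  # full VOC, 20 classes
print(region_filters(num=5, classes=1))   # a person-only model
```

The 4-class case gives 45, matching `filters=45` in the cfg above. For a person-only detector the same formula gives 30, with `classes=1` in `[region]` and a single entry in the labels file.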

jubjamie commented 7 years ago

What initial value?

Heidisnaps commented 7 years ago

@jubjamie

I tried again with 20 classes.


~$ flow --train --model cfg/tiny-yolo-voc-new.cfg --dataset "~/VOCdevkit/VOC2007/JPEGImages" --annotation "~/VOCdevkit/VOC2007/Annotations" --gpu 1.0

Parsing cfg/tiny-yolo-voc-new.cfg
Loading None ...
Finished in 0.00010204315185546875s

Building net ...
Source | Train? | Layer description                | Output size
-------+--------+----------------------------------+---------------
       |        | input                            | (?, 416, 416, 3)
 Init  |  Yep!  | conv 3x3p1_1 +bnorm leaky        | (?, 416, 416, 16)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 208, 208, 16)
 Init  |  Yep!  | conv 3x3p1_1 +bnorm leaky        | (?, 208, 208, 32)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 104, 104, 32)
 Init  |  Yep!  | conv 3x3p1_1 +bnorm leaky        | (?, 104, 104, 64)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 52, 52, 64)
 Init  |  Yep!  | conv 3x3p1_1 +bnorm leaky        | (?, 52, 52, 128)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 26, 26, 128)
 Init  |  Yep!  | conv 3x3p1_1 +bnorm leaky        | (?, 26, 26, 256)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 13, 13, 256)
 Init  |  Yep!  | conv 3x3p1_1 +bnorm leaky        | (?, 13, 13, 512)
 Load  |  Yep!  | maxp 2x2p0_1                     | (?, 13, 13, 512)
 Init  |  Yep!  | conv 3x3p1_1 +bnorm leaky        | (?, 13, 13, 1024)
 Init  |  Yep!  | conv 3x3p1_1 +bnorm leaky        | (?, 13, 13, 1024)
 Init  |  Yep!  | conv 1x1p0_1 linear              | (?, 13, 13, 125)
-------+--------+----------------------------------+---------------
GPU mode with 1.0 usage

cfg/tiny-yolo-voc-new.cfg loss hyper-parameters:
H = 13
W = 13
box = 5
classes = 20
scales = [1.0, 5.0, 1.0, 1.0]

Parsing for ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']
[====================>]100% 007614.xml
Statistics:
bird: 599
train: 328
diningtable: 310
chair: 1432
bus: 272
dog: 538
cat: 389
bottle: 634
motorbike: 390
tvmonitor: 367
person: 5447
pottedplant: 625
sheep: 353
bicycle: 418
car: 1644
horse: 406
aeroplane: 331
sofa: 425
cow: 356
boat: 398
Dataset size: 5011
Dataset of 5011 instance(s)
Training statistics:
Learning rate : 1e-05
Batch size : 16
Epoch number : 1000
Backup every : 2000

step 1 - loss 109.52924346923828 - moving ave loss 109.52924346923828
step 2 - loss 109.86571502685547 - moving ave loss 109.56289062500001
step 3 - loss 110.08113098144531 - moving ave loss 109.61471466064454
. . .
step 12492 - loss 10.850701332092285 - moving ave loss 8.618697419224091
step 12493 - loss 6.189124584197998 - moving ave loss 8.375740135721482
step 12494 - loss 9.410148620605469 - moving ave loss 8.479180984209881
step 12495 - loss 7.525200843811035 - moving ave loss 8.383782970169996

~$ flow --model cfg/tiny-yolo-voc-new.cfg --load -1 --savepb
~$ flow --pbLoad built_graph/tiny-yolo-voc-new.pb --metaLoad built_graph/tiny-yolo-voc-new.meta --imgdir sample_img/ --json

But this also does not detect anything.
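When judging convergence from a log like the one above, the "moving ave loss" column is more useful than the raw per-step loss. The first three logged values are consistent with an exponential moving average that keeps 90% of the previous average and adds 10% of the new loss. The sketch below reproduces them under that assumption; the 0.9 factor is inferred from the numbers in this thread, not taken from darkflow's source.

```python
def moving_average_loss(losses, decay=0.9):
    """Exponential moving average, seeded with the first loss value."""
    averages = []
    avg = None
    for loss in losses:
        avg = loss if avg is None else decay * avg + (1 - decay) * loss
        averages.append(avg)
    return averages


# Per-step losses and logged moving averages for steps 1-3 of the log above.
losses = [109.52924346923828, 109.86571502685547, 110.08113098144531]
logged = [109.52924346923828, 109.56289062500001, 109.61471466064454]

for got, want in zip(moving_average_loss(losses), logged):
    print(round(got, 6), abs(got - want) < 1e-6)
```

Because each step contributes only a tenth of its loss, the average can still be drifting downward while the raw loss jumps between roughly 6 and 11, as it does around step 12495.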

jubjamie commented 7 years ago

Does it detect before you translate to pb files? Check whether it's a translation error. I also don't see where you are saving your weights. What if you use transfer learning?

Gowan1998 commented 7 years ago

Hey Heidisnaps,

I think you have the same problem I have. After training for more than 14000 steps, savepb no longer works! If you build the pb file and try to detect something with it on Android (or with: flow --pbLoad built_graph/yolo-new.pb --metaLoad built_graph/yolo-new.meta --imgdir sample_img/), it doesn't work :(

I think it's a bug, because the weights themselves work fine. For example, you can try: ./flow --imgdir sample_img/ --model cfg/yolo-new.cfg --load 1500 and it works!

The problem appears after 14000 training steps! I tried a lot of combinations and trained for more than 600000 steps, and it doesn't work! But with fewer than 14000 steps you don't have any problems!

Heidisnaps commented 7 years ago

@jubjamie

Translating to pb files gives no error:
~$ flow --model cfg/tiny-yolo-voc-new.cfg --load 153 --savepb

Parsing cfg/tiny-yolo-voc-new.cfg
Loading None ...
Finished in 0.00011944770812988281s

Building net ...
Source | Train? | Layer description                | Output size
-------+--------+----------------------------------+---------------
       |        | input                            | (?, 416, 416, 3)
 Init  |  Yep!  | conv 3x3p1_1 +bnorm leaky        | (?, 416, 416, 16)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 208, 208, 16)
 Init  |  Yep!  | conv 3x3p1_1 +bnorm leaky        | (?, 208, 208, 32)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 104, 104, 32)
 Init  |  Yep!  | conv 3x3p1_1 +bnorm leaky        | (?, 104, 104, 64)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 52, 52, 64)
 Init  |  Yep!  | conv 3x3p1_1 +bnorm leaky        | (?, 52, 52, 128)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 26, 26, 128)
 Init  |  Yep!  | conv 3x3p1_1 +bnorm leaky        | (?, 26, 26, 256)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 13, 13, 256)
 Init  |  Yep!  | conv 3x3p1_1 +bnorm leaky        | (?, 13, 13, 512)
 Load  |  Yep!  | maxp 2x2p0_1                     | (?, 13, 13, 512)
 Init  |  Yep!  | conv 3x3p1_1 +bnorm leaky        | (?, 13, 13, 1024)
 Init  |  Yep!  | conv 3x3p1_1 +bnorm leaky        | (?, 13, 13, 1024)
 Init  |  Yep!  | conv 1x1p0_1 linear              | (?, 13, 13, 125)
-------+--------+----------------------------------+---------------
Running entirely on CPU
Loading from ./ckpt/tiny-yolo-voc-new-153
Finished in 7.258143663406372s
Rebuild a constant version ...
Done

Heidisnaps commented 7 years ago

@Savash2016 I tried a checkpoint under 14000 steps, like this:
~$ flow --model cfg/tiny-yolo-voc-new.cfg --load 12000 --savepb

It works, thanks! But the accuracy is too low. How did you improve the accuracy?

Gowan1998 commented 7 years ago

Hey Heidisnaps,

It's a bug, I think :( The only way to get better accuracy is to train for more than 300000 steps (depending on your data). But the problem is that if you do this, your pb file no longer works.

I hope the author will fix it soon. Maybe you can also try sending him an email to ask for a fix :/ I think this is the only way.

junxuezheng commented 6 years ago

@Heidisnaps
Hey Heidisnaps, can you send me the code for measuring the accuracy? Thanks. My Gmail is junxuezheng@gmail.com

junxuezheng commented 6 years ago

@Savash2016 Hey Savash2016,

The cfg has max_batches: in yolo-voc max_batches is 45000, and in tiny-yolo-voc max_batches is 40100 (VOC 2007). I think max_batches is the maximum step count. I want better accuracy; does it really need more than 300000 training steps? Also, once training runs past 40000 steps, the loss stops decreasing (it stays between 4.0 and 5.0). Thanks, my Gmail is junxuezheng@gmail.com
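For a sense of scale, the step counts discussed in this thread can be converted into passes over the dataset. The sketch below uses the dataset size (5011) and batch size (16) from the training log earlier in the thread. It assumes that one step processes one batch and that a partial final batch is dropped; both are assumptions about the training loop, not confirmed details.

```python
def steps_per_epoch(dataset_size, batch_size):
    """Full batches in one pass over the data (partial batch dropped)."""
    return dataset_size // batch_size


def steps_to_epochs(steps, dataset_size, batch_size):
    """Approximate number of passes over the dataset after `steps` steps."""
    return steps / steps_per_epoch(dataset_size, batch_size)


DATASET_SIZE = 5011  # "Dataset size: 5011" in the log
BATCH_SIZE = 16      # "Batch size : 16" in the log

print("steps per epoch:", steps_per_epoch(DATASET_SIZE, BATCH_SIZE))
for steps in (12495, 14000, 40000, 300000):
    epochs = steps_to_epochs(steps, DATASET_SIZE, BATCH_SIZE)
    print(f"{steps} steps is about {epochs:.1f} epochs")
```

Under these assumptions an epoch is 313 steps, so the run that stopped at step 12495 had covered the dataset about 40 times.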

Heidisnaps commented 6 years ago

@junxuezheng Sorry, but I do not have any code at this time.