thtrieu / darkflow

Translate darknet to tensorflow. Load trained weights, retrain/fine-tune using tensorflow, export constant graph def to mobile devices
GNU General Public License v3.0
6.13k stars 2.08k forks

Everything inside an image is detected as human if I only train the last layer and the training loss is around 2 #384

Open tankienleong opened 7 years ago

tankienleong commented 7 years ago

I'm using the pre-trained weights and set `self.ntrain = 1` to train only the last layer to detect humans. I also modified the number of classes and the filters of the last two layers in the cfg file. I stopped training when the loss was around 2 (it seemed unable to decrease any further, and I had already reached 50 epochs) and used the latest ckpt file for testing. The result is very bad: everything is detected as human. The ckpt file from training with `self.ntrain = 1` is much smaller than the ckpt file from `self.ntrain = 52`. If I set `self.ntrain = 52` during training, the testing result is good. Are the pre-trained weights for the first 51 layers stored in the ckpt file during training?
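From the description above, `self.ntrain` counts trainable layers from the end of the network (1 trains only the last layer, 52 trains all of them). As a plain-Python sketch of that split, not darkflow's actual code:

```python
# Sketch of how ntrain partitions layers, counted from the end.
# The layer indices here are illustrative, not darkflow internals.
layers = list(range(1, 53))   # 52 layers of the network, by index
ntrain = 1                    # train only the last layer
frozen, trained = layers[:-ntrain], layers[-ntrain:]
print(len(frozen), trained)   # 51 [52]
```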

rlan commented 7 years ago

What is your cfg file?

tankienleong commented 7 years ago

```
[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=2
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation=1.5
exposure=1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches=500200
policy=steps
steps=400000,450000
scales=.1,.1

[convolutional] batch_normalize=1 filters=32 size=3 stride=1 pad=1 activation=leaky
[maxpool] size=2 stride=2
[convolutional] batch_normalize=1 filters=64 size=3 stride=1 pad=1 activation=leaky
[maxpool] size=2 stride=2
[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=leaky
[maxpool] size=2 stride=2
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky
[maxpool] size=2 stride=2
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky
[maxpool] size=2 stride=2
[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky

#######

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky
[route] layers=-9
[convolutional] batch_normalize=1 size=1 stride=1 pad=1 filters=64 activation=leaky
[reorg] stride=2
[route] layers=-1,-4
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky
[convolutional] size=1 stride=1 pad=1 filters=30 activation=linear

[region]
anchors = 0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828
bias_match=1
classes=1
coords=4
num=5
softmax=1
jitter=.3
rescore=1
object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1
absolute=1
thresh=.1
random=1
```

rlan commented 7 years ago

You have these correct.

filters=30 classes=1
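For reference, the 30 follows from YOLOv2's region-layer rule: the final convolution needs num × (classes + coords + 1) filters. With the cfg's `num=5` anchors, `classes=1`, and `coords=4`:

```python
# Region-layer rule for YOLOv2-style detectors:
# filters = num_anchors * (classes + coords + 1)
num_anchors = 5   # num=5 in the cfg
classes = 1       # person only
coords = 4        # box coordinates: x, y, w, h
filters = num_anchors * (classes + coords + 1)
print(filters)    # 5 * 6 = 30
```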

Have you tried NOT modifying the code and instead refining an already trained network? e.g. tiny-yolo-voc has a person label.

I also suggest adding TensorFlow summaries to the code so you can check what is going on internally with TensorBoard. See #411

stanifrolov commented 7 years ago

All weights are stored in the ckpt file. How do you know `self.ntrain` sets the trainable layers starting from the last one?

Alternatively, you could apply the gradients only to the last layer's variables in https://github.com/thtrieu/darkflow/blob/master/darkflow/net/help.py, e.g.:

```python
# Restrict training to variables whose name contains LAYER_NAME
# (replace "LAYER_NAME" with the actual name of your last layer).
var_list = [var for var in tf.trainable_variables()
            if "LAYER_NAME" in var.name]
gradients = optimizer.compute_gradients(self.framework.loss, var_list=var_list)
self.train_op = optimizer.apply_gradients(gradients)
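The filter in that snippet is just substring matching on variable names. A dependency-free sketch of the same logic — the names below are invented for illustration and are not darkflow's actual variable names:

```python
# Stand-ins for the .name values of tf.trainable_variables();
# these names are hypothetical, chosen only to show the substring match.
trainable_names = [
    "0-convolutional/kernel",
    "0-convolutional/biases",
    "51-convolutional/kernel",
    "51-convolutional/biases",
]
LAYER_NAME = "51-convolutional"  # placeholder for the real last-layer name
var_list = [name for name in trainable_names if LAYER_NAME in name]
print(var_list)  # ['51-convolutional/kernel', '51-convolutional/biases']
```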