train problem - Githubissues

Tinatchen commented 6 years ago

I changed the training classes to 7 classes, but got a terrible model.


overfitover commented 6 years ago

Do you have correct labels?

bzhong2 commented 6 years ago

I have the same problem. The score is very small somehow.

Tinatchen commented 6 years ago

171206_033632249_Camera_6.jpg 1806,2031,2260,2715,0 1762,1324,2343,1850,0 1757,1401,1865,1573,0
This is my label file; I changed the classes to 7 classes. I loaded the pretrained model and trained on 40000 pictures, but my own model performs badly. The model performs well with the COCO classes, and the inference result with the pretrained model is good.
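For anyone checking their own labels, here is a minimal sketch of a sanity check for one annotation line. It assumes the layout the line above appears to follow: an image path followed by space-separated boxes, each written as `x_min,y_min,x_max,y_max,class_id`. The names `Box`, `parse_annotation_line`, and `check_boxes` are made up for this illustration and are not part of the repository.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    class_id: int


def parse_annotation_line(line: str) -> Tuple[str, List[Box]]:
    """Split one label line into an image path and its boxes."""
    parts = line.split()
    if not parts:
        raise ValueError("empty annotation line")
    image_path, box_fields = parts[0], parts[1:]
    boxes = []
    for field in box_fields:
        values = [int(v) for v in field.split(",")]
        if len(values) != 5:
            raise ValueError(f"expected 5 values per box, got {field!r}")
        boxes.append(Box(*values))
    return image_path, boxes


def check_boxes(boxes: List[Box], num_classes: int) -> List[str]:
    """Return a list of problems; an empty list means the boxes look sane."""
    problems = []
    for box in boxes:
        if not (box.x_min < box.x_max and box.y_min < box.y_max):
            problems.append(f"degenerate box: {box}")
        if not (0 <= box.class_id < num_classes):
            problems.append(f"class id out of range: {box}")
    return problems


line = ("171206_033632249_Camera_6.jpg "
        "1806,2031,2260,2715,0 1762,1324,2343,1850,0 1757,1401,1865,1573,0")
path, boxes = parse_annotation_line(line)
print(path, len(boxes), check_boxes(boxes, num_classes=7))
```

Running a check like this over the whole label file catches swapped corners and class ids outside `0..num_classes-1`, two common causes of a model that trains to a low loss but predicts poorly.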

qqwweee commented 6 years ago

@ClovisChen

Did you load the darknet53 weights? And without freezing layers?
Did you train for 30 epochs? What are the train loss and val loss like?

Tinatchen commented 6 years ago

@qqwweee thank you for your model. Sorry, I did not see the darknet53 weights before.

I converted yolov3.weights to yolo.h5 and loaded it for training. I tried both with and without freezing layers.
After 13 epochs of training, the job stopped by itself. The train loss and val loss are 140-150 with a 1024*1024 input shape, 30-37 with a 512*512 input shape, and 7-13 with a 256*256 input shape.
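One possible reading of these numbers, offered as a hedged sketch: YOLOv3 predicts on three grids with strides 32, 16, and 8, with 3 anchors per grid cell, so the number of predictions grows with the input area. If the training loss is summed over all grid cells rather than averaged (an assumption about the training code, not something confirmed in this thread), the loss would grow by roughly the same factor, and absolute loss values at different input shapes would not be directly comparable. The helper `num_predictions` is a name made up for this illustration.

```python
STRIDES = (32, 16, 8)      # downsampling factors of the three YOLOv3 output scales
ANCHORS_PER_SCALE = 3      # anchors predicted at every grid cell


def num_predictions(input_size: int) -> int:
    """Count the box predictions made for a square input of the given size."""
    if input_size <= 0 or input_size % 32 != 0:
        raise ValueError("input size must be a positive multiple of 32")
    cells = sum((input_size // stride) ** 2 for stride in STRIDES)
    return cells * ANCHORS_PER_SCALE


for size in (256, 512, 1024):
    print(size, num_predictions(size))
```

Each doubling of the input side multiplies the prediction count by 4, which is about the ratio between the reported loss ranges.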

Issue #86 is similar to my problem.

I am now trying to load the darknet53 weights, both with and without freezing layers. I will post my model's performance when training finishes.

cooli7wa commented 6 years ago

@ClovisChen

I converted yolov3.weights to yolo.h5 and loaded it for training.

yolo.h5 has 80 classes and your model has 7 classes. Is it possible to use the pretrained yolo.h5 to train your model? And why is the loss so low?

Tinatchen commented 6 years ago

@cooli7wa when yolo.h5 is loaded, the frozen layers do not include the last 3 layers, so the number of classes can be changed. My loss is low and stable, but the performance is terrible: the score is 0-0.03, and the bounding boxes are not in the right place. I am trying to figure it out.

I am trying to load darknet53 weights for training now.
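The freezing rule described here can be sketched without Keras. The `FakeLayer` class below is a hypothetical stand-in that only carries a name and a `trainable` flag, and the layer names are invented. The sketch also assumes that the three output convolutions are the last three entries of `model_body.layers`, which is worth verifying on the real model by printing the names of `model_body.layers[-3:]`.

```python
class FakeLayer:
    """Hypothetical stand-in for a Keras layer: a name plus a trainable flag."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.trainable = True


def freeze_all_but_last(layers, keep_trainable: int = 3):
    """Freeze every layer except the last `keep_trainable`; return trainable names."""
    cutoff = max(len(layers) - keep_trainable, 0)
    for layer in layers[:cutoff]:
        layer.trainable = False
    return [layer.name for layer in layers if layer.trainable]


# Invented names: six body layers followed by the three output convolutions.
layers = [FakeLayer(f"body_{i}") for i in range(6)]
layers += [FakeLayer(name) for name in ("out_y1", "out_y2", "out_y3")]

trainable = freeze_all_but_last(layers)
print(trainable)
```

Under that assumption, all three outputs stay trainable, not only the last one, while everything before them is frozen.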

cooli7wa commented 6 years ago

@ClovisChen Sorry, I still can't understand.

# => Shouldn't different class counts create a different model_body?
model_body = yolo_body(image_input, num_anchors//3, num_classes)
...
# => So why can mismatched weights be loaded into the model? Because of skip_mismatch=True?
model_body.load_weights(weights_path, by_name=True, skip_mismatch=True)
if freeze_body:
    # Do not freeze 3 output layers.
    # => Why leave only the last 3 layers unfrozen? The model has 3 outputs (y1, y2, y3), so why is only y3 not frozen?
    for i in range(len(model_body.layers)-3):
        model_body.layers[i].trainable = False

Thank you for your reply

Tinatchen commented 6 years ago

@cooli7wa

if freeze_body:
    # => Do not freeze 3 output layers.
    for i in range(len(model_body.layers)-3):
        model_body.layers[i].trainable = False

When the pretrained model is loaded, the weights of the last 3 layers are not loaded: their shapes no longer match, so skip_mismatch=True skips them.

def yolo_body(inputs, num_anchors, num_classes):

    """Create YOLO_V3 model CNN body in Keras."""
    darknet = Model(inputs, darknet_body(inputs))
    # => this is the last 3 layers: 1/3
    # => the out_filters is  num_classes + 5
    x, y1 = make_last_layers(darknet.output, 512, num_anchors*(num_classes+5))

    x = compose(
            DarknetConv2D_BN_Leaky(256, (1,1)),
            UpSampling2D(2))(x)
    x = Concatenate()([x,darknet.layers[152].output])
    # => this is the last 3 layers: 2/3
    x, y2 = make_last_layers(x, 256, num_anchors*(num_classes+5))

    x = compose(
            DarknetConv2D_BN_Leaky(128, (1,1)),
            UpSampling2D(2))(x)
    x = Concatenate()([x,darknet.layers[92].output])
    # => this is the last 3 layers: 3/3
    x, y3 = make_last_layers(x, 128, num_anchors*(num_classes+5))

    return Model(inputs, [y1,y2,y3])

The function make_last_layers adapts to num_classes through out_filters, so we can change num_classes.
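As a small worked example of the `num_anchors*(num_classes+5)` expression in the code above: with 3 anchors per scale, the 80-class model and the 7-class model end in convolutions with different filter counts, which is why the pretrained weights for those layers cannot be reused. The helper name `output_filters` is made up for this illustration.

```python
def output_filters(anchors_per_scale: int, num_classes: int) -> int:
    """Filters in each output convolution: per anchor, 4 box values + 1 objectness + class scores."""
    return anchors_per_scale * (num_classes + 5)


coco_filters = output_filters(3, 80)     # pretrained 80-class model
custom_filters = output_filters(3, 7)    # 7-class model from this thread

print(coco_filters, custom_filters)
print("output layers reusable:", coco_filters == custom_filters)
```

Only the three output convolutions depend on `num_classes`; every earlier layer keeps the same shape and can take the pretrained weights.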

cooli7wa commented 6 years ago

@ClovisChen Thank you, I need time to think about it.

xugaoxiang commented 6 years ago

We seriously need a beginner's tutorial on training with your own dataset (including a labeling tool).

bzhong2 commented 6 years ago

https://github.com/experiencor/keras-yolo3 This repo gave me much better performance with the same dataset.

stefanbo92 commented 6 years ago

Hey, I faced the same issue when training for my own classes: I was training only the last layers (frozen model), and the loss went down quickly but the inference results remained quite bad. It turned out that I just had not trained long enough, so maybe try training for ~150 epochs and remove the early stopping. Even if the loss is not decreasing much, the model still makes some progress.

I also created a fork which implements training with bottleneck features. It first calculates the output of the frozen layers (the bottleneck features) and then trains only the last layers (about 30 times faster on my laptop). This makes training much quicker, even on a CPU, so you can try different training parameters.

Code for it can be found here: https://github.com/stefanbo92/keras-yolo3

@qqwweee I created a pull request if you would like to merge this.

jyqian-aibee commented 6 years ago

@stefanbo92 thanks for the effort. I tried your code, but it seems to be usable only when the dataset is small. In my case, I have a big dataset, and calculating the last-layer features for all of it causes a memory issue and gets my program killed. Do you have any suggestions on how to get around this?

stefanbo92 commented 6 years ago

You are right, the bottleneck training works only if you can load all the training data into memory and is especially well suited for smaller datasets. However, you could still precompute the bottlenecks for all your images and instead of saving it as one large file, you can save one bottleneck value set for each image and then rewrite the generator so it will load these "single" bottlenecks.
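The per-image idea above could look roughly like this. It is a minimal sketch, not the code from the fork: the function names, the `.pkl` file layout, and the use of plain lists with `pickle` are all assumptions chosen to keep the example dependency-free. Real feature maps would more likely be NumPy arrays saved with a format suited to them.

```python
import os
import pickle
import random
import tempfile
from typing import Iterator, List, Optional, Sequence, Tuple


def save_bottleneck(directory: str, image_id: str, features, target) -> str:
    """Write the precomputed features and target of ONE image to its own file."""
    path = os.path.join(directory, image_id + ".pkl")
    with open(path, "wb") as f:
        pickle.dump((features, target), f)
    return path


def bottleneck_batches(paths: Sequence[str], batch_size: int,
                       shuffle: bool = True,
                       seed: Optional[int] = None) -> Iterator[Tuple[list, list]]:
    """Yield (features, targets) batches forever, reading only one batch of files at a time."""
    if not paths or batch_size <= 0:
        raise ValueError("need at least one path and a positive batch size")
    rng = random.Random(seed)
    order = list(paths)
    while True:
        if shuffle:
            rng.shuffle(order)  # new order every pass over the data
        for start in range(0, len(order), batch_size):
            features, targets = [], []
            for path in order[start:start + batch_size]:
                with open(path, "rb") as f:
                    feature, target = pickle.load(f)
                features.append(feature)
                targets.append(target)
            yield features, targets


def demo() -> List[int]:
    """Write five fake bottleneck files and return the sizes of the first three batches."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [save_bottleneck(tmp, f"img_{i}", [float(i)] * 4, [i])
                 for i in range(5)]
        batches = bottleneck_batches(paths, batch_size=2, seed=0)
        return [len(next(batches)[0]) for _ in range(3)]


print(demo())
```

Memory use is then bounded by the batch size rather than the dataset size, at the cost of more disk reads per epoch.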

fourth-archive commented 5 years ago

@ClovisChen @overfitover @bzhong2 @qqwweee @cooli7wa @stefanbo92 @jyqian-aibee this YOLOv3 tutorial may help you: https://github.com/ultralytics/yolov3/wiki/Train-Custom-Data

The accompanying repository works on MacOS, Windows and Linux, includes multigpu and multithreading, performs inference on images, videos, webcams, and an iOS app. It also tests to slightly higher mAPs than darknet, including on the latest YOLOv3-SPP.weights (60.7 COCO mAP), and offers the ability to train custom datasets from scratch to darknet performance, all using PyTorch :) https://github.com/ultralytics/yolov3

qqwweee / keras-yolo3

train problem #85
