rodrigo2019 / keras_yolo2


Train with multiple classes: mAP did not improve from 0 #6

Closed dr-askar closed 5 years ago

dr-askar commented 5 years ago

Hi, when I try to train with one class, I get very good results, even during warmup:

```
Seen labels:    {'asc': 126, 'thick': 126, 'pe': 126, 'ae': 126}
Given labels:   ['asc']
Overlap labels: {'asc'}

Epoch 00001: val_loss improved from inf to 15.27983, saving model to cornea1w_bestLoss.h5
asc 0.6550
mAP: 0.6550
mAP improved from 0 to 0.6549659247757075, saving model to cornea1w_bestMap.h5.
Epoch 2/165

Epoch 00002: val_loss improved from 15.27983 to 5.69738, saving model to cornea1w_bestLoss.h5
asc 0.8559
mAP: 0.8559
mAP improved from 0.6549659247757075 to 0.8559343469829404, saving model to cornea1w_bestMap.h5.
Epoch 3/165
```

But when I try to train with multiple classes, I get very bad results:

```
Seen labels:    {'asc': 126, 'thick': 126, 'pe': 126, 'ae': 126}
Given labels:   ['asc', 'thick', 'pe', 'ae']
Overlap labels: {'thick', 'asc', 'ae', 'pe'}

asc 0.0000  thick 0.0000  pe 0.0000  ae 0.0000
mAP: 0.0000
mAP did not improve from 0.
Epoch 17/165

Epoch 00017: val_loss did not improve from 14.27190
asc 0.0000  thick 0.0000  pe 0.0000  ae 0.0000
mAP: 0.0000
mAP did not improve from 0.
Epoch 18/165

Epoch 00018: val_loss did not improve from 14.27190
asc 0.0000  thick 0.0000  pe 0.0000  ae 0.0000
mAP: 0.0000
mAP did not improve from 0.
Epoch 19/165
```

I use the same dataset and annotations; the only difference is the config file.

The first (single-class) config:

```json
{
    "model": {
        "backend":              "MobileNet",
        "input_size_w":         416,
        "input_size_h":         416,
        "gray_mode":            false,
        "anchors":              [5.92000,3.72232, 10.23135,13.57805, 13.86562,23.77973, 19.81026,26.68416, 27.34583,27.71786],
        "max_box_per_image":    10,
        "labels":               ["asc"]
    },

    "parser_annotation_type":   "xml",

    "train": {
        "train_csv_file":       "",
        "train_csv_base_path":  "",
        "train_image_folder":   "/content/raccoon_dataset/cornea/images/",
        "train_annot_folder":   "/content/raccoon_dataset/cornea/ann/",

        "callback":             null,
        "train_times":          8,
        "pretrained_weights":   "",
        "batch_size":           8,
        "learning_rate":        1e-4,
        "nb_epochs":            150,
        "warmup_epochs":        15,

        "workers":              12,
        "max_queue_size":       40,
        "early_stop":           true,
        "tensorboard_log_dir":  "./logs",

        "object_scale":         6.0,
        "no_object_scale":      1.0,
        "coord_scale":          2.0,
        "class_scale":          1.0,

        "saved_weights_name":   "cornea1w.h5",
        "debug":                true
    },

    "valid": {
        "iou_threshold":        0.7,
        "score_threshold":      0.5,
        "valid_csv_file":       "",
        "valid_csv_base_path":  "",
        "valid_image_folder":   "",
        "valid_annot_folder":   "",

        "valid_times":          1
    },

    "backup": {
        "create_backup":        false,
        "redirect_model":       true,
        "backup_path":          "../backup",
        "backup_prefix":        "Tiny_yolo_VOC"
    }
}
```

The second (multi-class) config:

```json
{
    "model": {
        "backend":              "MobileNet",
        "input_size_w":         416,
        "input_size_h":         416,
        "gray_mode":            false,
        "anchors":              [0.00000,0.00000, 0.00000,0.00000, 13.61595,16.01708, 13.78384,15.74386, 13.86779,15.88047],
        "max_box_per_image":    10,
        "labels":               ["asc", "thick", "pe", "ae"]
    },

    "parser_annotation_type":   "xml",

    "train": {
        "train_csv_file":       "",
        "train_csv_base_path":  "",
        "train_image_folder":   "/content/raccoon_dataset/cornea/images/",
        "train_annot_folder":   "/content/raccoon_dataset/cornea/ann/",

        "callback":             null,
        "train_times":          8,
        "pretrained_weights":   "",
        "batch_size":           8,
        "learning_rate":        1e-4,
        "nb_epochs":            150,
        "warmup_epochs":        15,

        "workers":              12,
        "max_queue_size":       40,
        "early_stop":           true,
        "tensorboard_log_dir":  "./logs",

        "object_scale":         6.0,
        "no_object_scale":      1.0,
        "coord_scale":          2.0,
        "class_scale":          1.0,

        "saved_weights_name":   "cornea4w.h5",
        "debug":                true
    },

    "valid": {
        "iou_threshold":        0.7,
        "score_threshold":      0.5,
        "valid_csv_file":       "",
        "valid_csv_base_path":  "",
        "valid_image_folder":   "",
        "valid_annot_folder":   "",

        "valid_times":          1
    },

    "backup": {
        "create_backup":        false,
        "redirect_model":       true,
        "backup_path":          "../backup",
        "backup_prefix":        "Tiny_yolo_VOC"
    }
}
```

Thanks.

dr-askar commented 5 years ago

This is the last result after 4 hours of training:

```
asc 0.0000  thick 0.0000  pe 0.0000  ae 0.0000
mAP: 0.0000
mAP did not improve from 0.
Epoch 33/165

Epoch 00033: val_loss did not improve from 14.27190
asc 0.0000  thick 0.0000  pe 0.0000  ae 0.0000
mAP: 0.0000
mAP did not improve from 0.
Epoch 00033: early stopping
```

I even tried training on two classes, with the same result.

rodrigo2019 commented 5 years ago

@dr-askar This is kind of an expected situation: on top of the object detection, the network also has to do object classification, so it is a harder problem to get a high mAP on than single-class detection. Even so, I think you should be able to reach a mAP of around 50%. Here you can see a situation where I had to train for more than 5 hours before results started to appear.

So, what I can recommend is: disable the early stop in your config.json, decrease `object_scale` to 5 or 4, increase `class_scale` to 2 or 3, and also change `"iou_threshold": 0.3` and `"score_threshold": 0.3`.
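For reference, a minimal sketch of those suggestions applied to the second config above (only the affected fields are shown, with values picked from the suggested ranges; merge them into the full config rather than replacing it):

```json
{
    "train": {
        "early_stop":       false,
        "object_scale":     4.0,
        "class_scale":      2.0
    },
    "valid": {
        "iou_threshold":    0.3,
        "score_threshold":  0.3
    }
}
```

Roughly, the idea is that lowering `object_scale` relative to `class_scale` shifts the loss balance toward getting the class predictions right, and the lower validation thresholds make the mAP evaluation less strict while the detector is still weak.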

dr-askar commented 5 years ago

Thank you very much.

rodrigo2019 commented 5 years ago

Have you fixed this issue?

ikerodl96 commented 5 years ago

@dr-askar @rodrigo2019 Any updates regarding this issue? I am facing the very same problem: hours and hours of training and getting nothing. However, in my case I have 10 different classes, "ref209", "ref209_1", "ref209_3", "ref209_4", "ref209_5", "ref209_6", "ref209_7", "ref210_1", "tool209_1", "tool209_2", which are different parts of an automotive piece.

Any recommendations regarding the backend? Batch size? Learning rate? So far I have tried InceptionV3, Full Yolo, and ResNet50 (by the way, for the last one the loss function gives NaNs), with a batch size of 1, a learning rate of about 0.00025, and 900 epochs. I have also decreased the IoU and score thresholds. EarlyStopping is of course disabled.
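For reference, the relevant parts of my config.json currently look roughly like this (paths and the remaining fields omitted, backend string swapped between the three backbones I mentioned; the exact backend names expected by the repo may differ slightly):

```json
{
    "model": {
        "backend":      "InceptionV3",
        "labels":       ["ref209", "ref209_1", "ref209_3", "ref209_4", "ref209_5",
                         "ref209_6", "ref209_7", "ref210_1", "tool209_1", "tool209_2"]
    },
    "train": {
        "batch_size":       1,
        "learning_rate":    2.5e-4,
        "nb_epochs":        900,
        "early_stop":       false
    }
}
```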

Thanks in advance