zzh8829 / yolov3-tf2

YoloV3 Implemented in Tensorflow 2.0
MIT License
2.51k stars, 908 forks

Training with one class #141

Open AndreiJurj12 opened 4 years ago

AndreiJurj12 commented 4 years ago

Hello. First, I want to thank you for this repository and for keeping it updated, especially the recent tutorial on how to train.

After the latest update, I decided to use the model for single-class detection (person), still using the VOC dataset. Initially, I followed your tutorial on the VOC dataset and managed to train it successfully on Google Colab (I was not really interested in numbers or accuracy, just that it seemed to work). My only issue was that when I tried to run the command for visualizing the dataset, I got a "module yolov3_tf2 not found" error (some Python path issue, since it worked in PyCharm), so I worked around it by moving visualize_dataset.py into the main folder.

Afterwards, I prepared my new data records by modifying the VOC script to drop all other annotations and adding a new custom.names containing only the person class. I trained again on that single class, with the loss decreasing about as before, and then checked the detections on a few examples. The model was trained correctly to some degree and made detections, but the maximum confidence score was 0.5 no matter the example (even for images from the training set, where the model should be close to overfitting). I went through previous issues and found this one: https://github.com/zzh8829/yolov3-tf2/issues/70#issuecomment-540842629

After applying that modification and retraining (even though that code doesn't actually affect training), the network produced confidence scores above 0.5 and close to 1, so this apparently solved the issue. However, I'm not sure whether it really solves the underlying problem. I don't understand the code well enough to decide that, so I'd appreciate a short explanation of yolo_nms. If the modification is correct, it might be useful to have it in the official code, since many people seem to train the model with only one class and this is a recurring issue.
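A minimal sketch of one way a 0.5 ceiling like this could arise, assuming the final score is the objectness confidence multiplied by a sigmoid class probability (a common YOLOv3 post-processing rule; not confirmed here to be exactly what yolo_nms does). If the single class logit never receives a training signal and stays near 0, its sigmoid stays near 0.5 and caps the score. The helper `detection_score` and its single-class branch are hypothetical illustrations, not the code from the linked comment.

```python
import math


def sigmoid(x):
    """Logistic function used to squash a raw logit into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def detection_score(objectness, class_probs):
    """Hypothetical scoring rule for one box.

    Assumption: score = objectness * best class probability, except that
    with a single class the objectness is used on its own.
    """
    if len(class_probs) == 1:
        return objectness
    return objectness * max(class_probs)


# A class logit that was never trained stays near 0, so its sigmoid is 0.5.
stuck_class_prob = sigmoid(0.0)

# Even a very confident box is then capped at about half its objectness.
objectness = 0.98
capped_score = objectness * stuck_class_prob

# With the hypothetical single-class branch, the score is the objectness.
fixed_score = detection_score(objectness, [stuck_class_prob])

print(capped_score, fixed_score)
```

Under these assumptions, a change that only touches post-processing can raise the reported scores without changing what the network learned, which would be consistent with the observation that the modified code does not affect training.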

zzh8829 commented 4 years ago

This problem has been raised before. It may have something to do with categorical_cross_entropy in the loss function. I can take a look when I get time, but right now there are two hacky solutions:

  1. change config to 2 classes with only one class in dataset
  2. change categorical_cross_entropy to binary cross entropy

Let me know if these work.
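The two workarounds above can be illustrated with a small sketch, assuming the class term behaves like a softmax-style categorical cross-entropy over the class axis (an assumption; the repository's actual loss code may differ in detail). With a single class the softmax output is always 1, so the loss is 0 for any logit and carries no training signal. Adding a second class or switching to binary cross-entropy makes the loss depend on the logit again. All function names here are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_ce(logits, true_idx):
    """Categorical cross-entropy for one example with an integer label."""
    return -math.log(softmax(logits)[true_idx])


def binary_ce(logit, target):
    """Binary cross-entropy on a single sigmoid output."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


test_logits = (-3.0, 0.0, 3.0)

# One class: softmax is always [1.0], so the loss is 0 whatever the logit is.
one_class_losses = [categorical_ce([z], 0) for z in test_logits]

# Workaround 1: a second (unused) class makes the loss depend on the logit.
two_class_losses = [categorical_ce([z, 0.0], 0) for z in test_logits]

# Workaround 2: binary cross-entropy also depends on the logit.
bce_losses = [binary_ce(z, 1.0) for z in test_logits]

print(one_class_losses)
print(two_class_losses)
print(bce_losses)
```

In this sketch the one-class losses are all zero, while the other two lists decrease as the logit for the true class grows, which is the behavior a trainable class term needs.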

osljw commented 4 years ago

@johntyty912 @zzh8829

I have trained on the wider_face dataset (face detection only); see: https://github.com/osljw/yolov3-tf2.git
doc: https://github.com/osljw/yolov3-tf2/blob/wider_face/docs/training_wider.md

Training from random weights (which is what I tried)

LordTrololo commented 4 years ago

Hi, I am also training with one class only; my image dataset is split 150/50. Training seems to be OK (I get a lot of warnings, but I'm not sure whether they are a problem in themselves). Here is the training output with batch size 4 and 30 epochs, using darknet and eager_fit.

Epoch 00030: saving model to checkpoints/yolov3_train_30.tf
38/38 [==============================] - 32s 833ms/step - loss: 13.5958 - yolo_output_0_loss: 0.6670 - yolo_output_1_loss: 0.4185 - yolo_output_2_loss: 1.6939 - val_loss: 15.6806 - val_yolo_output_0_loss: 2.9382 - val_yolo_output_1_loss: 0.2265 - val_yolo_output_2_loss: 1.7019
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-8
W0303 12:27:50.259345  3168 util.py:144] Unresolved object in checkpoint: (root).layer-8
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-9
W0303 12:27:50.260312  3168 util.py:144] Unresolved object in checkpoint: (root).layer-9
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-10
W0303 12:27:50.261312  3168 util.py:144] Unresolved object in checkpoint: (root).layer-10
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-11
W0303 12:27:50.261312  3168 util.py:144] Unresolved object in checkpoint: (root).layer-11
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
W0303 12:27:50.261312  3168 util.py:152] A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.

However, I also get relatively low confidence scores, never more than 0.5, and often nothing is detected.

So I would just like to double-check: are there still open issues specific to training with only one class?

OAfzal commented 3 years ago

Unfortunately, I have also been getting 0 detections on the val set. There are detections for the train set but none for the val set. Any ideas what can be done? I even tried #70.

kalikhademi commented 3 years ago

I am also training with one class. I did not encounter any problems during training, but when I ran detect.py there were no detections for a picture that I know contains more than one object. Does anyone know what the solution might be?