Removed the classes with minority samples to see whether it improves the accuracy of the models (a filtering sketch follows the distribution below).
Label distribution:
HS 2086
C-Rubble 1025
Sand 945
Acropora 615
Acr_dig 599
Pavona 462
Porites 349
Acr_tab 314
BAD 259
Goniastrea 198
Monti_encr 198
Dark 191
Monti 149
Algae 142
Pocill 142
Millepora 77
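For reference, a minimal sketch of how the minority classes could be dropped from the annotation CSV. The input filename, the "Label" column name, and the cutoff value are assumptions, not the exact values used here; only the trimmed output name matches the file referenced in the results below.

import pandas as pd

MIN_SAMPLES = 300  # hypothetical cutoff for what counts as a "minority" class

df = pd.read_csv("data/annotations/annotations_coralnet_only.csv")  # assumed input path
counts = df["Label"].value_counts()
keep = counts[counts >= MIN_SAMPLES].index
trimmed = df[df["Label"].isin(keep)]
trimmed.to_csv("data/annotations/annotations_coralnet_only_trimmed.csv", index=False)
print(trimmed["Label"].value_counts())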
The results seem to be worse. Will retry the annotations with the two majority classes only.
{
"loss": 7.991566181182861,
"accuracy": 0.0,
"precision": 0.044921875,
"recall": 0.044921875,
"auc": 0.5192165374755859,
"true_positives": 69.0,
"true_negatives": 21573.0,
"false_positives": 1467.0,
"false_negatives": 1467.0,
"traning_time_in_seconds": 4579.673694372177,
"annotation_file": "C:\\Users\\ad_xleong\\Desktop\\coral-sleuth\\data\\annotations\\annotations_coralnet_only_trimmed.csv"
}
{
"loss": 11.53290843963623,
"accuracy": 0.0,
"precision_1": 0.12763157486915588,
"recall_1": 0.1263020783662796,
"auc_1": 0.5612916946411133,
"true_positives_1": 194.0,
"true_negatives_1": 21714.0,
"false_positives_1": 1326.0,
"false_negatives_1": 1342.0,
"traning_time_in_seconds": 3546.7258007526398,
"annotation_file": "C:\\Users\\ad_xleong\\Desktop\\coral-sleuth\\data\\annotations\\annotations_coralnet_only_trimmed.csv"
}
{
"loss": 2.6154720783233643,
"accuracy": 0.0,
"precision_2": 0.3186206817626953,
"recall_2": 0.150390625,
"auc_2": 0.8254724740982056,
"true_positives_2": 231.0,
"true_negatives_2": 22546.0,
"false_positives_2": 494.0,
"false_negatives_2": 1305.0,
"traning_time_in_seconds": 12091.830829381943,
"annotation_file": "C:\\Users\\ad_xleong\\Desktop\\coral-sleuth\\data\\annotations\\annotations_coralnet_only_trimmed.csv"
}
{
"loss": 3.3700144290924072,
"accuracy": 0.0,
"precision_3": 0.03846153989434242,
"recall_3": 0.0006510416860692203,
"auc_3": 0.6806814670562744,
"true_positives_3": 1.0,
"true_negatives_3": 23015.0,
"false_positives_3": 25.0,
"false_negatives_3": 1535.0,
"traning_time_in_seconds": 2705.1154963970184,
"annotation_file": "C:\\Users\\ad_xleong\\Desktop\\coral-sleuth\\data\\annotations\\annotations_coralnet_only_trimmed.csv"
}
{
"loss": 2.2891685962677,
"accuracy": 0.0,
"precision_4": 0.4533762037754059,
"recall_4": 0.18359375,
"auc_4": 0.8210552930831909,
"true_positives_4": 282.0,
"true_negatives_4": 22700.0,
"false_positives_4": 340.0,
"false_negatives_4": 1254.0,
"traning_time_in_seconds": 1434.7197000980377,
"annotation_file": "C:\\Users\\ad_xleong\\Desktop\\coral-sleuth\\data\\annotations\\annotations_coralnet_only_trimmed.csv"
}
Training the models with just the two majority classes in the dataset (CoralNet only) to determine whether class imbalance is the cause of the low accuracy.
Label distribution:
HS 2086
C-Rubble 1025
EfficientNet V2 seems to have the best results among the CNN models. Will conduct further experiments to confirm whether to go with EfficientNet V2.
{
"loss": 30.188608169555664,
"accuracy": 0.16447368264198303,
"precision": 0.3305920958518982,
"recall": 0.3305920958518982,
"auc": 0.3305920958518982,
"true_positives": 201.0,
"true_negatives": 201.0,
"false_positives": 407.0,
"false_negatives": 407.0,
"traning_time_in_seconds": 1812.7397768497467,
"batch_size": 16,
"epoch": 5,
"annotation_filepath": "C:\\Users\\ad_xleong\\Desktop\\coral-sleuth\\data\\annotations\\annotations_coralnet_only_HS_C-Rubble.csv",
"images_count": 375,
"annotation_count": 3111,
"annotation_label_count": 2
}
{
"loss": 0.0,
"accuracy": 0.6332237124443054,
"precision_1": 1.0,
"recall_1": 1.0,
"auc_1": 1.0,
"true_positives_1": 608.0,
"true_negatives_1": 608.0,
"false_positives_1": 0.0,
"false_negatives_1": 0.0,
"traning_time_in_seconds": 1500.6001937389374,
"batch_size": 16,
"epoch": 5,
"annotation_filepath": "C:\\Users\\ad_xleong\\Desktop\\coral-sleuth\\data\\annotations\\annotations_coralnet_only_HS_C-Rubble.csv",
"images_count": 375,
"annotation_count": 3111,
"annotation_label_count": 2
}
{
"loss": 1.7255895137786865,
"accuracy": 0.0,
"precision_3": 0.6661184430122375,
"recall_3": 0.6661184430122375,
"auc_3": 0.6589727401733398,
"true_positives_3": 405.0,
"true_negatives_3": 405.0,
"false_positives_3": 203.0,
"false_negatives_3": 203.0,
"traning_time_in_seconds": 1088.2479541301727,
"batch_size": 16,
"epoch": 5,
"annotation_filepath": "C:\\Users\\ad_xleong\\Desktop\\coral-sleuth\\data\\annotations\\annotations_coralnet_only_HS_C-Rubble.csv",
"images_count": 375,
"annotation_count": 3111,
"annotation_label_count": 2
}
{
"loss": 2.505999803543091,
"accuracy": 0.021381579339504242,
"precision_2": 0.6611841917037964,
"recall_2": 0.6611841917037964,
"auc_2": 0.6613316535949707,
"true_positives_2": 402.0,
"true_negatives_2": 402.0,
"false_positives_2": 206.0,
"false_negatives_2": 206.0,
"traning_time_in_seconds": 4815.7375745773315,
"batch_size": 16,
"epoch": 5,
"annotation_filepath": "C:\\Users\\ad_xleong\\Desktop\\coral-sleuth\\data\\annotations\\annotations_coralnet_only_HS_C-Rubble.csv",
"images_count": 375,
"annotation_count": 3111,
"annotation_label_count": 2
}
{
"loss": 0.21554377675056458,
"accuracy": 0.0,
"precision_4": 0.9276315569877625,
"recall_4": 0.9276315569877625,
"auc_4": 0.9817266464233398,
"true_positives_4": 564.0,
"true_negatives_4": 564.0,
"false_positives_4": 44.0,
"false_negatives_4": 44.0,
"traning_time_in_seconds": 585.4425690174103,
"batch_size": 16,
"epoch": 5,
"annotation_filepath": "C:\\Users\\ad_xleong\\Desktop\\coral-sleuth\\data\\annotations\\annotations_coralnet_only_HS_C-Rubble.csv",
"images_count": 375,
"annotation_count": 3111,
"annotation_label_count": 2
}
Found some interesting observations when looking at the label distribution again. I noticed that there are a few similar labels with different label strings, for example "Sand"/"SAND" or "Acrop"/"Acropora".
Plan: perform data cleaning to merge similar labels into the same label, then retrain the models and see whether the performance improves.
Added scripts to remap the labels of the annotation files. Will retrain the models with the newly remapped annotation files.
Mapping
label_mapping = {
"Sand": "sand",
"SAND": "sand",
"Turf": "turf",
"TURF": "turf",
"Porit": "porites",
"PORIT": "porites",
"Macro": "macroalgae",
"MACRO": "macroalgae",
# https://coralnet.ucsd.edu/label/85/
"Off": "off",
"OFF": "off",
"Pocill": "pocillopora",
"POCILL": "pocillopora",
"Monti": "montipora",
"MONTI": "montipora",
# https://coralnet.ucsd.edu/label/621/
"Monti_encr": "montipora",
"Monta": "montastraea",
"MONTA": "montastraea",
"Pavon": "pavona",
"Pavona": "pavona",
"PAVON": "pavona",
"Acrop": "acropora",
"ACROP": "acropora",
"Acropora": "acropora",
"Acr_dig": "acropora",
"Acr_tab": "acropora",
"Mille": "millepora",
"MILLE": "millepora",
"Millepora": "millepora",
"Lepta": "leptastrea",
"LEPTA": "leptastrea",
"Lepto": "leptoseris",
"Soft": "soft",
"SOFT": "soft",
"Fung": "fungia",
"FUNG": "fungia",
"Gardin": "gardineroseris",
"GARDIN": "gardineroseris",
"Astreo": "astreopora",
"ASTREO": "astreopora",
"Favia": "favia",
"FAVIA": "favia",
"fav": "favia",
"Lobo": "lobophyllia",
"LOBO": "lobophyllia",
"Psam": "psammocora",
"PSAM": "psammocora",
"Cypha": "cyphastrea",
"CYPHA": "cyphastrea",
"Acan": "acanthastrea",
"ACAN": "acanthastrea",
"GFA": "green_fleshy_algae",
"HS": "hard_substrate",
# P. Rus
# https://coralnet.ucsd.edu/label/88/
"P. Rus": "porites",
# P. Irr
# https://coralnet.ucsd.edu/label/90/
"P. Irr": "porites",
# P mass - Most likely "Porites massiva"
"P mass": "porites",
"Porites": "porites",
"porites": "porites",
"D_coral": "dead_coral",
"Herpo": "herpolitha",
"SC": "soft_coral",
"Platy": "platygyra",
"Echinopora": "echinopora",
"Rock": "rock",
"Stylo": "stylophora",
"Favites": "favites",
"Sando": "sandolitha",
"Tuba": "tuba",
"Dark": "dark",
# https://coralnet.ucsd.edu/label/1636/
"BAD": "bad",
"Goniastrea": "goniastrea",
"C-Rubble": "broken_coral_rubble",
"CCA": "crustose_coralline_algae",
"Algae": "algae",
}
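For reference, a minimal sketch of applying the mapping above to an annotation CSV with pandas. The file paths and the "Label" column name are assumptions; this is not the repo's actual remap script.

import pandas as pd

# label_mapping is the dict shown above
df = pd.read_csv("data/annotations/combined_annotations_about_40k_png_only.csv")  # assumed input
df["Label"] = df["Label"].map(label_mapping).fillna(df["Label"])  # leave unmapped labels unchanged
df.to_csv("data/annotations/combined_annotations_about_40k_png_only_remapped.csv", index=False)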
The remapping helps the classification a little, but not much. Currently rerunning with the following annotation set, which should not have a huge class imbalance.
Label distribution:
turf 3833
sand 3715
porites 3060
The resulting metrics are similar...
Trained the models on the default CIFAR-10 dataset and the metrics look good. This leads me to suspect something is wrong with the dataset I'm using.
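A minimal sketch of this kind of CIFAR-10 sanity check. The backbone setup, batch size, and epoch count here are assumptions, not the exact configuration of the run below.

import tensorflow as tf

# CIFAR-10: 50k training / 10k test images, 32x32 pixels, 10 classes
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

base = tf.keras.applications.EfficientNetV2B0(
    include_top=False, weights=None, input_shape=(32, 32, 3), pooling="avg"
)
model = tf.keras.Sequential([base, tf.keras.layers.Dense(10, activation="softmax")])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=10, batch_size=64)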
Added segmentation (cropping) code to the data_generator in src/model.py to see whether it helps. Previously we were using the full image instead of cropped blocks around each annotation point, which are what actually represent the coral label.
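A minimal sketch of the cropping idea, assuming CoralNet-style point annotations given as (row, column) pixel coordinates; the function name and patch size are illustrative, not the actual data_generator code.

import numpy as np

def crop_patch(image: np.ndarray, row: int, col: int, size: int = 224) -> np.ndarray:
    """Return a size x size patch centred on (row, col), reflect-padding at the borders."""
    half = size // 2
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="reflect")
    r, c = row + half, col + half  # shift the annotation point into padded coordinates
    return padded[r - half:r + half, c - half:c + half]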
Training efficientnetv2 model...
Epoch 1/10
782/782 [==============================] - 232s 266ms/step - loss: 1.5060 - accuracy: 0.4628 - val_loss: 3.5821 - val_accuracy: 0.1374
Epoch 2/10
782/782 [==============================] - 207s 265ms/step - loss: 1.0456 - accuracy: 0.6386 - val_loss: 3.2534 - val_accuracy: 0.1033
Epoch 3/10
782/782 [==============================] - 194s 248ms/step - loss: 0.8972 - accuracy: 0.6915 - val_loss: 4.5692 - val_accuracy: 0.1083
Epoch 4/10
782/782 [==============================] - 199s 255ms/step - loss: 0.8208 - accuracy: 0.7195 - val_loss: 5.8939 - val_accuracy: 0.1397
Epoch 5/10
782/782 [==============================] - 197s 252ms/step - loss: 0.7665 - accuracy: 0.7392 - val_loss: 4.7142 - val_accuracy: 0.1265
Epoch 6/10
782/782 [==============================] - 199s 254ms/step - loss: 0.7165 - accuracy: 0.7539 - val_loss: 3.9307 - val_accuracy: 0.1837
Epoch 7/10
782/782 [==============================] - 195s 250ms/step - loss: 0.6762 - accuracy: 0.7683 - val_loss: 2.9778 - val_accuracy: 0.1316
Epoch 8/10
782/782 [==============================] - 196s 250ms/step - loss: 0.6476 - accuracy: 0.7780 - val_loss: 3.6757 - val_accuracy: 0.1828
Epoch 9/10
782/782 [==============================] - 194s 249ms/step - loss: 0.6183 - accuracy: 0.7897 - val_loss: 4.2203 - val_accuracy: 0.1697
Epoch 10/10
782/782 [==============================] - 201s 257ms/step - loss: 0.5939 - accuracy: 0.7962 - val_loss: 4.5162 - val_accuracy: 0.1081
Models trained with annotation file: combined_annotations_about_40k_png_only_remapped_majority_class_with_3k_to_4k_sample.csv
Label distribution (same three classes as above):
turf 3833
sand 3715
porites 3060
From the results, the performance of three different convolutional neural network models, EfficientNetV2, MobileNetV3, and ConvNeXt Tiny, was compared on a multiclass classification problem with 3 classes. Each model was trained with two different batch sizes (16 and 32) but the same number of epochs (5). Despite the varying architectures and training conditions, all models struggled to generalize, as shown by the low validation accuracies and the large gap between training and validation metrics, indicating severe overfitting. The best-performing model was EfficientNetV2, though its performance on the validation set was still poor. Strategies to mitigate overfitting, such as data augmentation, regularization, or collecting more diverse data, should be considered to improve these models' performance. Additionally, verifying the integrity of the training process and the data might be worthwhile, given the surprisingly low accuracies reported for MobileNetV3 and ConvNeXt Tiny.
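As a concrete example of the data augmentation suggested above, a minimal sketch using Keras preprocessing layers; the specific transforms and parameters are assumptions, not the configuration of these runs.

import tensorflow as tf

data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal_and_vertical"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
    tf.keras.layers.RandomContrast(0.1),
])

inputs = tf.keras.Input(shape=(224, 224, 3))
x = data_augmentation(inputs)  # augmentation layers are only active during training
backbone = tf.keras.applications.EfficientNetV2B0(include_top=False, weights=None, pooling="avg")
outputs = tf.keras.layers.Dense(3, activation="softmax")(backbone(x))
model = tf.keras.Model(inputs, outputs)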
{
"efficientnetv2": {
"loss": 1.3885207176208496,
"accuracy": 0.22214336693286896,
"precision": 0.873846173286438,
"recall": 0.8734326958656311,
"auc": 0.9380096793174744,
"true_positives": 7384.0,
"true_negatives": 15842.0,
"false_positives": 1066.0,
"false_negatives": 1070.0,
"val_loss": 9.80955696105957,
"val_accuracy": 0.0037878789007663727,
"val_precision": 0.34706440567970276,
"val_recall": 0.34706440567970276,
"val_auc": 0.5132332444190979,
"val_true_positives": 733.0,
"val_true_negatives": 2845.0,
"val_false_positives": 1379.0,
"val_false_negatives": 1379.0,
"traning_time_in_seconds": 6139.940617322922,
"batch_size": 32,
"epoch": 5,
"annotation_file": "combined_annotations_about_40k_png_only_remapped_majority_class_with_3k_to_4k_sample.csv",
"images_count": 1697,
"annotation_count": 10608,
"annotation_label_count": 3,
"annotation_label_skipped_count": 0
},
"mobilenetv3": {
"loss": 2.847402811050415,
"accuracy": 0.00011828720016637817,
"precision": 0.37581658363342285,
"recall": 0.347054660320282,
"auc": 0.5362700819969177,
"true_positives": 2934.0,
"true_negatives": 12035.0,
"false_positives": 4873.0,
"false_negatives": 5520.0,
"val_loss": 3.0066659450531006,
"val_accuracy": 0.0,
"val_precision": 0.3510688841342926,
"val_recall": 0.3499053120613098,
"val_auc": 0.4971848130226135,
"val_true_positives": 739.0,
"val_true_negatives": 2858.0,
"val_false_positives": 1366.0,
"val_false_negatives": 1373.0,
"traning_time_in_seconds": 5831.569593429565,
"batch_size": 32,
"epoch": 5,
"annotation_file": "combined_annotations_about_40k_png_only_remapped_majority_class_with_3k_to_4k_sample.csv",
"images_count": 1697,
"annotation_count": 10608,
"annotation_label_count": 3,
"annotation_label_skipped_count": 0
}
}
{
"efficientnetv2": {
"loss": 1.1517153978347778,
"accuracy": 0.14596615731716156,
"precision": 0.850940465927124,
"recall": 0.8492326140403748,
"auc": 0.9321764707565308,
"true_positives": 7193.0,
"true_negatives": 15680.0,
"false_positives": 1260.0,
"false_negatives": 1277.0,
"val_loss": 5.162411212921143,
"val_accuracy": 0.004734848625957966,
"val_precision": 0.4364928901195526,
"val_recall": 0.4360795319080353,
"val_auc": 0.6074327826499939,
"val_true_positives": 921.0,
"val_true_negatives": 3035.0,
"val_false_positives": 1189.0,
"val_false_negatives": 1191.0,
"traning_time_in_seconds": 6154.294684886932,
"batch_size": 16,
"epoch": 5,
"annotation_file": "combined_annotations_about_40k_png_only_remapped_majority_class_with_3k_to_4k_sample.csv",
"images_count": 1697,
"annotation_count": 10608,
"annotation_label_count": 3,
"annotation_label_skipped_count": 0
},
"mobilenetv3": {
"loss": 2.1890223026275635,
"accuracy": 0.0,
"precision": 0.4367331266403198,
"recall": 0.40342384576797485,
"auc": 0.5932578444480896,
"true_positives": 3417.0,
"true_negatives": 12533.0,
"false_positives": 4407.0,
"false_negatives": 5053.0,
"val_loss": 2.047524929046631,
"val_accuracy": 0.0,
"val_precision": 0.41845831274986267,
"val_recall": 0.37784090638160706,
"val_auc": 0.5705654621124268,
"val_true_positives": 798.0,
"val_true_negatives": 3115.0,
"val_false_positives": 1109.0,
"val_false_negatives": 1314.0,
"traning_time_in_seconds": 5831.603454589844,
"batch_size": 16,
"epoch": 5,
"annotation_file": "combined_annotations_about_40k_png_only_remapped_majority_class_with_3k_to_4k_sample.csv",
"images_count": 1697,
"annotation_count": 10608,
"annotation_label_count": 3,
"annotation_label_skipped_count": 0
},
"convnexttiny": {
"loss": 2.794490098953247,
"accuracy": 3.9354585169348866e-05,
"precision": 0.3455289900302887,
"recall": 0.31889021396636963,
"auc": 0.5115368962287903,
"true_positives": 2701.0,
"true_negatives": 11824.0,
"false_positives": 5116.0,
"false_negatives": 5769.0,
"val_loss": 1.7662761211395264,
"val_accuracy": 0.0,
"val_precision": 0.3513661324977875,
"val_recall": 0.30445075035095215,
"val_auc": 0.5276635885238647,
"val_true_positives": 643.0,
"val_true_negatives": 3037.0,
"val_false_positives": 1187.0,
"val_false_negatives": 1469.0,
"traning_time_in_seconds": 59739.85233402252,
"batch_size": 16,
"epoch": 5,
"annotation_file": "combined_annotations_about_40k_png_only_remapped_majority_class_with_3k_to_4k_sample.csv",
"images_count": 1697,
"annotation_count": 10608,
"annotation_label_count": 3,
"annotation_label_skipped_count": 0
}
}
From the previous experiment, we see that the best-performing model is EfficientNet V2. In this experiment, we plan to use the same annotation dataset but increase the number of epochs to 10 and 30.
Model 1 (Epochs=5, Batch size=32): The EfficientNetV2 model had a loss of 1.39, an accuracy of 0.22, a precision of 0.87, a recall of 0.87, and an AUC of 0.94 on the training set. The performance on the validation set was significantly worse, with a loss of 9.81, an accuracy of 0.004, a precision of 0.35, a recall of 0.35, and an AUC of 0.51. This indicated that the model was overfitting the training data and did not generalize well to the validation set.
Model 2 (Epochs=10, Batch size=16): This EfficientNetv2 model improved on the training set with a loss of 0.176, an accuracy of 0.15, precision of 0.95, recall of 0.95, and an AUC of 0.99. However, the model's performance on the validation set didn't improve much, indicating overfitting. The validation metrics were loss: 2.32, accuracy: 0.0, precision: 0.39, recall: 0.38, and AUC: 0.56.
Model 3 (Epochs=30, Batch size=16): The latest EfficientNetv2 model had excellent performance on the training set, with a loss of 0.036, an accuracy of 0.13, precision of 0.99, recall of 0.99, and an AUC of 0.999. This model had a better performance on the validation set compared to previous models, but still, the validation accuracy remained practically zero, indicating that the model was still overfitting. The validation metrics were loss: 1.37, accuracy: 0.0, precision: 0.46, recall: 0.43, and AUC: 0.64.
In summary, as the number of epochs increased from 5 to 10 and then to 30, the EfficientNetv2 model's performance on the training set improved significantly. The model's performance on the validation set also improved with the increase in epochs, but the validation accuracy remained practically zero in all cases, indicating that the model was still overfitting the training data. This suggests that the model is not yet able to generalize well to unseen data, and further adjustments are needed, such as regularization techniques or methods to handle potential class imbalance.
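One possible way to handle the class imbalance mentioned above is to pass per-class weights to model.fit; a minimal sketch, where the "Label" column name and the mapping from class index to label are assumptions that must match the encoding used in training.

import numpy as np
import pandas as pd
from sklearn.utils.class_weight import compute_class_weight

df = pd.read_csv("combined_annotations_about_40k_png_only_remapped_majority_class_with_3k_to_4k_sample.csv")
classes = np.unique(df["Label"])
weights = compute_class_weight(class_weight="balanced", classes=classes, y=df["Label"])
class_weight = dict(enumerate(weights))  # keys must match the one-hot/label-encoding order used in training

# model.fit(train_ds, validation_data=val_ds, epochs=10, class_weight=class_weight)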
Epochs=10
{
"efficientnetv2": {
"loss": 0.1763043850660324,
"accuracy": 0.15123966336250305,
"precision": 0.948102593421936,
"recall": 0.9468713402748108,
"auc": 0.9906402826309204,
"true_positives": 8020.0,
"true_negatives": 16501.0,
"false_positives": 439.0,
"false_negatives": 450.0,
"val_loss": 2.32393741607666,
"val_accuracy": 0.0,
"val_precision": 0.3858721852302551,
"val_recall": 0.3802083432674408,
"val_auc": 0.5618358850479126,
"val_true_positives": 803.0,
"val_true_negatives": 2946.0,
"val_false_positives": 1278.0,
"val_false_negatives": 1309.0,
"traning_time_in_seconds": 12725.747123003006,
"batch_size": 16,
"epoch": 10,
"annotation_file": "combined_annotations_about_40k_png_only_remapped_majority_class_with_3k_to_4k_sample.csv",
"images_count": 1697,
"annotation_count": 10608,
"annotation_label_count": 3,
"annotation_label_skipped_count": 0
}
}
Epochs=30
{
"efficientnetv2": {
"loss": 0.03613756597042084,
"accuracy": 0.12526564300060272,
"precision": 0.988067090511322,
"recall": 0.9873671531677246,
"auc": 0.999245285987854,
"true_positives": 8363.0,
"true_negatives": 16839.0,
"false_positives": 101.0,
"false_negatives": 107.0,
"val_loss": 1.3676190376281738,
"val_accuracy": 0.0,
"val_precision": 0.45793449878692627,
"val_recall": 0.43039771914482117,
"val_auc": 0.6449277997016907,
"val_true_positives": 909.0,
"val_true_negatives": 3148.0,
"val_false_positives": 1076.0,
"val_false_negatives": 1203.0,
"traning_time_in_seconds": 37084.88212776184,
"batch_size": 16,
"epoch": 30,
"annotation_file": "combined_annotations_about_40k_png_only_remapped_majority_class_with_3k_to_4k_sample.csv",
"images_count": 1697,
"annotation_count": 10608,
"annotation_label_count": 3,
"annotation_label_skipped_count": 0
}
}
The accuracy is able to exceed 0.2 after commit fb689a0a0fd52c5f1a5cc4724ecf6ede2d64d0fd.
This will be the base model architecture for now, and I will continue to explore other ways to increase the efficiency and accuracy of the models.
Fixed after replacing the accuracy metric with categorical_accuracy, as accuracy was behaving as binary-classification accuracy here.
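For context, a minimal sketch of how the compile step could look with categorical_accuracy plus custom recall/precision/F1 metrics matching the metric names in the results below. The custom metric implementations are a common Keras pattern and an assumption, not necessarily the repo's code.

import tensorflow as tf
from tensorflow.keras import backend as K

def recall_m(y_true, y_pred):
    tp = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible = K.sum(K.round(K.clip(y_true, 0, 1)))
    return tp / (possible + K.epsilon())

def precision_m(y_true, y_pred):
    tp = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted = K.sum(K.round(K.clip(y_pred, 0, 1)))
    return tp / (predicted + K.epsilon())

def f1_m(y_true, y_pred):
    p, r = precision_m(y_true, y_pred), recall_m(y_true, y_pred)
    return 2 * p * r / (p + r + K.epsilon())

model = tf.keras.applications.EfficientNetV2B0(weights=None, classes=3)
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=[
        "categorical_accuracy",          # instead of "accuracy"
        "top_k_categorical_accuracy",
        tf.keras.metrics.AUC(name="auc"),
        recall_m,
        precision_m,
        f1_m,
    ],
)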
{
"efficientnetv2": {
"loss": 0.0015080886660143733,
"categorical_accuracy": 0.9995833039283752,
"top_k_categorical_accuracy": 1.0,
"auc": 0.9999999403953552,
"recall_m": 0.9997036457061768,
"precision_m": 0.9995555877685547,
"f1_m": 0.9996291995048523,
"val_loss": 0.8959988355636597,
"val_categorical_accuracy": 0.8576388955116272,
"val_top_k_categorical_accuracy": 1.0,
"val_auc": 0.9398042559623718,
"val_recall_m": 0.856296181678772,
"val_precision_m": 0.8742623925209045,
"val_f1_m": 0.864773690700531,
"traning_time_in_seconds": 67535.4751894474,
"batch_size": 32,
"epoch": 200,
"annotation_file": "combined_annotations_about_40k_png_only_remapped_majority_class_with_3k_to_4k_sample_reduced_1k.csv",
"images_count": 1212,
"annotation_count": 3000,
"annotation_label_count": 3,
"annotation_label_skipped_count": 0
}
}
Problem
Very low model accuracy across different model types. Initially the model accuracy was decent, but in the end it dropped to 0.1 or 0.0xx. Suspecting the very low accuracy is due to class imbalance in the dataset.
Annotation file: combined_annotations_about_40k_png_only.csv
Results from different models
EfficientNet
EfficientNet V2
VGG16
MobileNet V3
Custom