This repository contains an enhancement of the YOLOv5 object detection framework to incorporate the calculation of the Area Under the Receiver Operating Characteristic curve (AUROC) per class, which plays a crucial role in evaluating model performance, especially in the domain of medical imaging.
In the realm of medical imaging, the assessment of model predictions is paramount due to the high stakes involved in clinical applications. AUROC is a widely utilized metric in this field as it provides a comprehensive measure to evaluate the discriminative ability of the model across different classes without being affected by the class imbalance. It provides insights into how well the model can distinguish between classes and gives a single score that summarizes the ROC curve, which plots the true positive rate against the false positive rate at various threshold settings.
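As a quick standalone illustration of why AUROC is robust to class imbalance (this snippet is not part of the repository), consider a toy dataset with only two positives among ten samples; because AUROC measures ranking quality, a model that scores every positive above every negative still attains a perfect score:

```python
# Toy illustration of AUROC: scores that rank all positives above all
# negatives yield an AUROC of 1.0 regardless of class imbalance.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # heavily imbalanced labels
y_score = [0.10, 0.20, 0.15, 0.05, 0.30, 0.20, 0.10, 0.25, 0.90, 0.80]

auc = roc_auc_score(y_true, y_score)
print(auc)  # 1.0: every positive is scored above every negative
```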
The following files have been modified to incorporate AUROC calculation and visualization into the YOLOv5 framework:

- `metrics.py`
- `val.py`
- `train.py`
- `__init__.py`
`process_batch(self, detections, labels)`

The `process_batch` method of the `AUROC` class is pivotal in accumulating the data required to compute the AUROC for each class. Below is a simplified walkthrough of how this method processes detection and label data for AUROC computation.
- `detections`: a batch of detected bounding boxes with shape [N, 6], where N is the number of detections. Each detection is represented as [x1, y1, x2, y2, confidence, class_id].
- `labels`: a batch of ground-truth labels with shape [M, 5], where M is the number of ground truths. Each label is represented as [class_id, x1, y1, x2, y2].

Filtering Detections: detections are filtered by a confidence threshold to reduce false positives.
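The confidence-filtering step can be sketched as follows. This is an illustrative helper, not the repository's code, and the default threshold of 0.25 is an assumption:

```python
import torch

def filter_by_confidence(detections: torch.Tensor, conf_thres: float = 0.25) -> torch.Tensor:
    """Keep only detections whose confidence (column 4 of
    [x1, y1, x2, y2, conf, cls]) exceeds the threshold.
    The 0.25 default is an assumed value, not taken from the repo."""
    return detections[detections[:, 4] > conf_thres]

dets = torch.tensor([
    [0.0, 0.0, 10.0, 10.0, 0.90, 1.0],  # confident detection, kept
    [5.0, 5.0, 20.0, 20.0, 0.10, 0.0],  # low-confidence noise, dropped
])
kept = filter_by_confidence(dets)
print(tuple(kept.shape))  # (1, 6)
```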
IoU Calculation: the Intersection over Union (IoU) between all ground-truth boxes and detected boxes is calculated using the `box_iou` method. This step finds matches between predicted boxes and ground truths.
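A minimal re-implementation of pairwise IoU in the spirit of `box_iou` (this sketch is for illustration and is not the exact code from `metrics.py`):

```python
import torch

def box_iou(box1: torch.Tensor, box2: torch.Tensor) -> torch.Tensor:
    """Pairwise IoU between two sets of boxes in (x1, y1, x2, y2) format.
    Returns an [N, M] matrix of IoU values. Illustrative re-implementation,
    not the repository's exact code."""
    area1 = (box1[:, 2] - box1[:, 0]) * (box1[:, 3] - box1[:, 1])
    area2 = (box2[:, 2] - box2[:, 0]) * (box2[:, 3] - box2[:, 1])
    lt = torch.max(box1[:, None, :2], box2[None, :, :2])  # intersection top-left
    rb = torch.min(box1[:, None, 2:], box2[None, :, 2:])  # intersection bottom-right
    wh = (rb - lt).clamp(min=0)                           # zero if boxes are disjoint
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area1[:, None] + area2[None, :] - inter)

gt = torch.tensor([[0.0, 0.0, 10.0, 10.0]])
pred = torch.tensor([[0.0, 0.0, 10.0, 10.0], [20.0, 20.0, 30.0, 30.0]])
iou = box_iou(gt, pred)
print(iou)  # tensor([[1., 0.]]): perfect overlap, then no overlap
```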
Matching Detections with Ground Truth: the IoU matrix is used to pair each detection with its best-overlapping ground-truth box, which determines whether the detection counts as a true positive or a false positive.
Accumulating Predictions and Ground Truth:
Confidence scores are stored in `self.pred[class_id]` as prediction scores, and `1`s (for true positives) or `0`s (for false positives) are stored in `self.true[class_id]` as ground-truth labels.

After all batches have been processed through `process_batch`, the predictions and labels accumulated in `self.pred` and `self.true`, respectively, are used to compute the AUROC score per class with the `roc_auc_score` function from Scikit-learn, which is triggered via the `out()` method of the `AUROC` class.
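The accumulate-then-score pattern described above can be sketched as below. The class name and the `self.pred`/`self.true`/`out()` names mirror the text, but this is an illustrative re-implementation; in particular the `add` helper is a hypothetical stand-in for the per-detection bookkeeping inside `process_batch`:

```python
# Minimal sketch of per-class AUROC accumulation, assuming the
# pred/true/out() structure described in the walkthrough above.
from collections import defaultdict
from sklearn.metrics import roc_auc_score

class AUROC:
    def __init__(self, nc: int):
        self.nc = nc                   # number of classes
        self.pred = defaultdict(list)  # class_id -> confidence scores
        self.true = defaultdict(list)  # class_id -> 1 (TP) / 0 (FP)

    def add(self, class_id: int, confidence: float, is_tp: bool):
        """Hypothetical helper standing in for process_batch's bookkeeping."""
        self.pred[class_id].append(confidence)
        self.true[class_id].append(int(is_tp))

    def out(self) -> dict:
        """Per-class AUROC; classes lacking both TPs and FPs are skipped,
        since roc_auc_score is undefined for a single label value."""
        scores = {}
        for c in range(self.nc):
            if len(set(self.true[c])) == 2:
                scores[c] = roc_auc_score(self.true[c], self.pred[c])
        return scores

m = AUROC(nc=2)
m.add(0, 0.9, True)
m.add(0, 0.2, False)
m.add(0, 0.8, True)
print(m.out())  # {0: 1.0}; class 1 has no data and is skipped
```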
The calculated AUROC can then be visualized and analyzed to understand the model's performance on each class, which is especially important for imbalanced datasets where per-class evaluation is crucial.
Ensure that the modifications are properly integrated into your YOLOv5 setup. During training or validation, the AUROC per class, as well as relevant curves, should be computed and can be visualized for detailed model performance analysis.
During training, the log reports the mAUC value at each validation epoch, and a new mAUC column is appended to the per-class results table printed at the end of training.
```text
      Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
       1/99      4.64G    0.04494    0.06671    0.01729        301        640: 100%|██████████| 8/8 [00:21<00:00,  2.70s/it]
      Class     Images  Instances          P          R      mAP50   mAP50-95       mAUC: 100%|██████████| 4/4 [00:10<00:00,  2.51s/it]
        all        128        929      0.709      0.276       0.31      0.206     0.0534
Model summary: 157 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
      Class     Images  Instances          P          R      mAP50   mAP50-95       mAUC: 100%|██████████| 4/4 [00:09<00:00,  2.38s/it]
        all        128        929      0.679      0.258      0.305      0.201     0.0618
     person        128        254      0.847      0.354      0.411      0.254      0.947
    bicycle        128          6      0.789      0.631      0.673      0.391          0
        car        128         46      0.604      0.283      0.402       0.16          0
 motorcycle        128          5      0.488        0.2      0.203      0.183          0
        ...
```
The modified code allows for tracking the AUROC scores for each class on Weights and Biases (WandB) during the training phase. This aids in the real-time monitoring of model performance across different classes. Moreover, upon the completion of training, AUROC curves and radar charts will be generated and saved, providing a visual representation of the model's discriminative ability across classes.
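One simple way to structure such per-class logging is to flatten the per-class scores into a metric dictionary before sending it to WandB. The key format and the `class_names` mapping below are assumptions for illustration, not taken from the repository:

```python
# Hedged sketch: build a flat metrics dict from per-class AUROC scores.
# The "auroc/<name>" key convention is an assumption, not the repo's scheme.
def auroc_log_dict(auroc_per_class: dict, class_names: dict) -> dict:
    """Map {class_id: score} to {'auroc/<class name>': score} for logging."""
    return {f"auroc/{class_names[c]}": s for c, s in auroc_per_class.items()}

log = auroc_log_dict({0: 0.947, 1: 0.0}, {0: "person", 1: "bicycle"})
print(log)  # {'auroc/person': 0.947, 'auroc/bicycle': 0.0}

# During training, one would then call once per validation epoch:
#   import wandb
#   wandb.log(log, step=epoch)
```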
When training is complete, a polar_chart figure of the per-class AUC and an auroc_curve figure are generated. Taking the VinDr-CXR dataset as an example, the two resulting figures are:
In addition, when the experiment uses WandB, the ROC curve of each class is updated during training.
This project is licensed under the AGPL-3.0 License - see the LICENSE file for details.