ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Calculating TP, FP, FN #12052

Closed ghost closed 11 months ago

ghost commented 1 year ago

Search before asking

Question

Hi, is it possible to calculate the TP, FP and FN values for each class, and how, if I have the recall and precision for each class and the number of labels for each class in the dataset?

Additional

No response

github-actions[bot] commented 1 year ago

👋 Hello @aezakmi99, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.8.0 with all requirements.txt installed including PyTorch>=1.8. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 🚀

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics
glenn-jocher commented 1 year ago

@aezakmi99 yes, it is possible to calculate the True Positives (TP), False Positives (FP), and False Negatives (FN) values for each class if you have the recall and precision for each class, as well as the number of labels for each class in the dataset. Here's how you can do it:

  1. Calculate the True Positives (TP) for each class:

    • TP = recall × number of labels for that class
  2. Calculate the False Positives (FP) for each class:

    • FP = TP / precision - TP
  3. Calculate the False Negatives (FN) for each class:

    • FN = number of labels for that class - TP

By using the recall, precision, and number of labels for each class, you can compute the TP, FP, and FN values. These values are essential for evaluating the performance of object detection models.
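
As a minimal illustration (not code from the YOLOv5 repo), here is how those three formulas look in Python; the precision, recall and num_labels dictionaries are hypothetical per-class values you would take from your own validation results:

# Hypothetical per-class metrics from a validation run (assumed values).
precision  = {"car": 0.90, "person": 0.75}
recall     = {"car": 0.80, "person": 0.60}
num_labels = {"car": 100,  "person": 50}   # ground-truth labels per class

for cls in num_labels:
    tp = recall[cls] * num_labels[cls]   # TP = recall * labels
    fp = tp / precision[cls] - tp        # FP = TP / precision - TP
    fn = num_labels[cls] - tp            # FN = labels - TP
    print(f"{cls}: TP={tp:.1f}  FP={fp:.1f}  FN={fn:.1f}")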

I hope this helps! Let me know if you have any further questions.

ghost commented 1 year ago

@glenn-jocher thank you very much. Four more questions:

  1. How do I calculate TN?
  2. Is it TP + FP + FN = number of labels, or TP + FP + FN + TN = number of labels?
  3. What if the result from the second question is greater than the number of labels?
  4. Also, I get precision and recall as outputs of the model evaluation, so I need to calculate the number of TP, FN and FP myself. Is that right?
glenn-jocher commented 1 year ago

@aezakmi99, you're welcome! I'm glad I could help. Here are the answers to your additional questions:

  1. To calculate True Negatives (TN) you would need the total number of negative examples in your dataset, and then TN = Total Negative Examples - FP. Note, however, that in object detection the background is not enumerated as discrete negative examples, so TN is usually not well defined and is not used by the standard metrics (precision, recall, mAP).

  2. The counting identity that always holds per class is TP + FN = Total Number of Ground-Truth Labels, since every label is either detected (TP) or missed (FN); recall divides by exactly this total. False Positives are predictions that matched no label, so they are not bounded by the label count, and True Negatives (TN) do not appear in either relation.

  3. Because of that, TP + FP + FN being greater than the number of labels simply reflects the presence of false positives and is expected. What would indicate a problem is TP + FN exceeding the label count, which usually points to duplicate or overlapping detections being matched to the same ground-truth boxes; in such cases it's important to carefully review your data and the model's predictions.

  4. Yes, you can calculate the number of True Positives (TP), False Negatives (FN), and False Positives (FP) if you have the precision and recall values together with the number of labels per class. These values are important indicators of model performance and can help you assess the accuracy and completeness of object detection predictions.
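
To make the counting relations above concrete, here is a made-up numeric check (100 labels, recall 0.80 and precision 0.90 are assumed values):

labels, recall, precision = 100, 0.80, 0.90   # hypothetical per-class values
tp = recall * labels       # 80.0 detected labels
fn = labels - tp           # 20.0 missed labels -> TP + FN == labels
fp = tp / precision - tp   # ~8.9 spurious predictions, outside the label count
print(tp, fn, fp, tp + fn == labels)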

I hope this clarifies your questions. If you have any further doubts or need more assistance, please let me know.

ghost commented 1 year ago

@glenn-jocher oh, thank you very much for the explanation. But what I meant with the 4th question is: is there any other way to get the TP, FN and FP values other than calculating them from the recall and precision?

And one more question: TP, FN and FP must be integers, right? So if I get, for example, 392.45 for the TP value, do I round it to 392? And when I then check precision, for example precision = TP / (TP + FP), I don't get exactly the same number as the model output. What is the reason for that?

Sorry if I am bothering you.

glenn-jocher commented 1 year ago

@aezakmi99, thank you for your questions! I'm here to help.

Regarding your fourth question, if you have access to the recall and precision values, you can indeed calculate the True Positives (TP), False Negatives (FN), and False Positives (FP) using the formulas provided earlier. This is a common approach to determine these values when evaluating model performance.

However, if you also have the raw outputs or predictions from the model, it is possible to directly extract the TP, FN, and FP values from the predictions themselves. This can be done by comparing the predicted labels with the ground truth labels.
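
For reference, a simplified, framework-agnostic sketch of that comparison for a single image is shown below; the (x1, y1, x2, y2) box format, the 0.5 IoU threshold and the greedy matching are assumptions for illustration, and the real validation code in YOLOv5/YOLOv7 is more involved:

import numpy as np

def box_iou(box, boxes):
    """IoU of one (x1, y1, x2, y2) box against an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def count_tp_fp_fn(pred_boxes, pred_cls, gt_boxes, gt_cls, iou_thres=0.5):
    """Greedy per-image matching: each ground-truth box can match at most one prediction."""
    tp = fp = 0
    matched = set()
    for box, cls in zip(pred_boxes, pred_cls):
        ious = box_iou(box, gt_boxes) if len(gt_boxes) else np.array([])
        # candidate labels: same class, IoU above threshold, not matched yet
        cand = [i for i in np.argsort(-ious)
                if i not in matched and gt_cls[i] == cls and ious[i] >= iou_thres]
        if cand:
            matched.add(cand[0])
            tp += 1       # prediction matched a label of the same class
        else:
            fp += 1       # no acceptable match -> false positive
    fn = len(gt_boxes) - len(matched)  # labels left unmatched are misses
    return tp, fp, fn

# Toy example: one correct prediction, one with the wrong class, four labels total.
preds    = np.array([[0, 0, 10, 10], [20, 20, 30, 30]], dtype=float)
pred_cls = [1, 2]
gts      = np.array([[0, 0, 10, 10], [20, 20, 30, 30],
                     [40, 40, 50, 50], [60, 60, 70, 70]], dtype=float)
gt_cls   = [1, 1, 1, 1]
print(count_tp_fp_fn(preds, pred_cls, gts, gt_cls))  # -> (1, 1, 3)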

Now, regarding your concern about rounding and precision differences, TP, FN, and FP values are usually treated as integers since they represent discrete counts or quantities. Therefore, it is common practice to round any fractional values to the nearest whole number. So, in your example, rounding 392.45 to 392 would be an appropriate approach.

As for the discrepancy between the precision calculated using the TP value from the model output and the precision obtained through the formula (precision = TP / (TP + FP)), it's important to note that this can happen due to various reasons. Some possible causes include differences in data preprocessing, thresholding, or how the precision is calculated by the model during training. It's normal to observe slight variations, but if you notice significant discrepancies, it would be worth investigating further to ensure accuracy.

You're not bothering me at all! I'm here to assist you with your questions. Please don't hesitate to reach out if you have any more doubts or need further clarification.

ghost commented 1 year ago

@glenn-jocher Thank you again. What do you mean when you say

data preprocessing, thresholding, or how the precision is calculated by the model during training

(I am using a YOLOv7 model at the moment; I know this is the v5 GitHub, but on the v7 GitHub no one was answering me.) Yes, the variations are slight, and isn't that because the values of TP, FP and FN are just being rounded, so they cannot be identical numbers? Does that make sense? Or am I missing something?

And also, if I have one class with a small number of labels, and another class with a similar number of labels, what else besides the small number of labels could be the reason for the poor numbers for that particular class?

glenn-jocher commented 1 year ago

@aezakmi99 hi there,

Regarding your first question, when I mentioned data preprocessing, thresholding, or how precision is calculated during model training, I was referring to potential factors that could introduce variations between the precision calculated using the TP value from the model output and the precision obtained through the formula (precision = TP / (TP + FP)). Data preprocessing involves any transformations or manipulations applied to the data before feeding it into the model, which could impact the evaluation. Thresholding refers to the confidence and IoU cutoffs used to decide whether a prediction counts as a detection and whether it matches a label; different threshold values change the TP/FP/FN counts and therefore the precision. The precision reported during training might also involve additional considerations specific to the training pipeline, which could lead to slight differences.
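
As a small illustration of the thresholding point, the TP/FP counts (and therefore precision and recall) change as soon as the confidence cutoff used to keep predictions changes; the confidences, correctness flags and label count below are made-up values:

import numpy as np

# Hypothetical predictions for one class: (confidence, matched a ground-truth box?)
conf     = np.array([0.92, 0.81, 0.44, 0.31, 0.12])
correct  = np.array([True, True, False, True, False])
n_labels = 4  # hypothetical ground-truth count for this class

for conf_thres in (0.25, 0.50):
    keep = conf >= conf_thres
    tp = int((correct & keep).sum())
    fp = int((~correct & keep).sum())
    fn = n_labels - tp
    print(f"thres={conf_thres}: precision={tp / (tp + fp):.2f}  recall={tp / (tp + fn):.2f}")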

Regarding your second question, it is plausible that rounding the TP, FP, and FN values could contribute to slight discrepancies between the precision calculated using model output and the precision obtained using the formula. This is because rounding introduces a small degree of imprecision. However, other factors like the specific implementation or interpretation of precision could also contribute to the variations observed.

Finally, if you noticed poor numbers for a particular class with a small number of labels, besides the small number of labels, other potential reasons could include imbalanced training data, difficulties in accurately detecting objects of that class due to their inherent characteristics, or the presence of confounding factors affecting the model's ability to correctly identify objects in that class. Further investigation into these factors might help identify the root causes of the poor numbers.

I hope this answers your questions. Please let me know if you need any further clarification or assistance.

ghost commented 1 year ago

@glenn-jocher thank you. A few more questions.

Does YOLOv7 validate after each training epoch? Can I get TP, FP and FN from the confusion matrix, and calculate precision and recall from those?

glenn-jocher commented 1 year ago

@aezakmi99 hello,

Regarding your questions:

  1. By default, the YOLOv7 training script (like YOLOv5's) runs validation on the validation set after each training epoch; this behaviour can be disabled or customized through the training script's options, so the validation frequency is up to you.

  2. Yes, you can obtain True Positives (TP), False Positives (FP), and False Negatives (FN) from a confusion matrix. A confusion matrix is a table that summarizes the performance of a classification model. It consists of various metrics, including TP, FP, FN, and True Negatives (TN). You can calculate precision and recall using the values from the confusion matrix. Precision is calculated as TP / (TP + FP), and recall is calculated as TP / (TP + FN).
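
For illustration, here is a generic sketch of extracting those counts from a confusion matrix; it assumes a square matrix with predicted classes as rows and true classes as columns, and the numbers are made up (the YOLOv5/YOLOv7 confusion matrices additionally carry an extra "background" row/column for missed labels and spurious detections, which you would handle the same way):

import numpy as np

# Hypothetical 3-class confusion matrix: rows = predicted class, columns = true class.
cm = np.array([[50,  3,  2],
               [ 4, 40,  1],
               [ 1,  2, 30]])

tp = np.diag(cm)              # detections assigned the correct class
fp = cm.sum(axis=1) - tp      # predicted as this class but actually another
fn = cm.sum(axis=0) - tp      # labels of this class predicted as something else
precision = tp / (tp + fp)    # TP / (TP + FP)
recall    = tp / (tp + fn)    # TP / (TP + FN)
print(precision, recall)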

I hope this answers your questions. Let me know if there's anything else I can assist you with.

github-actions[bot] commented 1 year ago

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

MitchRegularStudent commented 1 year ago

I have a question about 'ap_per_class' from utils.metrics.py.

I noticed that tp, conf, pred_cls, and target_cls only contain information about the predictions made by the model. So let's say you have one image with 4 labels of class 1, and the model makes two predictions that overlap two of the 4 labels; one of the predictions has class 1 and the other class 2. Now we will have tp, conf, pred_cls, and target_cls of length 2, where one prediction is completely correct and one prediction has selected the wrong class. So the recall will be 0.5 (1 correct prediction / (1 correct prediction + 1 good-IoU prediction with a bad classification)). However, the two of the four class labels that were missed entirely seem to be completely ignored in the ap_per_class calculation. Is this the expected behavior, and if so, how does it make sense that the metrics do not take any 'real' missed labels into account? Are these missed ground-truth labels taken into account when calculating the metrics elsewhere?

glenn-jocher commented 1 year ago

@MitchRegularStudent the 'ap_per_class' calculation in utils.metrics.py focuses on evaluating the precision and recall of the predictions made by the model. It does not consider the missed ground truth labels in the AP calculation.

In your example, if there are 4 labels of class 1 in an image and the model makes two predictions, where one prediction is correct and the other has a wrong class, the recall will indeed be 0.5. However, only the correct prediction will contribute to the AP calculation for class 1.

The AP (Average Precision) metric is primarily concerned with assessing the performance of the model's predictions and how well they match the ground truth. Therefore, missed ground truth labels are not directly factored into the AP calculation. However, it's worth noting that AP does consider false positives (incorrect predictions) and their impact on precision.

If you are interested in evaluating the overall performance of the model, including the missed ground truth labels, you may need to consider additional metrics or modify the code accordingly.

Let me know if you have any further questions or require any more clarification.

MitchRegularStudent commented 1 year ago

Hi Glenn, thanks a lot for the very quick response!

I did have a follow-up question. I'm glad to hear that ap_per_class works as I understood it from the code base. However, I did notice a bit of a discrepancy. When I train a YOLOv8 detection model, at the end of the training cycle (i.e. after training for the specified number of epochs), the performance of the model is printed to the terminal under the headers 'Class Images Instances Box(P R mAP50 mAP50-95)', showing the per-class metrics for these columns (and their average). The strange part is that the number of 'Instances' shown at the per-class level aligns with the number of labels in the dataset, not with the number of predictions made by the model. This strikes me as a bit confusing, since I would expect a metric that solely focuses on the performance of the model's predictions to report the number of predictions made for each class. Does the per-class metric calculation at the end of training differ from the model.val() call that uses ap_per_class?

glenn-jocher commented 1 year ago

@MitchRegularStudent hi there,

Thank you for bringing up this discrepancy. I understand your confusion regarding the inconsistency between the number of instances shown in the per-class metrics during training and the number of predictions made by the model for a class.

During training, when the per-class metrics are displayed with the headers 'Class Images Instances Box(P R mAP50 mAP50-95)', the 'Instances' column actually refers to the number of ground truth labels present in the dataset for each class. This column provides information about the actual instances/labels present in the dataset and is not directly related to the predictions made by the model.

On the other hand, when using the ap_per_class function or the model.val() method that calls it, the calculation focuses on evaluating the performance of the model's predictions, including precision, recall, and AP per class.

Therefore, the two calculations serve slightly different purposes. The per-class metrics during training provide insights into the distribution and occurrence of ground truth labels in the dataset, while the ap_per_class function and model.val() method assess the precision and recall achieved by the model's predictions.

I hope this clarifies the difference between the per-class metrics during training and the ap_per_class calculations. If you have any further questions or concerns, please let me know.

github-actions[bot] commented 11 months ago

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐