Adding Background Image and Increasing conf_thres does not lower FP

ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite

https://docs.ultralytics.com

GNU Affero General Public License v3.0


Adding Background Image and Increasing conf_thres does not lower FP #12885

Closed: Jecia888 closed this issue 5 months ago

Jecia888 commented 7 months ago

Search before asking

[X] I have searched the YOLOv5 issues and discussions and found no similar questions.

Question

Hi, I experimented with adding approximately 20%, 10%, and 5% background images to the training dataset. However, the number of FPs remained unchanged before and after I added them. In the val_batch images I could see that none of those background images were detected as objects. However, the confusion matrix still shows no background images detected as background. I tried adding the background images both with and without label files, but neither seemed to make a difference. I also increased the confidence threshold to 80%, but the result is still the same. I am confused. Am I understanding the background class incorrectly? Will background images ever be detected as background in the confusion matrix?

Thanks! Really appreciate any insights on this issue!

Additional

No response

github-actions[bot] commented 7 months ago

👋 Hello @Jecia888, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.8.0 with all requirements.txt installed including PyTorch>=1.8. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Notebooks with free GPU:
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 🚀

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics

glenn-jocher commented 7 months ago

@Jecia888 hi there! 😊 Great job experimenting with your training setup and thanks for reaching out!

Indeed, adding background images (those without any objects of interest) to your dataset is a fine strategy to reduce false positives (FP), but the effectiveness can vary based on your specific dataset and the variety within those background images. Also, remember that the YOLO architecture does not explicitly detect "background" as a class. Instead, it reduces confidence in areas not recognized as any of the trained classes.
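To make the "reduces confidence" point concrete: as I understand YOLOv5's post-processing, each candidate box's final score is its objectness multiplied by its class probability (both sigmoid outputs), and boxes scoring below conf_thres are dropped before NMS. Background images only supply negative objectness targets, so they push objectness down rather than teaching a "background" class. The sketch below is a simplified illustration of that scoring; the logit values are made up.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def box_scores(obj_logits, cls_logits):
    """Per-class scores for N candidate boxes: objectness * class probability.

    obj_logits: shape (N,), raw objectness outputs
    cls_logits: shape (N, nc), raw class outputs
    """
    obj = sigmoid(np.asarray(obj_logits, dtype=float))[:, None]
    cls = sigmoid(np.asarray(cls_logits, dtype=float))
    return obj * cls

def keep_above(scores, conf_thres=0.25):
    """Indices of boxes whose best class score clears the confidence threshold."""
    return np.flatnonzero(scores.max(axis=1) > conf_thres)

# Two hypothetical candidates with identical class logits:
# box 0 has high objectness, box 1 sits on plain background (low objectness).
scores = box_scores([3.0, -4.0], [[4.0, -2.0, -3.0], [4.0, -2.0, -3.0]])
print(keep_above(scores))  # [0]
```

Even with a confident class logit, box 1 is dropped because its objectness is low, which is the mechanism background images train.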

Regarding your attempt and observations:

The lack of change in FP rate even after increasing conf_thres and adding background images might suggest that your model is quite confident about its predictions, even if they are false positives. In this case, consider additional strategies like augmenting your dataset more diversely, reviewing your dataset for any labeling errors, or even adjusting other hyperparameters.
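Increasing conf_thres should reduce FPs, but it will also reduce true positives (TP), so it's a balance. Note also that the confusion matrix saved during training comes from the validation step, which applies its own confidence threshold, so raising the conf_thres you pass to detect.py will not change it.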
Ensure that your background images are indeed representative of scenarios where objects of interest are not present but under similar conditions (lighting, scale, etc.) as your positive samples.
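It can also help to double-check what fraction of the training set actually ends up as background. In YOLOv5 a missing label file and an empty one both mean "no objects in this image", which is consistent with adding background images with or without label files making no difference. The sketch below counts such images; it assumes the usual images/ and labels/ folder pairing with one .txt per image, and the helper name is hypothetical.

```python
from pathlib import Path

IMG_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}

def background_fraction(images_dir, labels_dir):
    """Fraction of images with no objects (missing or empty label file)."""
    images = sorted(p for p in Path(images_dir).iterdir()
                    if p.suffix.lower() in IMG_SUFFIXES)
    if not images:
        return 0.0
    n_background = 0
    for img in images:
        label = Path(labels_dir) / f"{img.stem}.txt"
        # A missing label file and an empty one both mean "no objects here".
        if not label.is_file() or not label.read_text().strip():
            n_background += 1
    return n_background / len(images)
```

For example, `background_fraction("datasets/mydata/images/train", "datasets/mydata/labels/train")` (hypothetical paths) should land near the 20%, 10%, or 5% you intended.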

Background images are never counted as "background detected as background": object detection has no notion of a true negative, so that cell of the confusion matrix always stays empty, no matter how well the model handles background images. The matrix focuses on detection performance for the object classes you've trained the model on.
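To illustrate, the sketch below fills a detection confusion matrix using the layout of YOLOv5's ConfusionMatrix as I understand it (rows = predicted class, columns = true class, last row and column = background). The greedy IoU matching is a simplification, not YOLOv5's exact matching code, and detections are assumed to be already confidence-filtered. Note that the bottom-right cell is never incremented, and a background image with no predictions leaves no trace at all. Because the plotted matrix is normalized per column, a single stray prediction is enough to fill the whole background column.

```python
import numpy as np

def box_iou(a, b):
    """Pairwise IoU between boxes a (N, 4) and b (M, 4) in x1, y1, x2, y2 format."""
    a = np.asarray(a, dtype=float).reshape(-1, 4)
    b = np.asarray(b, dtype=float).reshape(-1, 4)
    top_left = np.maximum(a[:, None, :2], b[None, :, :2])
    bottom_right = np.minimum(a[:, None, 2:], b[None, :, 2:])
    inter = np.clip(bottom_right - top_left, 0, None).prod(axis=2)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def update_confusion(matrix, gt_cls, gt_boxes, det_cls, det_boxes, iou_thres=0.45):
    """Add one image's labels and detections to an (nc + 1) x (nc + 1) matrix."""
    nc = matrix.shape[0] - 1  # index nc is "background"
    iou = box_iou(gt_boxes, det_boxes)
    used = set()
    for g, gc in enumerate(gt_cls):
        best, best_iou = None, iou_thres
        for d in range(len(det_cls)):
            if d not in used and iou[g, d] > best_iou:
                best, best_iou = d, iou[g, d]
        if best is None:
            matrix[nc, gc] += 1  # label with no prediction: missed (FN)
        else:
            used.add(best)
            matrix[det_cls[best], gc] += 1  # on the diagonal if the classes agree
    for d, dc in enumerate(det_cls):
        if d not in used:
            matrix[dc, nc] += 1  # prediction with no label: false positive
    return matrix

nc = 3
m = np.zeros((nc + 1, nc + 1))
# Image 1: one class-1 part found correctly, plus one extra class-1 box elsewhere.
update_confusion(m, [1], [[10, 10, 50, 50]], [1, 1], [[12, 12, 50, 50], [100, 100, 140, 140]])
# Image 2: a background image with no labels and no detections.
update_confusion(m, [], [], [], [])
normalized = m / (m.sum(axis=0, keepdims=True) + 1e-9)  # column-normalized, as plotted
print(m[nc, nc])                    # 0.0: background-as-background is never counted
print(round(normalized[1, nc], 2))  # 1.0: one stray box fills the background column
```

So in a perfect result the background column and row are empty, rather than the counts collecting in the bottom-right cell.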

For next steps, I'd suggest further dataset examination and perhaps even more extensive data augmentation. Also, revisiting the training process with adjusted hyperparameters might uncover new insights. For more detailed guidance, the Ultralytics Docs is a great resource!

Keep pushing, and thanks for your contributions to the YOLO community! Your experiments help improve understanding and performance for everyone. 😄

Jecia888 commented 7 months ago

@glenn-jocher Hi, thank you for your response. Yesterday I modified the dataset again and precision increased by 30% (by the end of training, recall is 99%, precision is 98%, mAP@0.5 is 99%, and mAP@0.5:0.95 is 75%). However, the background issue is still there (but instead of being distributed evenly across the 4 classes, it is now mainly in 1 class). I looked at the batches with the background issue and made a small sketch of it.

(Sketch: IMG_59FCC79F076B-1)

For context, I am trying to detect some parts of a machine. All three classes are in an overlapping region. For each image, the code generates a label file with labels from all three classes. I see that the background false positives are all located somewhere around the class 1 region. Based on your explanation of background, I thought the model would ignore the part of the image that has already been detected. Or can I set the model to ignore that area?

I am also confused about what a "perfect" confusion matrix should look like. If, say, there are no FPs, will the entire background class be condensed into the right-most box on the diagonal (background detected as background)? Or will the boxes in the background column disappear entirely? Could you also give me some insights based on the recall, precision, and mAP of my training session?

Thanks!

glenn-jocher commented 7 months ago

Hi @Jecia888! 😊 I'm thrilled to hear about the significant improvement in your metrics — outstanding work!

When you're dealing with overlapping regions and specific parts detection in such a close context (e.g., machine parts), precision is key. Your scenario sounds challenging but also quite fascinating. Since YOLO doesn't inherently understand or "ignore" already detected parts of an image in the way you're envisioning, what's likely happening is that the model is trying its best to interpret the complex visual signals from those overlapping areas.
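One related detail: YOLOv5's NMS is class-aware by default. It offsets each box by its class index before suppression, so boxes of different classes never suppress each other, and a class-3 box overlapping a class-2 region can survive alongside it. detect.py offers --agnostic-nms to switch this off, though that would also suppress legitimately overlapping parts like yours. The plain NumPy sketch below shows the difference; it is not YOLOv5's code, and the max_wh offset value is an assumption.

```python
import numpy as np

def nms(boxes, scores, iou_thres=0.45):
    """Greedy NMS over x1, y1, x2, y2 boxes; returns kept indices, highest score first."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thres]  # drop boxes that overlap the kept one too much
    return keep

def class_aware_nms(boxes, scores, classes, iou_thres=0.45, max_wh=7680):
    """Shift each class into its own coordinate range so classes never suppress each other."""
    offsets = np.asarray(classes, dtype=float)[:, None] * max_wh
    return nms(np.asarray(boxes, dtype=float) + offsets, scores, iou_thres)

boxes = [[0, 0, 100, 100], [5, 5, 100, 100]]  # heavily overlapping boxes
scores = [0.9, 0.8]
classes = [2, 3]
print(class_aware_nms(boxes, scores, classes))  # [0, 1]: both survive
print(nms(boxes, scores))                       # [0]: class-agnostic keeps one
```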

Regarding the confusing areas now being misclassified mostly into one class, this could be a sign of class imbalance in your dataset, or of the model learning features tied to that class more dominantly. You might benefit from examining those misclassified instances more closely to understand which features are misleading the model.

For the confusion matrix and understanding perfect scenarios:

Ideally, in a "perfect" confusion matrix, all counts fall on the diagonal of the object classes, indicating 100% precision and recall for each class. YOLO doesn't treat background as a class to be detected, so the background row and column only record errors: the background column counts predictions with no matching label (false positives), and the background row counts labels with no matching prediction (false negatives). In a perfect matrix both are empty, and the background-background cell is always empty because true negatives are not counted.
Based on your metrics (recall of 99%, precision of 98%, and high mAP scores), your model performs exceedingly well! The mAP@0.5:0.95 of 75% indicates good localization across IoU (Intersection over Union) thresholds from 0.5 to 0.95, though there is likely room for improvement at the stricter IoU levels.
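The gap between 99% mAP@0.5 and 75% mAP@0.5:0.95 is mostly about box tightness: the latter averages AP over ten IoU thresholds (0.50, 0.55, ..., 0.95), so a box that is found but loosely fitted only earns credit at the lower thresholds. The simplified per-box view below ignores confidence ranking and is only meant to show the arithmetic.

```python
import numpy as np

# The ten IoU thresholds averaged by mAP@0.5:0.95: 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = np.linspace(0.5, 0.95, 10)

def thresholds_passed(iou):
    """Fraction of the ten IoU thresholds at which a box with this IoU counts as a TP."""
    return float(np.mean(iou >= IOU_THRESHOLDS))

for iou in (0.52, 0.82, 0.97):
    print(iou, thresholds_passed(iou))
```

A box with IoU 0.82 counts as correct at 7 of the 10 thresholds, so it contributes fully to mAP@0.5 but only partially to mAP@0.5:0.95.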

To tackle the focus issue around class 1, you might consider several approaches:

Augmenting your data further specifically for those tricky overlapping parts, ensuring a balanced representation.
Reviewing class 1 samples more closely to see if there's any bias or typical features that could be misleading the model and adjusting your dataset accordingly.
Advanced techniques, such as designing a multi-stage detection process where initial detections refine the areas of interest for subsequent detections. This is more complex but could offer a tailored solution.
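For the multi-stage idea, most of the plumbing is cropping around a first-stage detection and mapping second-stage boxes back to full-image coordinates. The sketch below shows only that plumbing; the function names and the 16-pixel margin are hypothetical, and the second-stage detector itself is not shown.

```python
import math
import numpy as np

def expand_and_clip(box, margin, width, height):
    """Grow an x1, y1, x2, y2 box by `margin` pixels on each side, clipped to the image."""
    x1, y1, x2, y2 = box
    return (max(0, math.floor(x1 - margin)), max(0, math.floor(y1 - margin)),
            min(width, math.ceil(x2 + margin)), min(height, math.ceil(y2 + margin)))

def crop_region(image, box, margin=16):
    """Crop an H x W x C image around a first-stage box; return the crop and its offset."""
    h, w = image.shape[:2]
    x1, y1, x2, y2 = expand_and_clip(box, margin, w, h)
    return image[y1:y2, x1:x2], (x1, y1)

def to_full_image(boxes_in_crop, offset):
    """Map second-stage boxes (x1, y1, x2, y2 in crop pixels) back to full-image pixels."""
    ox, oy = offset
    boxes = np.asarray(boxes_in_crop, dtype=float).reshape(-1, 4)
    return boxes + np.array([ox, oy, ox, oy], dtype=float)
```

The margin gives the second stage some context around the region of interest; clipping keeps the crop inside the image so the offset stays valid.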

Your success so far suggests you're on the right path. Keep iterating and exploring the nuances of your particular challenge. Every dataset, especially in specialized domains like yours, brings unique learning opportunities. Keep up the great work! 😄

Jecia888 commented 7 months ago

Hi @glenn-jocher, I have been experimenting with different datasets over the past few days. I found the cause of the background issue, but I have no idea how to fix it.

(Sketch: IMG_B5C61FBD8907-1)

As you can see from the sketch, the top shows my labels and the bottom shows the model's predictions. Class 3 looks very different from class 2, but the model keeps making predictions that expand class 3 into the region of class 2. Sometimes class 3 even covers the entirety of class 2 (and the two classes get confused with each other). It is worth noting that class 3 is always detected correctly, but the model keeps making additional incorrect predictions that follow this pattern.

I don't think my labels caused this issue, because when I only have class 1 and class 3, the predictions do not show this tendency at all.

In addition, the order of the classes in the label file also makes a difference. If I list class 3 first, then class 2 and class 1, this tendency becomes more severe. Very interestingly, even in the worst case, where class 3 and class 2 really confuse each other, precision, recall, and mAP all remain above 98%.

May you please give me any insights on this issue? Really appreciate it!

glenn-jocher commented 7 months ago

Hi @Jecia888! 😊 It sounds like you've made some intriguing observations with your experiments. The issue with class 3 expanding into the class 2 region and causing confusion, yet maintaining high precision and recall, is fascinating and suggests a few potential areas to explore.

Given your description, here are a few suggestions:

Class Similarity: If class 3 is consistently misclassified as class 2 (or vice versa), it might be worthwhile to closely examine the features these classes share. The model might be picking up on similarities that aren't immediately obvious.
Label Order in Training: The impact of label order is intriguing and could be related to how the model learns the spatial relationships between classes. You might consider randomizing the order of the lines in your label files to see if this reduces the tendency of class expansion.
Augmentation and Dataset Diversity: Increasing the diversity of your dataset with more varied examples of class 2 and class 3 instances, especially where they are close to each other but clearly distinguishable, might help the model learn more precise boundaries.
Post-Processing: Implementing a post-processing step where you apply additional criteria to refine detections might also help. For example, you could use known size ratios or spatial relationships between the classes to correct misclassifications.
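As one example of such a rule, since class 3 is always found correctly and the spurious class-3 boxes spread over class 2, you could drop any class-3 box that covers most of a detected class-2 box. The sketch below is a hypothetical filter on (x1, y1, x2, y2, conf, cls) tuples; the 0.9 coverage cutoff is an assumption to tune on your own validation images, and it relies on correct class-3 boxes not covering class-2 parts in your setup.

```python
def coverage(outer, inner):
    """Fraction of `inner`'s area covered by `outer` (boxes are x1, y1, x2, y2, ...)."""
    iw = max(0.0, min(outer[2], inner[2]) - max(outer[0], inner[0]))
    ih = max(0.0, min(outer[3], inner[3]) - max(outer[1], inner[1]))
    inner_area = (inner[2] - inner[0]) * (inner[3] - inner[1])
    return iw * ih / inner_area if inner_area > 0 else 0.0

def drop_expanded_boxes(dets, big_cls=3, small_cls=2, max_cover=0.9):
    """Remove big_cls boxes that cover most of any small_cls box.

    dets: list of (x1, y1, x2, y2, conf, cls) tuples.
    """
    small = [d for d in dets if d[5] == small_cls]
    return [d for d in dets
            if not (d[5] == big_cls and any(coverage(d, s) >= max_cover for s in small))]

dets = [
    (50, 0, 100, 50, 0.90, 2),  # class 2 part
    (0, 0, 50, 50, 0.95, 3),    # correct class 3 part, next to class 2
    (0, 0, 100, 50, 0.60, 3),   # class 3 box spilling over class 2
]
print(drop_expanded_boxes(dets))  # keeps the first two detections
```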

It's also worth noting that high metrics like precision and recall, while impressive, might not fully reflect the confusion between these two classes if it's a specific edge case the metrics are not sensitive to.
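One concrete reason: AP only penalizes a false positive if it is ranked above some true positive. If the spurious class-3 boxes all score lower than every correct class-3 box, AP for class 3 does not drop at all. The sketch below computes continuous (all-point) AP, one of the variants in YOLOv5's compute_ap as I understand it (the default there is 101-point interpolation, which behaves the same way here), using made-up scores.

```python
import numpy as np

def average_precision(scores, is_tp, n_labels):
    """All-point interpolated AP for one class.

    scores: confidence of each prediction
    is_tp: True if the prediction matched a label, False for a false positive
    n_labels: number of ground-truth boxes of this class
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    tp = np.asarray(is_tp, dtype=float)[order]
    tp_cum = np.cumsum(tp)
    fp_cum = np.cumsum(1.0 - tp)
    recall = tp_cum / n_labels
    precision = tp_cum / (tp_cum + fp_cum)
    # Add sentinels, then make precision non-increasing (the PR envelope).
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([1.0], precision, [0.0]))
    mpre = np.flip(np.maximum.accumulate(np.flip(mpre)))
    # Sum rectangle areas wherever recall changes.
    i = np.flatnonzero(mrec[1:] != mrec[:-1])
    return float(np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]))

tp_scores = [0.90, 0.85, 0.80, 0.75]  # four correct class-3 boxes
print(average_precision(tp_scores, [True] * 4, 4))
print(average_precision(tp_scores + [0.30, 0.20], [True] * 4 + [False] * 2, 4))
print(average_precision(tp_scores + [0.95], [True] * 4 + [False], 4))
```

The first two calls both give 1.0: the two low-confidence false positives are invisible to AP. Only the third, where the false positive outranks the correct boxes, drops AP to about 0.8.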

I hope these suggestions provide a starting point for further experiments. Your detailed observations are incredibly valuable, and it’s clear you’re on the right track. Keep exploring, and I'm confident you'll unlock more insights into this behavior. 😄

Jecia888 commented 6 months ago

Hi @glenn-jocher, I think I am almost there in terms of solving the problem, and I just noticed an interesting scenario. When I train for only 10 epochs, the diagonal of the confusion matrix is perfect and no classes are confused with others. However, when I train for 50 epochs, the accuracy of one class decreases by around 10 percent. My dataset consists of 2300 images. Can I just use the 10-epoch model for inference? Or do I have to investigate the issue further until the model performs consistently with more epochs (assuming that results from fewer epochs might not give an accurate evaluation of performance)?

For reference, this is the results.png from the 10-epoch session.

This is the results.png from the 50-epoch session.

glenn-jocher commented 6 months ago

Hi there! 😊 It's fantastic to hear about your progress and interesting observations.

Training with only 10 epochs and observing perfect class distinction is great, yet it's curious that extending to 50 epochs decreases the accuracy for a class. This scenario suggests that your model could be experiencing overfitting, where it learns the training data too well, including its noise and outliers, leading to a decrease in performance on validation or unseen data.

Using the 10-epoch model for inference is completely valid if it meets your performance requirements. It's always the results that dictate the best course of action, not the number of epochs. However, it would be beneficial to keep a few things in mind:

Ensure your evaluation is thorough, including testing the model on a diverse and representative dataset to confirm its robustness.
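Consider using early stopping in future training sessions. It stops training once the validation metric stops improving, which helps prevent overfitting. In YOLOv5, train.py supports this through the --patience argument (the number of epochs without improvement in fitness, a weighted combination of mAP@0.5 and mAP@0.5:0.95), and the best-fitness epoch is saved as best.pt however many epochs you run.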
Experiment with data augmentation and other regularization techniques if you haven't already. These can improve the generalization of your model.
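The patience logic is simple enough to sketch. The version below mirrors the idea behind --patience but is not YOLOv5's code; the 0.1/0.9 fitness weights reflect my understanding of YOLOv5's default and should be treated as an assumption, and the mAP history is made up.

```python
def fitness(map50, map50_95):
    """Weighted score used to rank epochs (assumed weights: 0.1 and 0.9)."""
    return 0.1 * map50 + 0.9 * map50_95

class EarlyStopper:
    """Signal a stop when fitness has not improved for `patience` epochs."""

    def __init__(self, patience=30):
        self.patience = patience
        self.best_fitness = float("-inf")
        self.best_epoch = 0

    def step(self, epoch, fit):
        if fit >= self.best_fitness:
            # New best epoch: this is the checkpoint you would keep (best.pt in YOLOv5).
            self.best_fitness, self.best_epoch = fit, epoch
        return epoch - self.best_epoch >= self.patience  # True means stop training

# Hypothetical (mAP@0.5, mAP@0.5:0.95) per epoch, peaking at epoch 2.
history = [(0.90, 0.60), (0.95, 0.70), (0.99, 0.75), (0.99, 0.74),
           (0.98, 0.73), (0.98, 0.72), (0.97, 0.71)]
stopper = EarlyStopper(patience=3)
for epoch, (m50, m5095) in enumerate(history):
    if stopper.step(epoch, fitness(m50, m5095)):
        print(f"stopping at epoch {epoch}, best epoch was {stopper.best_epoch}")
        break
```

Because the best epoch is tracked separately from the last one, a run that degrades after its peak still leaves you the peak checkpoint.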

Ultimately, the decision to use the 10-epoch model hinges on its performance in real-world scenarios aligned with your goals. If it performs well, there's no immediate need to pursue further training until you identify a new need for improvement. Keep iterative experimentation in mind as sometimes what works best defies initial expectations.

Great job, and keep up the good work! 😄

github-actions[bot] commented 5 months ago

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Docs: https://docs.ultralytics.com
HUB: https://hub.ultralytics.com
Community: https://community.ultralytics.com

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐