A Problem Concerning the Custom Dataset for Object Detection Using YOLOv5

ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite

https://docs.ultralytics.com

GNU Affero General Public License v3.0


A Problem Concerning the Custom Dataset for Object Detection Using YOLOv5 #13113

Closed: MatthewCarryOn closed this issue 1 month ago

MatthewCarryOn commented 3 months ago

Search before asking

[X] I have searched the YOLOv5 issues and discussions and found no similar questions.

Question

Hi everyone and @glenn-jocher. I am a beginner with YOLOv5. My task is to detect objects from 4 classes of garbage (recyclable, harmful, kitchen, and other), each of which includes various specific items (e.g., recyclable: tin cans, plastic bottles, beer bottles, etc.), on a self-built platform equipped with a camera connected to a Raspberry Pi 4B that has YOLOv5 deployed on it. The camera's position is fixed.

Initially, when creating my own dataset, I simply collected pictures of garbage from each class on my platform and labeled them (about 300 pictures per class). Later on, I realized that the garbage I collected and photographed covers far less variety than required (e.g., I only collected 3 specific types of plastic bottles and photographed them in various poses on the platform; when a new type of plastic bottle is placed on the platform, the trained model may not detect it well). So I'm planning to collect more data from different backgrounds to improve robustness.

I'm really curious about the following:

1) Should I diversify the data this way for my task?
2) If so, what should the proportion be between pictures collected on my platform and those collected from other, diverse backgrounds?
3) Will different backgrounds affect detection accuracy on the platform, whose background is fixed? More broadly, how does YOLOv5 deal with the background and the target object(s) separately, discerning targets from the background?

P.S. This is the first time I've asked a question in an 'issue', so please forgive any ignorance. I would be more than delighted if anyone could provide effective ideas or reference materials regarding my confusion. Thanks. Best wishes.

Additional

No thanks.

github-actions[bot] commented 3 months ago

👋 Hello @MatthewCarryOn, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.8.0 with all requirements.txt installed including PyTorch>=1.8. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Notebooks with free GPU:
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 🚀

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics

glenn-jocher commented 3 months ago

Hello @MatthewCarryOn,

Welcome to the YOLOv5 community! It's great to see your interest in applying YOLOv5 to your custom object detection task. Let's address your questions one by one:

Diversifying Your Dataset: Yes, diversifying your dataset is a good idea. A more varied dataset helps the model generalize better to new, unseen instances of objects. This is especially important for your task, where you want to detect various types of garbage items that may not have been included in your initial dataset.
Proportion of Platform vs. Diverse Backgrounds: While there isn't a strict rule for the proportion, a balanced approach is generally beneficial. Since your camera's position is fixed, ensure that a significant portion of your dataset (perhaps 60-70%) is collected from your platform to maintain context consistency. The remaining 30-40% can be from diverse backgrounds to improve robustness. This way, the model learns to recognize objects in your specific setup while also being adaptable to variations.
Impact of Different Backgrounds: Different backgrounds can indeed affect detection accuracy, especially if the model hasn't seen similar backgrounds during training. YOLOv5, like other object detection models, uses convolutional neural networks (CNNs) that learn to differentiate objects from their backgrounds based on features. By training on varied backgrounds, you help the model learn to focus on the objects themselves rather than the background context.

For best practices, ensure that your labels are consistent and accurate, and consider adding some background images (images with no objects) to reduce false positives. You can refer to our Tips for Best Training Results for more detailed guidance.
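
To make the suggested proportions concrete, here is a minimal sketch of how the three image pools could be combined. It is only an illustration: the function name `sample_mix`, the default `platform_frac=0.65` (the middle of the 60-70% range above), and the default `background_frac=0.05` are assumptions for this example, not part of YOLOv5.

```python
import random

def sample_mix(platform, diverse, background,
               platform_frac=0.65, background_frac=0.05, seed=0):
    """Keep every platform image, then sample the other pools to match the target mix."""
    if not 0 < platform_frac <= 1 or not 0 <= background_frac < 1:
        raise ValueError("fractions out of range")
    rng = random.Random(seed)
    n_platform = len(platform)
    # Diverse images needed so platform images make up `platform_frac` of the labeled data.
    n_diverse = round(n_platform * (1 - platform_frac) / platform_frac)
    n_labeled = n_platform + n_diverse
    # Background images needed so they make up `background_frac` of the whole set.
    n_background = round(n_labeled * background_frac / (1 - background_frac))
    return {
        "platform": list(platform),
        "diverse": rng.sample(list(diverse), min(n_diverse, len(diverse))),
        "background": rng.sample(list(background), min(n_background, len(background))),
    }

# Hypothetical file names, used only to show the call.
platform_imgs = [f"platform_{i}.jpg" for i in range(130)]
diverse_imgs = [f"diverse_{i}.jpg" for i in range(200)]
background_imgs = [f"background_{i}.jpg" for i in range(50)]

mix = sample_mix(platform_imgs, diverse_imgs, background_imgs)
print({name: len(files) for name, files in mix.items()})
```

The sketch keeps all platform images and sizes the other two pools around them, since platform images are the ones that match the deployment setup. If a pool is too small, it simply uses everything available, so the resulting mix may fall short of the target.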

Here's a quick example of how you might start training with your custom dataset:

python train.py --data custom.yaml --weights yolov5s.pt --epochs 300 --img 640

This command uses the small YOLOv5 model (yolov5s.pt) and trains for 300 epochs with an image size of 640x640. Adjust these parameters based on your specific needs and hardware capabilities.
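
The `custom.yaml` passed to `--data` is not shown in this thread. As a rough sketch, assuming the usual YOLOv5 dataset file layout (`path`, `train`, `val`, `names`), it could be generated as follows. The dataset root `../datasets/garbage` and the `images/train` / `images/val` folder names are hypothetical placeholders; the four class names are taken from the question above.

```python
CLASS_NAMES = ["recyclable", "harmful", "kitchen", "other"]

def build_dataset_yaml(root, train="images/train", val="images/val", names=CLASS_NAMES):
    """Return the text of a YOLOv5-style dataset file for the given folders and classes."""
    lines = [
        f"path: {root}",    # dataset root directory
        f"train: {train}",  # training images, relative to `path`
        f"val: {val}",      # validation images, relative to `path`
        "names:",
    ]
    # Class indices must match the first column of the label files.
    lines += [f"  {index}: {name}" for index, name in enumerate(names)]
    return "\n".join(lines) + "\n"

print(build_dataset_yaml("../datasets/garbage"))
```

Saving the printed text as `custom.yaml` gives a file that the command above can point to, provided the folders actually exist at those locations.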

Feel free to share any training results, plots, or additional questions you have. The community and the Ultralytics team are here to help!

Best of luck with your project! 🍀

MatthewCarryOn commented 3 months ago

Thank you for your quick, concise and effective explanation! It's my great honor and really helps a lot for a beginner.

glenn-jocher commented 3 months ago

Hello @MatthewCarryOn,

Thank you for your kind words! We're thrilled to hear that the information was helpful to you. The credit goes to the amazing YOLO community and the dedicated Ultralytics team who continuously strive to make these tools accessible and effective for everyone.

If you encounter any further questions or need additional assistance as you progress with your project, please don't hesitate to reach out. Sharing your training results, plots, or any specific issues you face can help us provide more targeted support.

Remember, the Tips for Best Training Results guide is a valuable resource as you refine your dataset and training process. It covers a wide range of best practices that can significantly enhance your model's performance.

Best of luck with your object detection task, and happy training! 🚀


MatthewCarryOn commented 3 months ago

Hi @glenn-jocher and everyone,

I built a custom dataset composed of:

40% original pictures taken from a fixed background,
10% randomly selected original pictures with data augmentation (HSI transforms, rotation, flipping, adding Gaussian noise, and motion blur),
45% diverse pictures with objects of the same type but from different backgrounds (to increase the robustness of the detection model given the limited sources of collected garbage, this part outweighs the first part),
5% pure background images without any annotations.
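
One detail that matters for the offline flipping step in the list above: the labels have to be transformed together with the image. A minimal sketch for a horizontal flip, assuming YOLO-format label lines (`class x_center y_center width height`, all normalized to 0-1); the helper name `hflip_yolo_labels` is made up for this example.

```python
def hflip_yolo_labels(lines):
    """Mirror YOLO-format boxes left-to-right; only x_center changes."""
    flipped = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        cls = parts[0]
        x, y, w, h = map(float, parts[1:5])
        # A horizontal flip mirrors the box center around the vertical midline.
        flipped.append(f"{cls} {1.0 - x:.6f} {y:.6f} {w:.6f} {h:.6f}")
    return flipped

print(hflip_yolo_labels(["0 0.25 0.5 0.2 0.4"]))
```

Width and height stay the same under a flip. Rotation is more involved because the axis-aligned box has to be recomputed, so it is not covered here.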

I'm a bit confused about how these background images act on the CNN to improve my model's performance, beyond just mathematically reducing false positives (FPs). From my superficial perspective, they "remind" the network of its original background by making it detect nothing, but I cannot fully understand why. Could someone explain the mechanisms or benefits involved intuitively?

Furthermore, I'm fascinated by the relationship between target objects and the background. My current thought is that, because of multi-dimensional feature extraction (backbone) and merging (neck), the CNN learns the features of the target objects themselves as well as features that correlate them with the background (which should be irrelevant to the objects). This seems paradoxical and quite perplexing to me.

Any further suggestions for the construction of my dataset are welcome! I would be more than delighted if anyone could provide me with effective ideas or reference materials about my perplexity. I'm working hard to better the foundation of the task, which is the dataset.

Best regards, Matthew

glenn-jocher commented 3 months ago

Hi @MatthewCarryOn,

Thank you for sharing the detailed composition of your custom dataset and for your insightful questions! It's fantastic to see your dedication to understanding and improving your model's performance.

Impact of Background Images

Including pure background images (images without any objects) in your dataset can indeed help reduce false positives. Here's an intuitive explanation of the mechanism:

Negative Samples: Background images act as negative samples, teaching the model what "not" to detect. By seeing images where no objects are present, the model learns to distinguish between actual objects and background noise.
Regularization: These images help regularize the model by providing a broader context, ensuring it doesn't overfit to specific patterns that might be present in your object-containing images.
Improved Generalization: By including diverse backgrounds, the model learns to focus on the objects themselves rather than being influenced by the background context. This helps in better generalization to new, unseen backgrounds.
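
The "negative samples" point can be illustrated with a toy calculation. The sketch below is a simplified illustration, not YOLOv5's actual loss code: it assumes the objectness term is a binary cross-entropy and that, on a background image, every predicted objectness score has a target of 0. The function names are made up for this example.

```python
import math

def bce(p, target, eps=1e-7):
    """Binary cross-entropy for one predicted probability."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

def background_objectness_loss(pred_obj):
    """Mean objectness loss on an image with no objects: every target is 0."""
    return sum(bce(p, 0.0) for p in pred_obj) / len(pred_obj)

# A model that "sees" objects in an empty scene is penalized heavily...
print(background_objectness_loss([0.9, 0.8, 0.7]))
# ...while a model that stays quiet on the same scene is penalized very little.
print(background_objectness_loss([0.05, 0.02, 0.01]))
```

Under these assumptions, a background image contributes no box or class targets at all. Its only effect is to push objectness scores toward zero wherever the model fires on empty scenery, which is the mechanism behind the reduction in false positives.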

Feature Extraction and Background Correlation

Your understanding of the multi-dimensional feature extraction and merging in CNNs is on point. Here's a bit more detail:

Backbone (Feature Extraction): The backbone of YOLOv5 (e.g., CSP-Darknet53) extracts hierarchical features from the input images. These features include both object-specific details and contextual information from the background.
Neck (Feature Aggregation): The neck (e.g., CSP-PAN, SPPF) aggregates these features at different scales, enhancing the model's ability to detect objects of various sizes and in different contexts.
Head (Prediction): The head (YOLOv3 Head) uses these aggregated features to make final predictions, focusing on the object-specific features while minimizing the influence of irrelevant background features.

The key is that while the model does learn some correlation between objects and their backgrounds, the training process (especially with diverse and augmented data) helps it prioritize object-specific features over background noise.
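
To give a feel for what the head produces, the sketch below works out the size of the prediction grid at each scale. It assumes the default three detection scales with strides 8, 16, and 32 and three anchors per scale; the 640 image size comes from the training command earlier in the thread and the 4 classes from the original question. The function name is made up for this example.

```python
def prediction_layout(img_size=640, num_classes=4,
                      strides=(8, 16, 32), anchors_per_scale=3):
    """Summarize how many candidate boxes each detection scale produces."""
    scales = []
    for stride in strides:
        cells = img_size // stride  # grid cells along one side at this scale
        scales.append({
            "stride": stride,
            "grid": (cells, cells),
            "predictions": cells * cells * anchors_per_scale,
        })
    return {
        "scales": scales,
        # 4 box coordinates + 1 objectness score + one score per class
        "values_per_prediction": 5 + num_classes,
        "total_predictions": sum(s["predictions"] for s in scales),
    }

layout = prediction_layout()
for scale in layout["scales"]:
    print(scale)
print(layout["values_per_prediction"], layout["total_predictions"])
```

Most of these candidate boxes cover background in any given image, so the model spends much of its training signal learning where objects are not. This is one way to see why object and background features are learned together rather than separately.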

Suggestions for Dataset Construction

Your dataset composition looks well thought out. Here are a few additional suggestions:

Balanced Augmentation: Ensure that your data augmentation techniques are balanced and do not introduce artifacts that could confuse the model. Techniques like Mosaic and Copy-Paste augmentations can also be beneficial.
Class Balance: Verify that each class of garbage is well-represented in your dataset to avoid class imbalance issues.
Validation Set: Maintain a separate validation set that mirrors the diversity of your training set. This helps in evaluating the model's performance more accurately.
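
The class balance check can be scripted. The sketch below is one possible approach, assuming YOLO-format label files (one `class x_center y_center width height` row per object) stored in a flat folder next to a flat image folder, and treating an image with a missing or empty label file as a background image. The function name `label_stats` and the suffix list are assumptions for this example.

```python
from collections import Counter
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}

def label_stats(image_dir, label_dir):
    """Count labeled instances per class index and the number of background images."""
    counts = Counter()
    n_background = 0
    for image in sorted(Path(image_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        label_file = Path(label_dir) / f"{image.stem}.txt"
        rows = label_file.read_text().splitlines() if label_file.exists() else []
        rows = [row for row in rows if row.strip()]
        if not rows:
            n_background += 1  # no label file, or an empty one
            continue
        for row in rows:
            counts[int(row.split()[0])] += 1  # first column is the class index
    return counts, n_background
```

Calling `label_stats` on the training folders and again on the validation folders gives a quick view of whether each class and the background share are represented similarly in both splits.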

For more detailed insights, you can refer to the YOLOv5 Architecture Description, which provides an in-depth look at the model's components and their functions.

Feel free to share any further results or questions you have. The community and the Ultralytics team are here to support you!

Best regards and happy training! 🚀


github-actions[bot] commented 2 months ago

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Docs: https://docs.ultralytics.com
HUB: https://hub.ultralytics.com
Community: https://community.ultralytics.com

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐