👋 Hello @xingguang12, thank you for your interest in Ultralytics YOLOv8 🚀! We recommend a visit to the Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.
If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.
If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.
Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.
Pip install the `ultralytics` package including all requirements in a Python>=3.8 environment with PyTorch>=1.8.

```bash
pip install ultralytics
```
YOLOv8 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.
@xingguang12 hello! Addressing class imbalance is indeed crucial for improving mAP on your dataset. Here are some concise tips:
- Aim for a minimum of a few hundred instances per class after augmentation to give the model a fair chance to learn. There's no strict rule, but more variety generally helps.
- Split your dataset before augmentation to maintain a valid representation of the original distribution in your validation set. Having very few instances in the test set can lead to high variance in mAP for those classes, so try to keep a reasonable number in the test set as well.
- Data augmentation is a common strategy for dealing with class imbalance. Consider using a mix of geometric and photometric augmentations to introduce diversity (see the sketch after this list). Also, explore techniques like weighted loss functions to give more importance to underrepresented classes during training.
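If it helps, here is a minimal offline-augmentation sketch using Albumentations; the library choice, the parameter values, and the file layout are illustrative assumptions, not an official Ultralytics recipe:

```python
# Minimal offline-augmentation sketch for minority classes; parameter
# values and file layout here are illustrative assumptions.
import cv2
import albumentations as A

# Flip + photometric transforms; BboxParams keeps YOLO-format boxes in sync.
transform = A.Compose(
    [
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=0.5),
        A.HueSaturationValue(hue_shift_limit=10, sat_shift_limit=20, val_shift_limit=10, p=0.5),
        A.GaussNoise(p=0.3),
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

def augment_once(image_path: str, label_path: str, out_stem: str) -> None:
    """Apply the pipeline to one YOLO-labelled image and write an augmented copy."""
    image = cv2.imread(image_path)
    boxes, labels = [], []
    with open(label_path) as f:
        for line in f:
            cls, *xywh = line.split()
            boxes.append([float(v) for v in xywh])
            labels.append(int(cls))
    out = transform(image=image, bboxes=boxes, class_labels=labels)
    cv2.imwrite(f"{out_stem}.jpg", out["image"])
    with open(f"{out_stem}.txt", "w") as f:
        for (x, y, w, h), cls in zip(out["bboxes"], out["class_labels"]):
            f.write(f"{cls} {x:.6f} {y:.6f} {w:.6f} {h:.6f}\n")
```

Keep in mind that YOLOv8 already applies online augmentations (mosaic, HSV shifts, flips) during training, so offline copies are mainly useful for rebalancing per-class counts rather than adding photometric variety.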
For more detailed guidance on these topics, you might find our documentation on Train and Val modes helpful. Good luck with your training! 🚀
I appreciate your thanks! Now, let's address your questions regarding data augmentation in the context of your imbalanced dataset:
Background: My dataset has an extremely imbalanced class distribution. I plan to split the dataset (train:val:test = 7:2:1, totaling 10,000 images) and then perform data augmentation.

1. For classes with very few instances, should I simply delete them? For example, class "p" has 30 instances; after splitting the dataset, the validation set would contain only 6 and the test set only 3, so evaluating the training effectiveness for class "p" would be subject to significant randomness. For classes with fewer than 10 instances, the test set might contain only one image with that class, or none at all. Should I therefore delete classes below a threshold (tentatively 70 instances) and focus on training the classes with relatively more instances?
2. Following the principle of splitting the dataset before augmentation, should I augment the minority classes in the training set so that each has at least 100 instances? Is it unnecessary to balance all classes to a similar quantity, as long as no class has fewer than 100 training instances? My dataset's distribution is shown in the graph, where most classes have few instances.
3. Common data augmentation techniques include random flipping, salt-and-pepper noise, brightness adjustment, color-space distortion, and combinations of these. Do you have any recommended augmentation methods and specific parameter values?
4. For classes with an awkward instance count, such as 40, can I randomly allocate 30 instances to the validation and test sets in a 2:1 ratio (to avoid large mAP variance for that class), and then augment the remaining 10 training instances up to a total of 100? I understand this may disrupt the original distribution of the validation and test sets, but it is done to better detect this class. Is this method feasible?
5. For certain special classes, can I use a replacement method (i.e., replacing the object in an image with icons from other classes, as shown in the image below) to augment instances of these classes? For such classes, can I place all real instances in the validation and test sets, while the training instances are composed entirely of replacements pasted into images of other classes?
6. Some classes exhibit strong similarities, as shown in the image below. Does labeling these classes collectively as one category have any impact on training?

I'm especially looking forward to your response. Thank you very much in advance.
Hello @xingguang12, let's tackle your concerns.
Remember, the key is to maintain the integrity of your validation and test sets while using augmentation to improve the training set. Good luck! π
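To make the split-before-augment workflow concrete, here is a minimal sketch; the directory layout, the fixed seed, and the 100-instance floor are illustrative assumptions taken from the discussion above:

```python
# Hypothetical sketch of the split-before-augment workflow for a YOLO dataset.
# Directory layout, seed, and the 100-instance floor are assumptions.
import random
from collections import Counter
from pathlib import Path

random.seed(0)
labels = sorted(Path("labels").glob("*.txt"))  # one YOLO .txt per image
random.shuffle(labels)

# 1) Split BEFORE any augmentation so val/test keep the original distribution.
n = len(labels)
train = labels[: int(0.7 * n)]
val = labels[int(0.7 * n) : int(0.9 * n)]
test = labels[int(0.9 * n) :]
print(f"train={len(train)} val={len(val)} test={len(test)}")

# 2) Count per-class instances in the training split only.
counts = Counter()
for lbl in train:
    for line in lbl.read_text().splitlines():
        counts[int(line.split()[0])] += 1

# 3) Classes below the floor are candidates for offline augmentation
#    (e.g. with the Albumentations pipeline sketched earlier in the thread).
minority = sorted(c for c, k in counts.items() if k < 100)
print("classes below the 100-instance floor:", minority)
```

For classes this rare, a stratified split that guarantees at least a few instances of every class in each subset is usually preferable to a purely random shuffle.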
👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.
For additional resources and information, please see the links below:
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLO 🚀 and Vision AI ⭐
Search before asking
Question
When training my own dataset with YOLOv8, I encountered some issues and would like to seek your advice. My dataset has an extremely imbalanced class distribution: some classes have fewer than 10 instances, while others have a few thousand. Currently, the classes with fewer instances yield poor training results, and I am considering offline data augmentation to expand my dataset (it is challenging to find more images of those underrepresented classes online). However, I have a few questions:
Additional
No response