ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0
50.38k stars 16.26k forks

Use only 12 classes of COCO and add extra ones. #12720

Closed PandaPandula closed 7 months ago

PandaPandula commented 8 months ago

Question

I only need 12 classes (including person, suitcase, backpack, etc.) out of the 80 from COCO, and I need to add 3 extra classes collected from Roboflow (approximately 10k images of each class after data augmentation, including rotating and cropping). The problem is that the trained model fails to detect those 3 classes, even though it detects the COCO ones. I've also trained a model on only the 3 extra classes, and it detects them as needed. I've tried the flags --image-weights --multi-scale to combat the imbalance, since the class person has 200k+ appearances. Any idea what's happening? Is there another way to merge two models that each work well into one?

Additional

The images for the 3 extra classes are also labeled with the 12 classes needed from COCO. For example, if a person or backpack appears in any of the photos for the 3 extra classes, it is labeled correctly.
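The class selection described in the question can be sketched as a label-remapping step, assuming the labels are in YOLO text format (one `class x_center y_center width height` row per box). The helper name `remap_label_lines` and the example map are hypothetical; only three of the twelve COCO classes are shown, using the ids from the 80-class COCO list shipped with YOLOv5 (0 person, 24 backpack, 28 suitcase).

```python
def remap_label_lines(lines, id_map):
    """Keep only boxes whose class id is in id_map and rewrite the id.

    lines:  iterable of YOLO-format rows, "cls xc yc w h"
    id_map: dict mapping original class id -> new class id
    """
    kept = []
    for line in lines:
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed rows
        new_id = id_map.get(int(parts[0]))
        if new_id is None:
            continue  # class not wanted in the merged dataset
        kept.append(" ".join([str(new_id)] + parts[1:]))
    return kept


# Hypothetical partial map: original COCO id -> id in the merged dataset.
COCO_KEEP = {0: 0, 24: 1, 28: 2}

coco_rows = [
    "0 0.5 0.5 0.2 0.4",      # person, kept as class 0
    "17 0.1 0.1 0.05 0.05",   # a class outside the map, dropped
    "28 0.3 0.6 0.1 0.2",     # suitcase, kept as class 2
]
remapped = remap_label_lines(coco_rows, COCO_KEEP)
```

The same function can be applied to the Roboflow export with a second map that shifts its ids past the last COCO id, so both sources end up sharing one consistent class list.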

github-actions[bot] commented 8 months ago

👋 Hello @PandaPandula, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.8.0 with all requirements.txt installed including PyTorch>=1.8. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 🚀

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics

glenn-jocher commented 8 months ago

@PandaPandula hello! It sounds like you're dealing with a class imbalance issue, which is common when mixing datasets of different sizes. Here are a few suggestions:

  1. Class Weights: Adjust the class weights during training to give more importance to the underrepresented classes. This can help the model learn from the less frequent classes.

  2. Data Sampling: Use a weighted random sampler to ensure that each batch of data has a balanced number of samples from each class during training.

  3. Focal Loss: Consider using a focal loss function, which is designed to address class imbalance by down-weighting well-classified examples and focusing on hard-to-classify ones.

  4. Fine-tuning: Train your model first on COCO for the 12 classes, then fine-tune on your dataset with the 3 additional classes. This can help the model generalize better on the COCO classes before adapting to the new ones.

  5. Augmentation: Continue using strong data augmentation, especially on the 3 new classes, to increase their effective sample size.

  6. Combine Datasets: Ensure that your training dataset is a good mix of COCO and your new classes. You might need to oversample your new classes to balance the dataset.

  7. Evaluation: Make sure you're evaluating your model correctly, using a validation set that includes a balanced mix of all classes.
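As an illustration of the focal loss idea in item 3, here is a minimal sketch of the binary form for a single prediction. It is not YOLOv5's own loss code; the function name and the default `gamma` and `alpha` values are assumptions chosen for the example.

```python
import math


def binary_focal_loss(p, y, gamma=1.5, alpha=0.25):
    """Focal loss for one predicted probability p and binary target y.

    FL = -alpha_t * (1 - p_t) ** gamma * log(p_t), where p_t is the
    probability the model assigned to the true label.
    """
    p = min(max(p, 1e-7), 1 - 1e-7)          # clamp to avoid log(0)
    p_t = p if y == 1 else 1 - p             # probability of the true label
    alpha_t = alpha if y == 1 else 1 - alpha # class-balancing factor
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.9, 1)  # confident and correct: heavily down-weighted
hard = binary_focal_loss(0.1, 1)  # confident and wrong: keeps most of its loss
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is a quick sanity check for any implementation.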

Merging two separately trained models isn't straightforward and typically not recommended. It's better to train a single model that can generalize across all classes.

For more detailed guidance on these strategies, please refer to our documentation. Keep experimenting, and good luck! 🚀

PandaPandula commented 8 months ago

@glenn-jocher Thank you for the response. Just to make sure, I'll show you the occurrences per class (class 8 is person, and class 9 is suitcase) including COCO (first 12 classes) and the 3 extra. The model, with only 6k images of suitcases, detects them accurately. What doesn't add up is that, despite having 10k of the last 3 classes (13, 14, 15), it's unable to do so. The context of model usage is CCTV.

The model has been tested on a whole scene (with several people and the object to detect), and the object wasn't detected. However, when the input is cropped to the part of the scene containing the object, detection works well. Could the problem be that the Roboflow images are close-ups of the new objects, so the model has learned to expect those objects only at a large size?

Class 0 appears 9645 times.
Class 1 appears 10305 times.
Class 2 appears 10171 times.
Class 3 appears 24038 times.
Class 4 appears 6633 times.
Class 5 appears 38432 times.
Class 6 appears 13777 times.
Class 7 appears 4992 times.
Class 8 appears 280502 times.
Class 9 appears 6354 times.
Class 10 appears 4510 times.
Class 11 appears 5906 times.
Class 12 appears 11357 times.
Class 13 appears 4246 times.
Class 14 appears 10813 times.
Class 15 appears 11640 times.
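To put the posted counts in perspective, the sketch below computes each class's ratio to the largest class and a simple inverse-frequency weight. The counts are copied from the list above; the helper name and the normalization (weights average to 1) are assumptions for illustration, not how YOLOv5 computes its own class or image weights.

```python
# Instance counts per class id, as posted above.
counts = [9645, 10305, 10171, 24038, 6633, 38432, 13777, 4992,
          280502, 6354, 4510, 5906, 11357, 4246, 10813, 11640]


def inverse_frequency_weights(counts):
    """Weight each class by 1/count, scaled so the weights average to 1."""
    inv = [1.0 / c for c in counts]
    scale = len(inv) / sum(inv)
    return [w * scale for w in inv]


weights = inverse_frequency_weights(counts)
largest = max(counts)
for cls, (n, w) in enumerate(zip(counts, weights)):
    # ratio = how many times larger the biggest class is than this one
    print(f"class {cls:2d}: {n:6d} instances, ratio {largest / n:5.1f}, weight {w:.2f}")
```

Class 8 (person) has roughly 66 times as many instances as class 13, yet classes 14 and 15 have about as many instances as classes 0 to 2. Instance count alone therefore may not explain the missed detections, which is consistent with the object-scale hypothesis raised in this comment.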

glenn-jocher commented 8 months ago

@PandaPandula, it seems like the issue might be related to the scale at which the objects appear in your training data versus the real-world application. If your Roboflow images are close-ups, the model may not generalize well to detecting those objects at different scales or in the context of a full scene.
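One way to check the scale hypothesis is to compare how large the boxes are, per class, in the training labels. The sketch below assumes YOLO-format rows with normalized width and height; the function name and the three sample rows are made up for illustration.

```python
from collections import defaultdict
from statistics import median


def median_box_area_by_class(label_lines):
    """Median normalized box area (width * height) per class id."""
    areas = defaultdict(list)
    for line in label_lines:
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed rows
        cls, w, h = int(parts[0]), float(parts[3]), float(parts[4])
        areas[cls].append(w * h)
    return {cls: median(vals) for cls, vals in sorted(areas.items())}


# Made-up rows: class 0 boxes are small, the class 13 box fills most of the image.
sample_rows = [
    "0 0.5 0.5 0.1 0.2",
    "0 0.4 0.6 0.3 0.2",
    "13 0.5 0.5 0.8 0.9",
]
stats = median_box_area_by_class(sample_rows)
```

If the new classes show a median area far above that of the COCO classes, and far above what the objects occupy in a CCTV frame, the training set probably lacks small instances of them.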

Here are a few steps you can take to address this:

  1. Scale Variation: Include images with your objects at various scales and distances in your training set. This helps the model learn to recognize the objects regardless of their size in the frame.

  2. Contextual Training: Ensure that some training images include the objects in a context similar to your CCTV scenes, not just close-ups.

  3. Augmentation: Use augmentation techniques that alter the scale and crop of your images, such as random resizes, crops, and possibly perspective changes.

  4. Multi-Scale Training: Continue using the --multi-scale flag, as this can help the model learn to detect objects at different scales.

  5. Focused Datasets: If possible, create a subset of your COCO dataset that includes images with similar context and scale to your CCTV environment and train specifically on that subset.

  6. Evaluation: When evaluating, use a validation set that includes images with a variety of object scales and contexts to ensure the model's performance is representative of real-world use.

By ensuring that your training data includes a variety of scales and contexts, you can improve the model's ability to generalize and detect objects in full scenes as well as close-ups. Keep iterating on your training data and model configuration, and you should see improvements. Good luck! 🕵️‍♂️👍
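The scale-variation advice can be sketched as a "shrink and place" augmentation: a close-up is scaled down and pasted onto a canvas of the original size (for example a background frame), so the object appears small. Only the label arithmetic is shown here; resizing and pasting the pixels would be done with an image library. The function name and parameters are hypothetical, and boxes are assumed to be normalized YOLO `(x_center, y_center, width, height)` tuples.

```python
def shrink_box(box, scale, offset_x, offset_y):
    """Map a normalized YOLO box into a canvas of the same size after the
    source image is scaled by `scale` and its top-left corner is placed at
    the normalized position (offset_x, offset_y)."""
    if not 0 < scale <= 1:
        raise ValueError("scale must be in (0, 1]")
    if offset_x < 0 or offset_y < 0 or offset_x + scale > 1 or offset_y + scale > 1:
        raise ValueError("scaled image must fit inside the canvas")
    xc, yc, w, h = box
    # Centers are scaled then shifted; sizes are only scaled.
    return (offset_x + scale * xc, offset_y + scale * yc, scale * w, scale * h)


# A close-up box covering most of the image, shrunk to a quarter of the
# side length and placed in the lower-right part of the canvas.
close_up = (0.5, 0.5, 0.8, 0.9)
small = shrink_box(close_up, scale=0.25, offset_x=0.5, offset_y=0.5)
```

Pasting onto real frames from the target cameras, rather than a blank canvas, would also address the context point in item 2.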

github-actions[bot] commented 7 months ago

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐