ultralytics / ultralytics

NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

YOLOv8 semi-supervised learning #11678

Closed MosbehBarhoumi closed 1 week ago

MosbehBarhoumi commented 1 week ago

Description

I possess a labeled dataset consisting of approximately 75,000 images for car parts detection. While the model generally performs well, there are instances where it fails due to insufficient data. Fortunately, I have access to an additional 500,000 unlabeled images which I'm eager to utilize to enhance the model's performance. However, the labeling process is time-consuming. Hence, I'm considering semi-supervised learning to expedite the process. I'm uncertain about where to begin and whether there are pre-built tools available in YOLO to streamline this process and save time. Could you assist me with guidance on initiating semi-supervised learning and leveraging any existing resources within YOLO?

github-actions[bot] commented 1 week ago

👋 Hello @MosbehBarhoumi, thank you for your interest in Ultralytics YOLOv8 🚀! We recommend a visit to the Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.

Install

Pip install the ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8.

pip install ultralytics

glenn-jocher commented 1 week ago

@MosbehBarhoumi hello!

YOLOv8 currently has no built-in support for semi-supervised learning. A common approach is to train your model on the labeled dataset first to establish a baseline, then use that trained model to make predictions on your unlabeled dataset.

You can then manually verify or correct the highest-confidence predictions and incrementally add them to your training set. Iterating this way can produce a refined model that draws on both labeled and unlabeled data.

Here's a basic idea on how you might start:

  1. Train your initial model on your labeled data.
  2. Use the model to predict on the unlabeled data.
  3. Manually check high-confidence predictions to use as pseudo labels.
  4. Re-train your model by combining the original labeled data with the newly labeled data.
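Step 3 can be narrowed with a simple triage pass, so that manual checking starts with images where every detection is confident. The following is a minimal sketch, assuming the per-image confidence scores have already been collected into a dictionary. The function name `triage_images`, the bucket names, and the thresholds 0.8 and 0.3 are illustrative choices, not part of YOLOv8.

```python
from typing import Dict, List


def triage_images(
    confidences: Dict[str, List[float]],
    accept: float = 0.8,   # hypothetical threshold: all boxes at or above this
    reject: float = 0.3,   # hypothetical threshold: all boxes below this
) -> Dict[str, List[str]]:
    """Sort images into review buckets based on their detection confidences.

    verify  - every detection is confident; a quick manual check is enough
    relabel - mixed confidences; boxes likely need manual correction
    skip    - no detections, or only very weak ones; leave for a later round
    """
    buckets: Dict[str, List[str]] = {"verify": [], "relabel": [], "skip": []}
    for image, scores in sorted(confidences.items()):
        if not scores or max(scores) < reject:
            buckets["skip"].append(image)
        elif min(scores) >= accept:
            buckets["verify"].append(image)
        else:
            buckets["relabel"].append(image)
    return buckets
```

Working through the `verify` bucket first tends to give the most new labels per minute of review, while the `relabel` bucket points at images the current model finds hard.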

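For implementation, you can generate and save predictions like this:

from ultralytics import YOLO

model = YOLO('path/to/your/model.pt')  # weights trained on the labeled data
# predict() returns a list of Results objects, so results.save() would fail;
# save_txt/save_conf write YOLO-format labels with confidences to disk instead
results = model.predict('path/to/unlabeled/images/', save=True, save_txt=True, save_conf=True)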

You might gradually improve and expand your dataset using the procedure outlined above. While it may initially involve manual effort to verify high-confidence predictions, it can significantly enhance your model with the available unlabeled data.

Feel free to reach out if you need more detailed guidance on any of the steps!

MosbehBarhoumi commented 1 week ago

@glenn-jocher Thanks for your detailed answer. I'm currently doing exactly that, though I thought there might be a more efficient way to save even more time.