ultralytics / ultralytics

NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite
https://docs.ultralytics.com

The relationship between training sets, validation sets, and test sets; how to evaluate a benchmark? #11669

Open san-kou7 opened 1 week ago

san-kou7 commented 1 week ago

Search before asking

Question

Hello, I have successfully completed training the model and have run predictions on images with it.

However, I still have some questions: 1. Training set, validation set, and test set: my understanding is that the training set is used to train the model, and after each epoch the model is evaluated against the ground-truth labels of the validation set, so that once training is complete you get plots of metrics such as Precision, mAP, and Recall. But I am unsure about what proportion to use for the training and validation sets.

  2. How to evaluate a benchmark: I currently have a batch of unlabeled images. I can run predictions on them with the trained model, but after running the prediction script I only get detection boxes and confidence scores drawn on the images. However, I would like to use these images as a benchmark and obtain prediction metrics (such as Precision, Recall, mAP, etc.). How should I do this? Should I label this batch of unlabeled images and use it as a validation set during training? I am puzzled by this.

If you can answer my questions, I would be very grateful.

Additional

No response

glenn-jocher commented 1 week ago

Hello! I'm happy to help clarify your questions. 🌟

  1. Datasets:

    • Training Set: Used to train the model.
    • Validation Set: Used to tune the model's hyperparameters and detect overfitting during training. It's common practice to use about 10-20% of your dataset for validation, though the exact ratio can vary based on the dataset size and diversity.
    • Test Set: Used to evaluate the model performance after training. It should ideally be independent of the training and validation sets.
  2. Evaluating Benchmark:

    • To use your batch of unlabeled images as a benchmark, you do need to label them first. Only with labeled data can you compute meaningful metrics like precision, recall, and mAP. Once labeled, you can use this set as your test set to evaluate the trained model's performance.

In summary, yes, your approach is correct: label the currently unlabeled images and use them as a validation or test set. This will allow you to compute performance metrics and evaluate your model effectively. 🚀
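For reference, here's a minimal sketch of how you could run that evaluation with the Python API once your benchmark images are labeled (the file paths are placeholders, and it assumes your dataset YAML defines a test: split):

from ultralytics import YOLO

# Load your trained weights (path is an example)
model = YOLO("runs/detect/train/weights/best.pt")

# Validate against the labeled benchmark defined as the 'test' split in your dataset YAML
metrics = model.val(data="your_dataset.yaml", split="test")

print(metrics.box.map)    # mAP50-95
print(metrics.box.map50)  # mAP50
print(metrics.box.mp)     # mean precision
print(metrics.box.mr)     # mean recall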

Feel free to ask if you have more questions or need further assistance!

san-kou7 commented 1 week ago

Thank you very much for your answer. I have successfully tested the benchmark.

I still have one question. After converting the trained yolov8.pt into an ONNX model with ultralytics, I wrote a script to run inference on yolov8.onnx, and I found that the confidence of the results drops. For the same picture, predict in ultralytics gives a confidence of 0.91, but after exporting to ONNX and using the ONNX inference code I wrote myself, the confidence is only 0.87. May I ask why this is? I suspect something may be wrong with my inference code. Is there an official ONNX model inference script?

Thanks for your guidance, I have benefited a lot from it.

glenn-jocher commented 1 week ago

Hello!

I'm glad to hear you've successfully benchmarked your model! Regarding the difference in confidence scores when switching from the PyTorch model to the ONNX model, this is usually due to slight differences in how computations are carried out in different frameworks or in the precision settings used during the conversion.

Please make sure that during the conversion and the inference process, you maintain consistent pre-processing steps. For instance, normalization should be identical in both cases. Additionally, ensure you've enabled the same settings, such as image size (imgsz) and model precision.

We don't have a dedicated ONNX inference script in our official GitHub repository, but you can usually handle ONNX models pretty straightforwardly with the ONNX Runtime. Here's a simple example of how you might load and perform inference with an ONNX model:

import onnxruntime as ort
import numpy as np

# Load your ONNX model
session = ort.InferenceSession('path_to_yolov8.onnx')

# Assuming 'input_name' is the name of your input layer (check with session.get_inputs())
input_name = session.get_inputs()[0].name

# Prepare your image: preprocess it exactly the way Ultralytics does at inference time
# (letterbox to the export imgsz, BGR -> RGB, scale to [0, 1], HWC -> NCHW, float32)
image = your_preprocess_function(your_image)

# Perform inference; the output is the raw prediction tensor, before NMS
outputs = session.run(None, {input_name: image})

Ensure your preprocessing and post-processing steps (like confidence thresholding and non-max suppression) in the ONNX inference script match those used in the original PyTorch model.
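As a sanity check, you can also load the exported ONNX file directly with the ultralytics Python API, which applies the same letterboxing, normalization, and NMS as the .pt path, so its results are a useful reference for your own script (file names here are placeholders):

from ultralytics import YOLO

# Ultralytics can run the exported ONNX model with its own pre/post-processing
onnx_model = YOLO("yolov8.onnx")
results = onnx_model.predict("your_image.jpg", imgsz=640, conf=0.25)
print(results[0].boxes.conf)  # confidence scores for the detected boxes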

If discrepancies persist, reviewing intermediate outputs or even re-exporting the model might be helpful. Happy coding! 🚀

san-kou7 commented 1 week ago

Have you done any relevant testing? After converting the pt model to onnx, will the confidence level decrease when making predictions?

I currently use the official predict code to test the .pt model and the .onnx model respectively, and I see differences in confidence there too; there is also a drop of about a point when running the yolo benchmark (98.3 → 97.1).

glenn-jocher commented 1 week ago

Hello! Yes, it's quite common to observe slight differences in confidence levels when converting models between formats, such as from PyTorch .pt to ONNX. This can happen due to minor differences in how computations are handled between different frameworks or settings during conversion.

To ensure the best consistency, double-check that the preprocessing steps are identical in both scenarios. The image normalization and resizing should be exactly the same when testing both the .pt and .onnx models.

If the discrepancies in the performance metrics are considerable, it is also worth checking whether any optimizations or quantization (e.g. an FP16 export) were applied during the ONNX conversion, since these directly affect numerical precision. Here's a quick code snippet to ensure consistent preprocessing for both models:

import cv2
import numpy as np

# Example preprocessing function -- a simplified stand-in for Ultralytics' letterboxing;
# whatever you use, apply it identically for both the .pt and the .onnx inference
def preprocess(image, imgsz=640):
    resized = cv2.resize(image, (imgsz, imgsz))                # resize to the model input size
    rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)             # BGR -> RGB
    chw = rgb.transpose(2, 0, 1).astype(np.float32) / 255.0    # HWC -> CHW, scale to [0, 1]
    return chw[None]                                           # add batch dimension -> NCHW

# Use this preprocessing for both .pt and .onnx model inferences
image = preprocess(your_raw_image)
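It can also help to export with explicit, non-quantized settings so the ONNX graph stays in FP32 at the image size you evaluate with; here is a minimal sketch (argument values are examples):

from ultralytics import YOLO

model = YOLO("yolov8.pt")
# Export in FP32 at a fixed image size; half=False avoids FP16 rounding differences
model.export(format="onnx", imgsz=640, half=False, simplify=True)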

Always make sure that both models are evaluated under the same conditions. Happy model tuning! 🚀