ultralytics / ultralytics

NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

How to evaluate a benchmark? The relationship between training, validation, and test sets #11670

Open san-kou7 opened 1 week ago

san-kou7 commented 1 week ago

Search before asking

Question

Hello, I have successfully trained a model and can run predictions on images with it.

My questions are:

  1. Training, validation, and test sets: as I understand it, the training set is used to train the model, and after each epoch the model is evaluated against the ground-truth labels of the validation set; once training finishes, you get plots of metrics such as Precision, Recall, and mAP. However, I am unsure what proportions the training and validation sets should have.

  2. How to evaluate a benchmark: I currently have a batch of unlabeled images. I can run predictions on them with the trained model, but the prediction script only produces detection boxes and confidence scores drawn on the images. I would like to use these images as a benchmark and obtain evaluation metrics (such as Precision, Recall, and mAP). How should I do this? Should I annotate this batch of unlabeled images and use it as a validation set during training? I am puzzled by this.

I would be grateful if you could answer my questions.

Additional

No response

glenn-jocher commented 1 week ago

Hello!

It sounds like you've made great progress with your model. Let's address your questions:

  1. Dataset Splits: Your understanding is correct. The training set trains the model, and the validation set is used to evaluate it during training. A split of around 70% training, 15% validation, and 15% test is common, but this can vary based on dataset size and diversity.

  2. Evaluating Benchmark: To evaluate models against a benchmark, you indeed need labeled data. Without labels, you can't compute metrics like Precision, Recall, or mAP. If you want to use your batch of unlabeled images, you will first need to annotate them with the correct labels to create a test set. Only then can you truly assess the model's performance.

You can perform predictions and then manually or semi-automatically (using some pre-trained models) annotate these predictions to create your test set.
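For reference, here is a minimal sketch of how that evaluation step could look with the Ultralytics Python API once the images are annotated. The file names, paths, and split ratios below are placeholders for illustration, not fixed requirements:

```python
from ultralytics import YOLO

# Hypothetical dataset YAML ("my_data.yaml") listing your splits, e.g.:
#   path: /path/to/dataset
#   train: images/train   # ~70% of labeled images
#   val: images/val       # ~15%
#   test: images/test     # ~15% -- your newly annotated benchmark images
#   names:
#     0: class_a
#     1: class_b

# Load your trained weights (path is a placeholder).
model = YOLO("runs/detect/train/weights/best.pt")

# Run validation on the labeled test split to compute detection metrics.
metrics = model.val(data="my_data.yaml", split="test")

print(metrics.box.map)    # mAP@0.5:0.95
print(metrics.box.map50)  # mAP@0.5
print(metrics.box.mp)     # mean precision
print(metrics.box.mr)     # mean recall
```

The key point is that `model.val()` needs ground-truth labels for the split it evaluates, which is why annotating the benchmark images first is unavoidable.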

I hope this clarifies your doubts! Keep up the good work with your models. 🚀