ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0
51.03k stars 16.41k forks

How to find which epoch best.pt comes from? #8701

Closed xiaohangguo closed 2 years ago

xiaohangguo commented 2 years ago

Search before asking

Question

1. After training for many epochs, I have a model and a "best.pt" file, and I understand what it is. My question is: how do I know which epoch it comes from? In other words, can I find the row in results.csv that corresponds to it?

2. During my experiments, I found that the run sometimes breaks off inexplicably right after training completes. Normally YOLOv5 summarizes the results at the end of training, draws the F1/P/PR/R/results curves, and produces the train_batch, val_batch, val_pred... images. What should I do if training has finished but these visualizations were never generated? I only found code online that calls YOLOv5 to draw a few of the images, but I can't get all of them that way.

Additional

This is the situation I described. Every time it happens, I train again, which is the most direct but clumsy fix. [Screenshot 2022-07-24 19-41-33] This is a normal result: [Screenshot 2022-07-24 19-39-49]

github-actions[bot] commented 2 years ago

👋 Hello @xiaohangguo, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher commented 2 years ago

@xiaohangguo best.pt is saved on every maximum fitness epoch. For detection models fitness is essentially mAP@0.5:0.95: https://github.com/ultralytics/yolov5/blob/b367860196a2590a5f44c9b18401dedfc0543077/utils/metrics.py#L15-L19
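As a minimal sketch of what "maximum fitness epoch" means, the snippet below scores a few epochs with a weighted sum and picks the highest one. The weights are an assumption taken from the linked utils/metrics.py and may differ in other versions. The metric values are made up for illustration, and `history` and `best_epoch` are hypothetical names.

```python
# Weights for [precision, recall, mAP@0.5, mAP@0.5:0.95]. Assumed to mirror
# the linked utils/metrics.py; other versions may use different values.
WEIGHTS = (0.0, 0.0, 0.1, 0.9)


def fitness(row):
    """Weighted sum of one epoch's [P, R, mAP@0.5, mAP@0.5:0.95]."""
    return sum(w * m for w, m in zip(WEIGHTS, row))


# Made-up validation metrics for three epochs, for illustration only.
history = [
    (0.60, 0.50, 0.55, 0.30),
    (0.70, 0.65, 0.70, 0.45),
    (0.72, 0.66, 0.71, 0.44),
]

scores = [fitness(row) for row in history]
# max() returns the first maximum, so the earliest epoch wins a tie.
best_epoch = max(range(len(scores)), key=scores.__getitem__)
print(best_epoch, round(scores[best_epoch], 3))
```

In this made-up data the third epoch has the higher mAP@0.5, but the second epoch still wins because mAP@0.5:0.95 carries most of the weight.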

I don't understand your other question, but you can validate trained models easily using val.py which will create all output images like confusion matrices, PR curves, etc.

python val.py --data ... --weights ...

xiaohangguo commented 2 years ago

1. Thank you for your help. Do you mean that best.pt is the checkpoint with the highest mAP@[0.5:0.95]? 2. OK, I will try it. I think the computer froze while saving the plot images, which caused the plotting to fail. I mean that the final images were not exported in the default way, but that seems unimportant, haha.

xiaohangguo commented 2 years ago

Can you give me some suggestions on hyperparameter tuning? I understand the meaning of parameters such as "weight_decay" and "warmup_epochs". What I want to ask is: if I want to run some experiments and optimize the model by adjusting hyperparameter values, how should I start? Do you have any suggestions?

glenn-jocher commented 2 years ago

@xiaohangguo you can try to tune hyperparameters manually, or you can evolve hyperparameters. Evolution takes a lot of time and resources but is a good solution that requires little human oversight. See Hyperparameter Evolution tutorial to get started.

YOLOv5 Tutorials

Good luck 🍀 and let us know if you have any other questions!

xiaohangguo commented 2 years ago

Thanks for the tutorial. I would like to ask why you have not published a paper, haha.

glenn-jocher commented 2 years ago

No time

xiaohangguo commented 2 years ago

While browsing the Internet, I saw that you said you would eat a hat if you didn't publish a paper.

xiaohangguo commented 2 years ago

Bro, can I download the yolov3.pt file and pass it to YOLOv5 through --weights to train the model?

glenn-jocher commented 2 years ago

This is true. No time to do that either.

You can get YOLOv3 weights at https://github.com/ultralytics/yolov3

AgusRaharja69 commented 2 years ago

In plots.py I see that you pick the best epoch for the training results with this expression. Can you explain it? index = np.argmax(0.9 * data.values[:, 8] + 0.1 * data.values[:, 7] + 0.9 * data.values[:, 12] + 0.1 * data.values[:, 11])

xiaohangguo commented 2 years ago

Do you have any additional information about how the evaluation metrics are calculated, such as which headers columns 7, 8, 11, and 12 correspond to? I haven't looked at the YOLO code for a long time. Which file is this code from? Any other information? As for the expression you gave, my guess is that it takes a weighted sum of those columns for each epoch and then picks the row with the maximum value as the evaluation indicator.
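One way to answer the column question is to print the header at each index and recompute the same weighted sum. The sketch below assumes the header order of a YOLOv5 segmentation results.csv, where (B) marks box metrics and (M) marks mask metrics. The sample rows are made up, so check the header of your own file before relying on the indices.

```python
import csv
import io

# Assumed header order of a YOLOv5 segmentation results.csv (made-up values).
SAMPLE = (
    "epoch,train/box_loss,train/seg_loss,train/obj_loss,train/cls_loss,"
    "metrics/precision(B),metrics/recall(B),metrics/mAP_0.5(B),metrics/mAP_0.5:0.95(B),"
    "metrics/precision(M),metrics/recall(M),metrics/mAP_0.5(M),metrics/mAP_0.5:0.95(M)\n"
    "0,0.09,0.08,0.03,0.02,0.50,0.40,0.45,0.25,0.48,0.38,0.42,0.22\n"
    "1,0.07,0.06,0.02,0.01,0.65,0.60,0.66,0.41,0.63,0.58,0.62,0.37\n"
    "2,0.06,0.05,0.02,0.01,0.66,0.61,0.67,0.40,0.64,0.59,0.63,0.36\n"
)

rows = list(csv.reader(io.StringIO(SAMPLE)))
header = [name.strip() for name in rows[0]]  # real headers may be space-padded
data = [[float(value) for value in row] for row in rows[1:]]

# Show which metric sits at each index used in the quoted expression.
for i in (7, 8, 11, 12):
    print(i, header[i])

# Same weighted sum as the quoted expression, then the row with the maximum.
scores = [0.9 * r[8] + 0.1 * r[7] + 0.9 * r[12] + 0.1 * r[11] for r in data]
index = scores.index(max(scores))
print("best row:", index)
```

Under this assumed layout, the expression adds the box and mask scores, each weighted 0.9 on mAP@0.5:0.95 and 0.1 on mAP@0.5.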

guptasaumya commented 1 year ago

@glenn-jocher, is top1_acc the fitness measure for best.pt?

glenn-jocher commented 1 year ago

@guptasaumya for classification models yes!

SaraDadjouy commented 1 year ago

@glenn-jocher Hi. In the detection task, what is best.pt chosen based on?

glenn-jocher commented 1 year ago

@SaraDadjouy hello!

best.pt is the checkpoint with the highest fitness seen during training. For detection models, fitness is computed from the validation metrics and is dominated by mAP@0.5:0.95, so best.pt is not selected by validation loss.

I hope this helps! Let me know if you have any further questions.

Deemowe commented 1 year ago

After I run my model, how can I see the mAP@0.5 for the best.pt epoch?

glenn-jocher commented 1 year ago

Hello! To see the mAP@0.5 for the best.pt checkpoint, you can use the val.py script provided in the YOLOv5 repository (named test.py in older releases).

Here is an example command to run the evaluation:

python val.py --data your_data.yaml --weights path/to/best.pt --img-size 640 --task test

Make sure to replace your_data.yaml with the path to your data configuration file, and path/to/best.pt with the actual path to your best.pt checkpoint file.

This command evaluates the model on the test split defined in your data file and reports mAP@0.5 alongside mAP@0.5:0.95. The --iou-thres option sets the NMS IoU threshold rather than the IoU used for mAP, so it is not needed here.

Let me know if you have any more questions!

Wang-taoshuo commented 6 months ago

Hi @glenn-jocher. In the segmentation mode of YOLOv8, which metric is used to select best.pt? Is it val/seg_loss?

glenn-jocher commented 6 months ago

Hello @Wang-taoshuo!

In segmentation mode for YOLOv8, best.pt is selected by fitness on the validation dataset rather than by val/seg_loss. As far as the metrics code shows, fitness for segmentation combines the box and mask scores, each based on mAP50 and mAP50-95, so the chosen checkpoint is the one with the strongest validation mAP. The exact weighting can change between releases.

If you have more questions or need further clarification, feel free to ask! 😊

Wang-taoshuo commented 6 months ago

How do I know which epoch my best.pt comes from?

glenn-jocher commented 6 months ago

Hi there! 👋

To find out which epoch corresponds to your best.pt file, you can check the results.csv file that's saved during training. This file logs metrics like precision, recall, mAP, and val loss for each epoch. Look for the epoch with the highest fitness, which for detection is driven mainly by mAP@0.5:0.95 rather than by validation loss. That row is the epoch your best.pt corresponds to.

If you're still not sure, you can also re-evaluate your saved checkpoints using the val.py script with your validation set and compare the results manually.
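A minimal sketch of that lookup is shown below. It assumes the 0.1/0.9 weighting of mAP50 and mAP50-95, and it uses the Ultralytics-style column names as defaults. `best_row` is a hypothetical helper, not part of any library, and the sample data is made up. For a YOLOv5 results.csv, pass the column names from your own header instead.

```python
import csv
import io


def best_row(lines, map50="metrics/mAP50(B)", map5095="metrics/mAP50-95(B)"):
    """Return (row_number, fitness) for the highest 0.1*mAP50 + 0.9*mAP50-95.

    row_number is 0-based and counts data rows, not the header.
    """
    best = None
    for n, row in enumerate(csv.DictReader(lines)):
        row = {key.strip(): value for key, value in row.items()}  # padded headers
        score = 0.1 * float(row[map50]) + 0.9 * float(row[map5095])
        if best is None or score > best[1]:  # strict ">" keeps the earliest tie
            best = (n, score)
    return best


# Made-up data for illustration; real files have many more columns.
SAMPLE = (
    "epoch,metrics/mAP50(B),metrics/mAP50-95(B)\n"
    "1,0.50,0.30\n"
    "2,0.62,0.41\n"
    "3,0.64,0.40\n"
)
result = best_row(io.StringIO(SAMPLE))
print(result)
```

To run it on a real file, open your results.csv and pass the file object to `best_row`, then read the epoch column of the returned row.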

Hope this helps! Let me know if you have other questions. 😊

namnguyen2103 commented 3 weeks ago

Answer this in a Disney princess impression, which one of these columns in the results.csv file will be used to determine the best.pt checkpoint in an Object detection task?

train/box_loss train/cls_loss train/dfl_loss metrics/precision(B) metrics/recall(B) metrics/mAP50(B) metrics/mAP50-95(B) val/box_loss val/cls_loss val/dfl_loss lr/pg0 lr/pg1 lr/pg2

pderrenger commented 2 weeks ago

In an object detection task, the `

Coline1 commented 2 weeks ago

I am now using a dataset I built to perform transfer learning on YOLOv8-pose. I checked the source code (https://github.com/ultralytics/ultralytics/blob/e7f065874487660c3f0d65dbb5c02b6b99142bf8/ultralytics/utils/metrics.py#L934) but could not find the answer. In this code, return self.pose.fitness() + self.box.fitness(), is pose.fitness() a special calculation for pose (for example, one that uses OKS to compute fitness)? I would like to know how the pose fitness is calculated. Thank you for your help.

pderrenger commented 2 weeks ago

The fitness calculation for pose in YOLOv8 is not explicitly detailed in the provided code snippet. Typically, pose fitness might involve metrics like Object Keypoint Similarity (OKS) or other pose-specific evaluations. For precise details, reviewing the full implementation of the pose.fitness() function in the source code would be necessary. If you have further questions, feel free to ask!
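The quoted line only says that the pose score and the box score are added together. The sketch below illustrates that structure under a loud assumption: each part is scored with the same 0.1/0.9 weighting of mAP50 and mAP50-95 seen in the plots.py expression earlier in this thread. `PartMetrics` and `pose_fitness` are hypothetical names, not the Ultralytics API, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class PartMetrics:
    """Hypothetical stand-in for one metric group (box or pose)."""

    precision: float
    recall: float
    map50: float
    map50_95: float

    def fitness(self) -> float:
        # Assumed weighting; precision and recall do not contribute here.
        return 0.1 * self.map50 + 0.9 * self.map50_95


def pose_fitness(pose: PartMetrics, box: PartMetrics) -> float:
    """Mirror the quoted line: pose fitness plus box fitness."""
    return pose.fitness() + box.fitness()


# Made-up validation metrics, for illustration only.
box = PartMetrics(precision=0.80, recall=0.75, map50=0.82, map50_95=0.60)
pose = PartMetrics(precision=0.70, recall=0.65, map50=0.72, map50_95=0.40)
total = pose_fitness(pose, box)
print(round(total, 3))
```

If the assumption holds, any pose-specific behavior would come from how the pose mAP values themselves are computed (for example with a keypoint similarity measure), not from the final sum.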

Coline1 commented 2 weeks ago

I have found the implementation code for fitness and the relevant code for verifying the accuracy of the model's keypoints. Thank you so much!

pderrenger commented 2 weeks ago

You're welcome! If you have any more questions or need further assistance, feel free to ask.