ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0
50.8k stars 16.36k forks

Validation metrics review #12662

Closed katia-katkat closed 8 months ago

katia-katkat commented 9 months ago

Search before asking

Question

Hi there,

After training the YOLOv5 model on a custom dataset, I reviewed the validation metrics and noticed unusually high results. I would like to verify whether there is any reason for concern.

(Screenshots of the validation metric plots are attached to the original issue.)

Additional

No response

github-actions[bot] commented 9 months ago

👋 Hello @katia-katkat, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.8.0 with all requirements.txt installed including PyTorch>=1.8. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 🚀

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics

glenn-jocher commented 9 months ago

@katia-katkat hi there!

It's great to hear that you're getting high validation metrics with your custom dataset! 🎉 High results can sometimes be a cause for celebration, but it's always good to be cautious and ensure they are valid.

Here are a few things to consider:

  1. Data Quality: Ensure your dataset is correctly labeled and representative of the problem space.
  2. Overfitting: Check if the model performs well on a separate test set to ensure it generalizes beyond the validation data.
  3. Metrics Consistency: Confirm that the metrics are consistent across different epochs to rule out any random spikes.
  4. Model Complexity: If your dataset is simple or the classes are easily distinguishable, high performance might be expected.

If everything checks out, then you might just have a well-performing model! If you're still unsure, you can always cross-validate your results with a different dataset or run additional tests.
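
For point 2 in particular, one quick check is to run val.py against a held-out test split instead of the validation split. A minimal sketch, assuming your dataset YAML defines a test: entry and your best weights sit at runs/train/exp/weights/best.pt (adjust both to your setup):

python val.py --weights runs/train/exp/weights/best.pt --data your_data.yaml --img 640 --task test

If the test-set metrics land close to the validation metrics, overfitting is a less likely explanation for the high numbers.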

For more detailed guidance on validation and troubleshooting, please refer to our documentation at https://docs.ultralytics.com/yolov5/.

Keep up the good work, and happy detecting! 😊

unikill066 commented 9 months ago

@katia-katkat / @glenn-jocher, when you say validation metrics, do you mean the charts generated by train.py or by val.py? I use a different YAML file for val.py.

For train.py I use train_data and val_data -- I suppose this gives us the validation metrics. And for val.py, I use train_data and test_data -- test metrics.

What is unusual is that val.py still expects labels, and if I just use detect.py, it doesn't give any metrics. What I am looking for is this: once I have the models trained using train.py, I want to test them on new images that the trained models haven't seen at all and produce metrics accordingly. val.py should do this, right? Any suggestions?

glenn-jocher commented 9 months ago

Hello again @unikill066!

The validation metrics you're referring to are indeed generated by train.py using the validation dataset specified in your training YAML file. These metrics help you monitor the model's performance during training.

For evaluating your trained model on new, unseen data, you would typically use val.py with a dataset that has labeled ground truth data. This is to calculate metrics like precision, recall, and mAP.

If you want to test your model on new images without ground truth labels (i.e., in a real-world scenario), you would use detect.py. This script doesn't provide metrics since it doesn't have any labels to compare against the predictions. It's purely for inference.
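
For reference, a typical detect.py call looks like the sketch below; the weights path and image folder are placeholders for your own files:

python detect.py --weights runs/train/exp/weights/best.pt --source path/to/new/images --img 640

The predictions (and, with --save-txt, label text files) are saved under runs/detect/exp by default, but no metrics are computed.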

To get metrics from val.py on a new set of labeled images that your model hasn't seen:

  1. Prepare your dataset with images and corresponding labels in the format expected by YOLOv5.
  2. Update your validation YAML file to point to this new dataset.
  3. Run val.py with the updated YAML file to get the performance metrics.
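
Putting those three steps together, a minimal sketch (new_test.yaml is a hypothetical dataset YAML whose val: entry points at the new images and their YOLO-format labels; adjust the paths to your setup):

python val.py --weights runs/train/exp/weights/best.pt --data new_test.yaml --img 640 --verbose

--verbose additionally prints per-class metrics alongside the overall summary.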

Remember, val.py is for evaluating the model's performance with metrics, while detect.py is for running the model on new data to get predictions without performance evaluation.

Keep up the great work, and don't hesitate to reach out if you have more questions! 😄

unikill066 commented 9 months ago

Thank you, @glenn-jocher. Is there a way to save metrics such as F1-score, precision, recall, and accuracy when executing val.py, similar to how running train.py saves all metrics to a results.csv file?

--verbose just outputs metrics to the terminal at runtime; I was looking for something like results.csv.

glenn-jocher commented 9 months ago

Absolutely, @unikill066!

When you run val.py, it automatically saves a summary of the metrics to a results.txt file in the runs/val/exp directory (or the latest exp directory if you've run multiple evaluations). This summary includes precision, recall, and mAP among other metrics.

If you need to save these metrics in a CSV format similar to train.py, you can do the following:

  1. After running val.py, locate the results.txt file in the corresponding runs/val/exp directory.
  2. Manually convert this results.txt file to a CSV format, or write a small script to parse the text and save it as CSV.

Currently, val.py doesn't directly output a results.csv file, but this is a good suggestion for a future feature update!

Keep pushing the boundaries, and happy validating! 😊

unikill066 commented 9 months ago

@glenn-jocher, I see that only the files shown in the image are created upon running val.py, and the labels folder contains bounding-box annotation files. I don't see any results.txt at all. Could you suggest whether I need to add any additional flags to generate the results.txt file?

glenn-jocher commented 9 months ago

Apologies for the confusion, @unikill066.

The val.py script indeed does not automatically generate a results.txt file like train.py does. The metrics are printed to the console, and you can redirect them to a text file manually if needed.

To save the metrics to a file, you can run val.py and redirect the output to a text file using the command line. Here's an example of how you can do it:

python val.py --weights your_weights.pt --data your_data.yaml --img 640 > results.txt

This command will run the validation and save the console output to results.txt. You can then manually convert this text file to a CSV format if required.
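
If it helps, here is a rough parsing sketch (not an official utility) that assumes the standard val.py console summary, where the 'all' row lists Class, Images, Instances, P, R, mAP50 and mAP50-95; adjust the column handling if your version prints a different layout:

import csv
import re

rows = []
with open('results.txt') as f:
    for line in f:
        parts = re.split(r'\s+', line.strip())
        # overall summary row: all <Images> <Instances> <P> <R> <mAP50> <mAP50-95>
        if parts and parts[0] == 'all' and len(parts) >= 7:
            rows.append(dict(zip(['P', 'R', 'mAP50', 'mAP50-95'],
                                 map(float, parts[3:7]))))

with open('results.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['P', 'R', 'mAP50', 'mAP50-95'])
    writer.writeheader()
    writer.writerows(rows)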

I hope this helps, and thank you for bringing this to our attention. Your feedback is valuable in improving YOLOv5. Keep up the great work! 😊

unikill066 commented 9 months ago

Thank you, @glenn-jocher. I have 56 models to run, 8 models per category, so I run it with nohup (nohup python script.py &).

The script.py file basically iterates over the /runs/train directory and runs val.py using subprocess.run(), as shown below:

command = ['python', '/mnt/Working/neuron_detection/yolov5/val.py', '--img', '512',
               '--weights', '{}'.format(weights), '--data', '{}'.format(j),
               '--batch-size', '16', '--name', '{}'.format(i), '--save-txt', '--task', 'test', '--verbose']

This generated a nohup.out file, from which I extracted the metrics for the test data as shown below:

from pathlib import Path
import getpass
import re
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data_list, temp_list = list(), list()
plots_save_path = Path(rf"C:\Users\{getpass.getuser()}\path")

with open(rf"C:\Users\{getpass.getuser()}\path\nohup.out", "r") as file:
    for line in file:
        if 'Results saved to' in line:
            exp_value = re.search(r'exp\d+', line)
            if exp_value:  # skip lines without an expN directory name
                temp_list.append(exp_value.group())
        elif 'all' in line:
            # overall summary row: all <Images> <Instances> <P> <R> <mAP50> <mAP50-95>
            line_data = [float(x) for x in re.split(r'\s+', line.strip())[3:]]
            data_list.append(line_data)

df = pd.DataFrame([[exp] + metrics for exp, metrics in zip(['exp'] + temp_list, data_list)],
                  columns=['Experiments', 'P', 'R', 'mAPS0', 'mAPS'])
df['F1'] = 2 * (df['P'] * df['R']) / (df['P'] + df['R'])  # F1 score from P and R
df.head()

df['category'] = ['M4']*7+['M3']*7+['M2']*7+['M1']*7+['F4']*7+['F3']*7+['F2']*7+['F1']*7  # 8 categories, 7 rows each = 56

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(15, 8))

# Recall
sns.lineplot(data=df, x="category", y="R", err_style="bars", errorbar=("se", 2), ax=axes[0, 0])
sns.scatterplot(data=df, x="category", y="R", ax=axes[0, 0], color='red', marker='x', label='Actual points')
axes[0, 0].set_title('Recall across Categories')
axes[0, 0].set_xlabel('Category')
axes[0, 0].set_ylabel('Recall (R)')
axes[0, 0].set_ylim(0.75, 1.05)

# Precision
sns.lineplot(data=df, x="category", y="P", err_style="bars", errorbar=("se", 2), ax=axes[0, 1])
sns.scatterplot(data=df, x="category", y="P", ax=axes[0, 1], color='red', marker='x', label='Actual points')
axes[0, 1].set_title('Precision across Categories')
axes[0, 1].set_xlabel('Category')
axes[0, 1].set_ylabel('Precision (P)')
axes[0, 1].set_ylim(0.75, 1.05)

# F1-score
sns.lineplot(data=df, x="category", y="F1", err_style="bars", errorbar=("se", 2), ax=axes[1, 0])
sns.scatterplot(data=df, x="category", y="F1", ax=axes[1, 0], color='red', marker='x', label='Actual points')
axes[1, 0].set_title('F1-score across Categories')
axes[1, 0].set_xlabel('Category')
axes[1, 0].set_ylabel('F1-score (F1)')
axes[1, 0].set_ylim(0.75, 1.05)

# mAP@0.5
sns.lineplot(data=df, x="category", y="mAPS0", err_style="bars", errorbar=("se", 2), ax=axes[1, 1])
sns.scatterplot(data=df, x="category", y="mAPS0", ax=axes[1, 1], color='red', marker='x', label='Actual points')
axes[1, 1].set_title('mAP@0.5 across Categories')
axes[1, 1].set_xlabel('Category')
axes[1, 1].set_ylabel('mAP@0.5')
axes[1, 1].set_ylim(0.75, 1.05)

plt.tight_layout()
plt.savefig(plots_save_path/'plot.pdf')

plt.show()

glenn-jocher commented 9 months ago

Hello @unikill066,

It looks like you've developed a solid approach to automate the evaluation of multiple models using val.py and extract the metrics from the nohup.out file. Your script for parsing the output and generating a DataFrame with the metrics, as well as the subsequent visualization using seaborn, is a great way to analyze the performance across different categories.

Your method of using subprocess.run() to iterate over the models and the subsequent parsing of the nohup.out file is a practical solution for batch processing. Just ensure that the output parsing aligns with the actual output format of val.py to avoid any discrepancies in the metrics.

If you have any specific questions or run into issues with this process, feel free to ask. Your initiative to streamline the evaluation process is commendable, and it's great to see such advanced usage of YOLOv5.

Keep up the excellent work! 😊

github-actions[bot] commented 8 months ago

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐