ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

How to make a confusion matrix in YOLOv5 step by step? #10365

Closed justhusnan closed 1 year ago

justhusnan commented 2 years ago

Search before asking

Question

I've trained a model using YOLOv5 and got pretty good performance, so I want to make a confusion matrix for my needs. But I don't know how to make one: I've tried several tutorials and still failed. Please help explain step by step how to make a confusion matrix in YOLOv5 🙏🏻

Additional

No response

github-actions[bot] commented 2 years ago

👋 Hello @husnan622, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher commented 2 years ago

@husnan622 val.py makes confusion matrices automatically. See val.py for usage examples.

justhusnan commented 2 years ago

@glenn-jocher I've tried what you suggested regarding val.py, but I haven't been able to bring up the confusion matrix image. I don't know what's missing from my command.

Also, val.py still scans the valid dataset even though I modified val.py to use task=test.

Confusion Matrix

glenn-jocher commented 2 years ago

@husnan622 check your runs/val/exp2 directory; the confusion matrix (confusion_matrix.png) is in there.

glenn-jocher commented 2 years ago

@husnan622 if your data.yaml has a test: key then yes you can run python val.py --task test to use your test split.

justhusnan commented 2 years ago

Thanks @glenn-jocher

Are the weights that I use correct?

Also, confusion matrix code usually takes the actual labels and the predicted labels, for example y_true and y_pred. Where can I find those?

glenn-jocher commented 2 years ago

@husnan622 you can use any weights you want as long as they are trained on your --data data.yaml

You can access the confusion matrix code in utils/metrics.py:

https://github.com/ultralytics/yolov5/blob/7845cea91343e430566689deff6e50f6c2b473fa/utils/metrics.py#L126-L220
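On the y_true/y_pred question: for object detection there are no ready-made label lists, because each predicted box first has to be matched to a ground-truth box. The following is a simplified NumPy sketch in the spirit of the linked ConfusionMatrix.process_batch(), not a line-by-line port. The function names box_iou and update_confusion are hypothetical, and the defaults conf=0.25 and iou_thres=0.45 are assumed to mirror ConfusionMatrix's defaults (check your copy of utils/metrics.py). Rows are predicted classes, columns are true classes, and the extra last row/column is "background": a missed ground-truth object lands in the background row, and a detection with no matching ground truth lands in the background column.

```python
import numpy as np


def box_iou(a, b):
    """Pairwise IoU between boxes a (M, 4) and b (N, 4), both in x1, y1, x2, y2 format."""
    top_left = np.maximum(a[:, None, :2], b[None, :, :2])
    bottom_right = np.minimum(a[:, None, 2:], b[None, :, 2:])
    inter = np.clip(bottom_right - top_left, 0, None).prod(axis=2)  # (M, N) intersection areas
    area_a = (a[:, 2:] - a[:, :2]).prod(axis=1)
    area_b = (b[:, 2:] - b[:, :2]).prod(axis=1)
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-9)


def update_confusion(matrix, detections, labels, conf=0.25, iou_thres=0.45):
    """Add one image to `matrix` (shape (nc + 1, nc + 1); rows = predicted, cols = true).

    detections: (N, 6) array of x1, y1, x2, y2, confidence, class
    labels:     (M, 5) array of class, x1, y1, x2, y2
    Index nc (the last row/column) stands for "background".
    """
    nc = matrix.shape[0] - 1
    detections = detections[detections[:, 4] > conf]  # ignore low-confidence boxes
    gt_cls = labels[:, 0].astype(int)
    det_cls = detections[:, 5].astype(int)
    iou = box_iou(labels[:, 1:], detections[:, :4])  # (M, N)

    # Greedy one-to-one matching: highest-IoU (gt, detection) pairs above the threshold first
    gi, di = np.nonzero(iou > iou_thres)
    pairs = sorted(zip(gi.tolist(), di.tolist(), iou[gi, di].tolist()), key=lambda p: -p[2])
    gt_to_det, used_dets = {}, set()
    for g, d, _ in pairs:
        if g not in gt_to_det and d not in used_dets:
            gt_to_det[g] = d
            used_dets.add(d)

    for g, gc in enumerate(gt_cls):
        if g in gt_to_det:
            matrix[det_cls[gt_to_det[g]], gc] += 1  # matched box: predicted class vs true class
        else:
            matrix[nc, gc] += 1  # missed object: background row (FN)
    for d, dc in enumerate(det_cls):
        if d not in used_dets:
            matrix[dc, nc] += 1  # detection with no ground truth: background column (FP)
    return matrix
```

Calling update_confusion() once per image with the same matrix accumulates counts over the whole validation set, which is roughly what val.py does before plotting.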

justhusnan commented 2 years ago

Thank you so much for your help @glenn-jocher

justhusnan commented 2 years ago

@glenn-jocher Previously I was able to generate a confusion matrix using val.py, the results are like the following image:

confusion_matrix

But I want a confusion matrix that only displays True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN), for example in the following image:

Confusion Matrix

How to generate a confusion matrix that only displays True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN)?

glenn-jocher commented 2 years ago

@husnan622 set normalize=False in ConfusionMatrix.plot() (utils/metrics.py) to plot raw counts instead of normalized values.
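With raw counts (normalize=False), per-class TP, FP and FN can be read straight off the matrix. A minimal NumPy sketch, assuming the layout used by ConfusionMatrix in utils/metrics.py as I understand it (rows = predicted, columns = true, last row/column = background); per_class_counts is a hypothetical helper. Note that object detection has no meaningful true-negative count: there is no finite set of "non-objects" to count, so the background/background cell stays at zero and TN is not reported.

```python
import numpy as np


def per_class_counts(matrix):
    """Per-class TP, FP, FN from a raw (un-normalized) (nc + 1) x (nc + 1) matrix.

    Assumes rows = predicted class, columns = true class, last row/column = background.
    """
    nc = matrix.shape[0] - 1
    tp = np.diag(matrix)[:nc]              # predicted c and truly c
    fp = matrix[:nc, :].sum(axis=1) - tp   # predicted c, but truly another class or background
    fn = matrix[:, :nc].sum(axis=0) - tp   # truly c, but predicted another class or missed
    return tp, fp, fn


# Hypothetical counts for two classes plus background
m = np.array([[5, 1, 2],
              [0, 3, 4],
              [1, 2, 0]], dtype=float)
tp, fp, fn = per_class_counts(m)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
```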

justhusnan commented 2 years ago

What is the background in the confusion matrix image?

confusion_matrix

How do I remove this section?

github-actions[bot] commented 1 year ago

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Access additional YOLOv5 🚀 resources:

Access additional Ultralytics ⚡ resources:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

charanlsa commented 1 year ago

@glenn-jocher I got the confusion matrix as shown, confusion_matrix

In this matrix the TP cell is empty (no value), FP is 1 and TN is 0.94. What exactly do these indicate, and how can I improve them, i.e. get a value for TP and decrease FP?

I really appreciate your help. Thanks in advance.

syamghali commented 1 year ago

@glenn-jocher : For YOLOv5, how do I increase the font size of numbers in the confusion matrix?

glenn-jocher commented 1 year ago

@syamghali to increase the font size of the numbers in the YOLOv5 confusion matrix, you can modify the plot() method of the ConfusionMatrix class in utils/metrics.py, which draws the figure with seaborn's heatmap(). The numbers are the heatmap's cell annotations, so increase the size set in its annot_kws argument to the value you want. However, please note that while increasing the font size may make the numbers in the plot more readable, it may also reduce the space available for the plot, which could make the plot less visually informative.
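As an alternative to editing YOLOv5's own plotting code, here is a minimal plain-matplotlib sketch that re-draws a matrix you already have (for example, one saved as text) with a chosen annotation size. plot_matrix and annot_size are hypothetical names; the 0.005 cut-off for blank cells and the True/Predicted axis labels are assumptions chosen to resemble YOLOv5's figure. matplotlib.use("Agg") is only there so it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np


def plot_matrix(array, names, annot_size=14, save_path=None):
    """Draw an (n x n) confusion matrix with cell numbers at `annot_size` points.

    Assumes rows = predicted, columns = true; `names` should end with
    "background" if the array has that extra row/column.
    """
    fig, ax = plt.subplots(figsize=(12, 9), tight_layout=True)
    ax.imshow(array, cmap="Blues")
    ticks = np.arange(len(names))
    ax.set_xticks(ticks)
    ax.set_xticklabels(names, rotation=90)
    ax.set_yticks(ticks)
    ax.set_yticklabels(names)
    for i in range(array.shape[0]):
        for j in range(array.shape[1]):
            if array[i, j] >= 0.005:  # leave near-zero cells blank
                ax.text(j, i, f"{array[i, j]:.2f}", ha="center", va="center", fontsize=annot_size)
    ax.set_xlabel("True")
    ax.set_ylabel("Predicted")
    ax.set_title("Confusion Matrix")
    if save_path is not None:
        fig.savefig(save_path, dpi=250)
    return fig, ax
```

Raising annot_size enlarges only the cell numbers; tick labels keep matplotlib's default size unless you also pass fontsize to set_xticklabels/set_yticklabels.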

muhanadabdul commented 1 year ago

Please, what does "background" mean in the confusion matrix image, and how can I remove it?

glenn-jocher commented 1 year ago

@muhanadabdul the background area in the confusion matrix image represents values that are not part of the confusion matrix itself. This area is typically used to display legends or colorbars. To remove the background area, you can modify the plot_confusion_matrix() function in the utils/plots.py file of YOLOv5. Specifically, you can remove the code that generates the legend or colorbar or modify the relevant parameters to adjust their size and location. However, please note that removing or modifying these sections may make the plot less informative or harder to read, especially for users who are less familiar with the specific plot.

muhanadabdul commented 1 year ago

Dear, thank you for the explanation, but I'm afraid I still don't understand what the values in the background label mean for the validation phase.

image

the values with the red circle in the image above

glenn-jocher commented 1 year ago

@muhanadabdul hello, the values you have highlighted in the image represent the number of samples that were not considered in the validation phase. These samples may have been excluded from the validation set for a variety of reasons, such as missing annotations or insufficient image quality. The values can be useful to understand the size of the validation set relative to the full dataset and to identify any potential imbalances or biases in the dataset. However, they are generally not included in the confusion matrix or other validation metrics, as they do not represent true positive or negative results. Please let me know if you have any further questions or concerns.

muhanadabdul commented 1 year ago

Dear, thanks for your explanation, now it's clear, but I have one more question. The two highlighted values in the image below both relate to the same class, "mobile_use". Based on your explanation I am confused about their meaning: if the 0.08 value represents the number of samples that were not considered in the validation phase, what does the other value (0.06) mean?

image

Is there any way to find which images the model excluded from the validation set? Also, can we hide these two columns (V, H) and their values from the resulting confusion matrix

glenn-jocher commented 1 year ago

@muhanadabdul hello! The value of 0.08 represents the number of samples that were not included in the validation set for the "mobile_use" class. The value of 0.06 represents the percentage of samples for this class that were not included in the validation set. This percentage is calculated as the number of excluded samples (0.08) divided by the total number of samples (1.34) for this class. Regarding whether it's possible to find which images were excluded from the validation set, this depends on how the data was split and stored. If you have access to the code or procedure that was used to split the data, you may be able to identify the excluded images. If the image filenames or IDs are included in the dataset, you may also be able to cross-reference them with the validation set to identify the excluded images. Finally, regarding the question of hiding the V and H columns and their values from the confusion matrix, you can modify the plot_confusion_matrix() function in the utils/plots.py file of YOLOv5. Specifically, you can remove the code that generates the V and H columns, or modify the relevant parameters to adjust their size and location. Please note that modifying the confusion matrix in this way may make it less informative or harder to read, especially for users who are less familiar with the plot.
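To see how normalized background entries can be read, here is a small NumPy sketch with made-up counts for two classes ("mobile_use" plus a hypothetical "other" class). It assumes YOLOv5's normalized plot divides each column by its sum, as the self.matrix.sum(0) expression in the metrics.py snippet quoted later in this thread suggests, with rows = predicted and columns = true. Under that assumption, the background-row cell under mobile_use is the fraction of true mobile_use objects that were missed (FN), and the mobile_use cell in the background column is the share of unmatched detections (FP) that were predicted as mobile_use.

```python
import numpy as np


def normalize_columns(matrix):
    """Divide each column by its sum (the small epsilon avoids division by zero)."""
    return matrix / (matrix.sum(axis=0, keepdims=True) + 1e-9)


# Hypothetical raw counts; rows = predicted, columns = true
# order of both axes: [mobile_use, other, background]
raw = np.array([[46.0, 2.0, 3.0],
                [0.0, 30.0, 1.0],
                [4.0, 3.0, 0.0]])
norm = normalize_columns(raw)

# background row, mobile_use column: 4 / 50 of true mobile_use objects were missed (FN)
missed_mobile_use = norm[2, 0]
# mobile_use row, background column: 3 / 4 of unmatched detections were mobile_use (FP)
fp_share_mobile_use = norm[0, 2]
```

Because each column is normalized separately, the two background values for the same class answer different questions and are not expected to add up to anything in particular.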

muhanadabdul commented 1 year ago

Fantastic, many thanks for your help.

glenn-jocher commented 1 year ago

@muhanadabdul hello! You're very welcome. If you have any further questions or concerns, please don't hesitate to ask. We're here to help!

charanlsa commented 1 year ago

Dear Glenn, I implemented the detection, but at execution time the accuracy was not good, so I tuned the hyperparameters and used the resulting model for detection. I also included a few (10%) negative images in the training set, but the confusion matrix is as shown. What exactly does it describe, how do I decrease the 1 there, and how should I increase the accuracy? Please reply ASAP, as I'm working on the project. [image: image.png]

On Wed, 31 May 2023, 08:29 Glenn Jocher wrote: (quoting the reply above)

glenn-jocher commented 1 year ago

@charanlsa dear user,

Thank you for reaching out. The confusion matrix you have shared represents the performance of your YOLOv5 model on the validation set. In YOLOv5's plot, the columns correspond to the ground-truth (True) classes and the rows to the predicted classes, with an extra background row and column. Each element counts the objects of a given true class that were predicted as a given class (shown as a fraction of each column when the plot is normalized). The diagonal elements represent correct predictions for each class, and the off-diagonal elements represent incorrect predictions.

If you are noticing a low accuracy, one possible approach to increasing it is to adjust the hyperparameters of your YOLOv5 model. Some hyperparameters you might consider tuning include the learning rate, batch size, and number of training iterations. Additionally, you may want to consider increasing the size or diversity of your training set to boost overall performance.

Regarding the specific class where you are noticing a high number of false positives (i.e., ground truth class 1 and predicted class 0), you may want to consider adjusting the class weights or focal loss coefficients to emphasize this class during training. You may also want to examine the annotations for this class carefully to ensure that they are accurate and complete.

I hope this information is helpful. Please let me know if you have any further questions or concerns, and I'll be happy to assist you. Good luck with your project!

Best regards, Glenn Jocher

muhanadabdul commented 1 year ago

plot_confusion_matrix()

Dear, I could not find the plot_confusion_matrix() function in utils/plots.py.

glenn-jocher commented 1 year ago

@muhanadabdul hi there! I'm sorry to hear that you were not able to locate the plot_confusion_matrix() function in the utils/plots.py file. Just to make sure, are you using the latest version of YOLOv5? If so, the function should be included in the file. One possibility is that the function was accidentally removed or modified during your customization of the code.

If you are still unable to find the function, a workaround could be to use a third-party library like scikit-learn to plot the confusion matrix. For example, you could compute the matrix with sklearn.metrics.confusion_matrix() and visualize it with sklearn.metrics.ConfusionMatrixDisplay (older scikit-learn releases also provided plot_confusion_matrix(), which has since been removed).

Let me know if this helps, or if you have any further questions or issues.

Best regards, Glenn Jocher

muhanadabdul commented 1 year ago

Dear, yes, I am using the latest version of YOLOv5, cloned with !git clone https://github.com/ultralytics/yolov5 # clone repo. I attached the file here so you can check it: plots.zip. I need to know where the code responsible for plotting the confusion_matrix image is; I really don't need a third-party library.

glenn-jocher commented 1 year ago

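@muhanadabdul Hi there! I apologize for the confusion and for the misinformation in my previous response: there is indeed no plot_confusion_matrix() function in the latest version of YOLOv5. The confusion_matrix.png image is drawn by the plot() method of the ConfusionMatrix class in utils/metrics.py (the file linked earlier in this thread).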

If you would like to plot the confusion matrix without using a third-party library like scikit-learn, you will need to implement your own function to generate and plot the matrix. Here is some sample code you can use as a starting point:

import itertools

import numpy as np
import matplotlib.pyplot as plt

def plot_confusion_matrix(cm, classes, normalize=False, title='Confusion matrix', cmap=plt.cm.Blues):
    """
    Plots a confusion matrix with rows = true labels and columns = predicted labels
    (scikit-learn's layout; YOLOv5's ConfusionMatrix.matrix is the transpose, so pass matrix.T).
    """
    if normalize:
        # Divide each row by its sum so every true class sums to 1
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    # 'd' only works for integer arrays; YOLOv5 stores its counts as floats
    fmt = 'd' if np.issubdtype(cm.dtype, np.integer) else '.2f'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.tight_layout()
    plt.show()

This code defines a function plot_confusion_matrix() that takes as input a confusion matrix, a list of class names, and other optional parameters like whether or not to normalize the matrix and what title to give the plot.

You can then generate the confusion matrix as a NumPy array, for example with the sklearn.metrics.confusion_matrix() function or with your own counting code if you prefer not to use scikit-learn, and then use the plot_confusion_matrix() function to plot the matrix:


import itertools
from sklearn.metrics import confusion_matrix

# Generate confusion matrix using scikit-learn (or your preferred method)
cm = confusion_matrix(true_labels, predicted_labels)

# Define list of class names
classes = ['class_1', 'class_2',
dadin852 commented 1 year ago

Hello, here's some code I made to generate a confusion matrix from given .txt files: https://github.com/dadin852/YOLO-Confusion-Matrix-with-custom-labels.git I did this because the detection results from detect.py and test.py are different.

glenn-jocher commented 1 year ago

@dadin852 hello! Thank you for sharing your code for generating a confusion matrix with custom labels for YOLO. It's always helpful to have different options for analyzing the performance of our models.

Regarding the differences between the results of detect.py and test.py, keep in mind that detect.py performs detection on a single image or directory of images, whereas test.py evaluates the entire validation dataset and reports metrics such as mAP. It's possible that the differences you're observing are due to variations in the images that are being processed, or other factors such as the size of the batch being used during evaluation.

If you have any further questions or issues, feel free to reach out. We're here to help!

Best regards, Glenn Jocher

dadin852 commented 1 year ago

I've already set all the factors (img-size, batch-size, thres...) to the same values in detect.py and test.py, and the results are still different.

syamghali commented 1 year ago

Hi Glenn: When YOLOv5 creates a confusion matrix, what default confidence threshold is used when classifying background areas (false positives)? For example, in the matrix above, the mobile_use/background value is 0.08. This means the model predicts background as mobile_use in 8% of cases, right? And what default confidence threshold is considered here; is it 0.2?

ryecries commented 1 year ago

Hi, I tried to edit metrics.py to generate a confusion matrix without background, but it still generates it with background. How can I create a confusion matrix without background?

dadin852 commented 1 year ago

Hello, are you suggesting that you'd like to remove the background FP/FN labels? My approach involves saving the matrix data in metrics.py and then editing it to create the confusion_matrix.png.

In metrics.py :

def plot(self, save_dir='', names=()):
    try:
        import seaborn as sn

        # Normalization switched off: `(...) if False else 1` always divides by 1, keeping raw counts
        array = self.matrix / ((self.matrix.sum(0).reshape(1, self.nc + 1) + 1E-6) if False else 1)

        # Save the matrix data next to confusion_matrix.png ('a+' appends one copy per run)
        with open(f'{save_dir}/matrix.txt', 'a+') as f:
            f.write(np.array2string(array, separator=','))
        array[array < 0.005] = np.nan  # don't annotate (would appear as 0.00)

        ...
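One caveat with saving via np.array2string: with NumPy's default print options it abbreviates arrays of more than 1000 elements with "...", which happens at about 31 or more classes plus background, and 'a+' mode appends another copy on every run, so the file can be hard to read back. A minimal alternative sketch: save_matrix() and load_without_background() are hypothetical helpers, and the assumed usage is calling save_matrix(self.matrix, Path(save_dir) / 'matrix.txt') from inside ConfusionMatrix.plot(), then slicing off the last row and column (background) when re-plotting.

```python
import numpy as np


def save_matrix(matrix, path):
    """Write the raw matrix as plain CSV text; np.savetxt never abbreviates and overwrites the file."""
    np.savetxt(path, matrix, fmt="%.6g", delimiter=",")


def load_without_background(path):
    """Reload a saved (nc + 1) x (nc + 1) matrix and drop the background row and column."""
    matrix = np.loadtxt(path, delimiter=",", ndmin=2)
    return matrix[:-1, :-1]
```

Dropping the background row and column also drops the counts of missed objects (FN) and unmatched detections (FP), so the remaining nc x nc matrix only shows how matched boxes were classified.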
ryecries commented 1 year ago

Hi, sorry for the late reply. So when I edit my metrics.py, do I have to run val.py again?

On Wed, 20 Sep 2023 at 07:20, dadin852 wrote: (quoting the comment above)

glenn-jocher commented 1 year ago

@ryecries no worries about the delay! Yes, after editing metrics.py you need to run val.py once more. The confusion matrix is accumulated in memory during validation, and plot() is called at the end of that run, so the new run is what writes matrix.txt (and confusion_matrix.png) with your changes.

Once matrix.txt has been saved, you can load it and build your custom confusion matrix visualization from it independently of the validation process, without running val.py again.

Let me know if you have any further questions or need any assistance!

Cheers!

RyanTNN commented 1 year ago

Hello. I have a question: I trained YOLOv5 and got the confusion matrix result (confusion matrix picture), but I want to re-create the picture because I want to change the font size or text in it. So far I always run training again to get a new confusion matrix. Is there any way to create a new confusion matrix without training? Can I use the resulting weights to re-create the confusion matrix picture? Thank you!

glenn-jocher commented 1 year ago

@RyanTNN hi there!

It is not necessary to run the training again to create a new confusion matrix with different font size or text. You can create a new confusion matrix using the existing model weights.

The confusion matrix is generated based on the predictions and ground truth labels during the evaluation process. If you want to change the font size or text in the confusion matrix picture, you can modify the code responsible for plotting the matrix.

In the metrics.py file, you can adjust the font size or other visualization settings within the plot() method. You can explore the matplotlib documentation for customizing the appearance of the confusion matrix.

Once you have made the desired modifications, you can run the evaluation again using the trained model weights by running the val.py script. This will generate a new confusion matrix with the updated visualization settings.

I hope this helps! Let me know if you have any further questions or need any clarifications.

RyanTNN commented 1 year ago

Okay, I got it. Thank you for your time.

glenn-jocher commented 1 year ago

@RyanTNN You're welcome! I'm glad I could help. If you have any more questions or need further assistance, feel free to ask. Happy coding!

Joshnavarma commented 1 year ago

@husnan622 val.py makes confusion matrices automatically. See val.py for usage examples.

What file is used for plotting the confusion matrix in YOLOv7?

Joshnavarma commented 1 year ago

@glenn-jocher could you please help me get a confusion matrix and accuracy score for YOLOv7 on a custom dataset?

Joshnavarma commented 1 year ago

image Why is my confusion matrix not showing values? The image is from the exp folder after training.

glenn-jocher commented 1 year ago

@Joshnavarma It looks like the image link you provided is broken. To better assist you, could you please provide more details or a valid link for the confusion matrix image? Thank you!

Joshnavarma commented 1 year ago

confusion_matrix please find the attached image.

Apart from this, I have one question: my YOLOv7 model gave 85.2% mAP on a custom dataset of 6k images (50 epochs, batch size 16). Is that an acceptable mAP score, and why is it so high? I ask because when I trained my model with 100 epochs and batch size 32, it gave around 64% mAP, so now I doubt whether I did anything wrong in my YOLOv7 training. I used the yolov7-training.pt file for training on my custom data. I'm doing a master's in data analytics and this is my final-year project, so I want to justify why the scores differ so much.

Appreciate your help. Thanks.

glenn-jocher commented 1 year ago

@Joshnavarma Thank you for sharing the confusion matrix image. It seems that the provided link is not accessible. If you could upload the image to a public image hosting service (such as Imgur or PostImage) and share the new link, I would be happy to take a look.

Regarding your question about the mAP score discrepancies, achieving 85.2% mAP on a custom dataset with 6k images after 50 epochs is certainly a notable result. However, it is essential to investigate potential factors that may have contributed to this high mAP score, such as the data distribution, augmentation techniques, and the balance of classes in your dataset. Additionally, changes in hyperparameters like batch size and the number of epochs can also impact the training process.

For the varying mAP scores obtained with different training configurations, it's important to perform a thorough investigation of any alterations made during the training process, such as changes in the dataset, data preprocessing, augmentation, and the impact of different hyperparameters.

As you're working on your final year project, understanding and justifying the impact of these factors on your results will be valuable. I recommend thoroughly reviewing your training process and experimenting with different configurations to understand the effects on performance.

If you need further assistance troubleshooting the confusion matrix issue or analyzing the differences in mAP scores, please feel free to provide additional details. I'm here to help!

Joshnavarma commented 1 year ago

Thanks for the reply. Please find the image here: https://postimg.cc/XX7t1B0G

glenn-jocher commented 1 year ago

@Joshnavarma thank you for sharing the image. However, the provided link still seems to be inaccessible. Could you double-check the link or provide an alternative method to access the confusion matrix image? It's important for me to review the confusion matrix to better assist you. Thank you!