Closed: yasindagasan closed this issue 3 years ago.
@yasindagasan thanks for the suggestions! The fastest way you might be able to introduce some of these features would be to try them out yourself and submit a PR with your suggested updates. They are good ideas but unfortunately we are quite saturated at the moment maintaining the repo and backpropagating recent updates to ultralytics/yolov3 soon.
I've seen confusion matrices requested before, but I'm not sure if people realize that these are generally only created for classification tasks. It's not clear to me how this would extend to object detection. Do you have any references for confusion matrices applied explicitly to object detection results?
Your second idea may be more in the domain of labelling tools, though we have also had requests before for dataset visualization tools, which is definitely a missing feature. I wonder if we could use something like a Plotly dashboard to put together an interactive visualizer. Updating and modifying the labels would be a bit above and beyond this, but I agree a viewer at minimum is needed.
As for the third idea, I don't quite follow; could you show some examples of this?
@glenn-jocher thanks very much for the reply! Yes I agree and can definitely understand your workload!
I will create a PR if I manage to put something together for any of these.
A Plotly dashboard seems like a good idea. We could also use other available tools, but I am not sure how easy they would be to integrate. I am currently using the labelImg and CVAT tools for labelling datasets. CVAT seems quite nice, and there is an option to provide a model to do auto-labelling. In terms of setup difficulty, labelImg is pretty straightforward, while CVAT needs a bit of time to configure.
Sorry, it was not clear. Maybe I can explain better with an example.
This part was more about uncertain labels. I sometimes work on image classification tasks where the labels were created by people from the domain. Although the labellers understand the domain, the complexity of the objects in an image means they can still make mistakes. Some images sit at the borderline of two classes and are difficult to tell apart with the naked eye, so labellers can easily confuse them.
What we generally do to handle such cases is to train a model (e.g. a ResNet-50) and use the trained weights for feature extraction. We remove the last layer of the network so that, for a given image, we obtain a feature vector of length 512 or 2048. To visualise similarity and detect outliers, we then reduce the dimensionality with UMAP down to 2 or 3 dimensions. Points that are close together in this feature space are expected to be more similar than points that are far apart. We then colour the points by the provided labels, spot images that were labelled wrongly, and make the necessary corrections. The image below is an example of such a feature space.
I was wondering whether you think such an approach could somehow be applied to object detection. I have attempted this before but did not have time to experiment further. Using the bounding boxes predicted by YOLO, I cropped the objects and placed each one in a separate image of a standard size using resizing and padding. I then trained a fastai model to obtain weights, and did dimensionality reduction and clustering with DBSCAN.
One of the problems I encountered was that objects come in varying aspect ratios, and resizing them can crop them or distort the aspect ratio. I am not sure about its applicability to object detection, but I just wanted to discuss it and get your thoughts.
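For anyone wanting to try this, here is a rough, self-contained sketch of the crop → feature → UMAP → DBSCAN idea described above (my own sketch, not an Ultralytics utility). It assumes the predicted or labelled boxes have already been cropped into PIL images; crop_paths and the UMAP/DBSCAN parameters are placeholders to tune for your own data, and padding to a square before resizing is one way to avoid distorting the aspect ratio:

```python
# Sketch: extract per-object features with a pretrained backbone, then UMAP + DBSCAN
# to surface outliers and potentially mislabelled objects.
import numpy as np
import torch
import torchvision.transforms as T
from torchvision import models
from PIL import Image, ImageOps
import umap                      # pip install umap-learn
from sklearn.cluster import DBSCAN

def pad_to_square(img, fill=(114, 114, 114)):
    """Pad a PIL crop to a square canvas so resizing does not distort aspect ratio."""
    w, h = img.size
    s = max(w, h)
    return ImageOps.pad(img, (s, s), color=fill)

# Pretrained backbone with the final classification layer removed -> 2048-d features
backbone = torch.nn.Sequential(*list(models.resnet50(pretrained=True).children())[:-1]).eval()
preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor(),
                        T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

@torch.no_grad()
def embed(crops):
    """crops: list of PIL images of individual objects -> (N, 2048) numpy array."""
    batch = torch.stack([preprocess(pad_to_square(c.convert("RGB"))) for c in crops])
    return backbone(batch).flatten(1).cpu().numpy()

# Example usage (crop_paths is a placeholder for your own cropped object images):
# crops = [Image.open(p) for p in crop_paths]
# feats = embed(crops)
# xy = umap.UMAP(n_components=2).fit_transform(feats)          # 2-D feature space
# clusters = DBSCAN(eps=0.5, min_samples=10).fit_predict(xy)   # outlier label = -1
# Points whose cluster disagrees with their class label are candidates for relabelling.
```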
@yasindagasan interesting. I've raised an issue on https://github.com/kaanakan/object_detection_confusion_matrix/issues/6 to ask the author if he might help with integration.
As you've noticed in 3, aspect ratio modifications from stretching and other considerations complicate box extraction and classification/detection interoperability.
We are actually working on a YOLOv5 classifier though, so this may be suitable for that. The classifier is very easy to build, it's simply a YOLOv5 backbone with a Classify() head: https://github.com/ultralytics/yolov5/blob/8d2d6d2349cc4732667888435e9f01912d80a4ba/models/common.py#L227-L237
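For context, that kind of head is essentially a global pool followed by a 1x1 convolution over the backbone features. A simplified sketch of the pattern (not the exact contents of the linked file) might look like:

```python
import torch
import torch.nn as nn

class ClassifyHead(nn.Module):
    """Simplified classification head: global average pool -> 1x1 conv -> flatten.
    c1 = backbone output channels, c2 = number of classes."""
    def __init__(self, c1, c2):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # (b, c1, h, w) -> (b, c1, 1, 1)
        self.conv = nn.Conv2d(c1, c2, 1)      # (b, c1, 1, 1) -> (b, c2, 1, 1)
        self.flat = nn.Flatten()              # (b, c2, 1, 1) -> (b, c2)

    def forward(self, x):
        return self.flat(self.conv(self.pool(x)))

# e.g. logits = ClassifyHead(1024, 5)(backbone_features)
```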
Your idea 2 and 3 are in line with the suggestion I made here: https://github.com/ultralytics/yolov5/issues/895
I really think this deserves some thought. I understand it would not be of much use for increasing mAP on COCO, since you can't really change those labels, but the labels of custom datasets are almost always the place to start when you want to get better results from your YOLOv5 models.
@yasindagasan @Ownmarc 1 and 2 are definitely more closed-ended, feasible ideas that may be implemented. 1 in particular could perhaps be plotted at the same time as the PR curve, using the same TP and FP vectors. I recently updated a few plots BTW in https://github.com/ultralytics/yolov5/pull/1432 and https://github.com/ultralytics/yolov5/pull/1428, including the PR curves and the labels plots, for better introspection.
3 is a bit more open ended, but I understand the desire for better failure mode analysis and post training introspection tools. This is somewhat in the same direction as active learning, or adapting your labels based on training feedback. I'll have to think about it.
One update for post training analysis is that you can use a confidence slider on Weights & Biases results to help you determine a best confidence threshold for deployment. This is rather new and useful, but mainly suitable for just that one task of determining a best real-world confidence threshold to use. You can see an example here (click the gear on the Media panel): https://wandb.ai/glenn-jocher/yolov5_tutorial/reports/YOLOv5-COCO128-Tutorial-Results--VmlldzozMDI5OTY
@Ownmarc yes, very related to your #895 suggestions. Sorry, I was not aware of that thread. In my experience, too, the labels on custom datasets have a huge impact on results, so any improvement there would be very helpful.
@glenn-jocher I just have seen your recent updates. I like the PR curve colored by classes in #1428, it is very useful!
The classifier sounds suitable, actually. I will experiment with it soon.
@yasindagasan @Ownmarc I've integrated a confusion matrix now into test.py. See PR https://github.com/ultralytics/yolov5/pull/1474
There's some unfortunate overlap between the computations inside the confusion matrix class and the mAP computation code, particularly in that they both compute IoU matrices separately (a duplication of effort), but this will have to do for now. The confusion matrix adds about 5-10 seconds of wall-clock time to test.py, i.e. a typical YOLOv5m COCO test.py run will now take 1:25, up 10 seconds from 1:15 before.
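For reference, the core idea behind an object-detection confusion matrix is to match predictions to ground-truth boxes by IoU: matched pairs increment a (predicted class, true class) cell, unmatched ground truths are counted against a background row (missed detections), and unmatched predictions against a background column (background FPs). A minimal sketch of that bookkeeping, with a greedy one-to-one matching (my own simplified version, not the code merged in the PR):

```python
import numpy as np

def box_iou(a, b):
    """IoU between two sets of boxes in [x1, y1, x2, y2] format -> (len(a), len(b))."""
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    lt = np.maximum(a[:, None, :2], b[None, :, :2])
    rb = np.minimum(a[:, None, 2:], b[None, :, 2:])
    wh = np.clip(rb - lt, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-9)

def update_confusion(matrix, pred_boxes, pred_cls, gt_boxes, gt_cls, nc, iou_thr=0.45):
    """matrix is (nc+1, nc+1): rows = predicted class, columns = true class, index nc = background."""
    matched_gt, matched_pred = set(), set()
    if len(gt_cls) and len(pred_cls):
        iou = box_iou(gt_boxes, pred_boxes)          # (num_gt, num_pred)
        for g in range(len(gt_cls)):
            cand = [p for p in range(len(pred_cls)) if p not in matched_pred]
            if not cand:
                break
            p = max(cand, key=lambda j: iou[g, j])   # best remaining prediction for this GT
            if iou[g, p] >= iou_thr:
                matrix[int(pred_cls[p]), int(gt_cls[g])] += 1
                matched_gt.add(g)
                matched_pred.add(p)
    for g in range(len(gt_cls)):                     # unmatched GT -> predicted background (missed)
        if g not in matched_gt:
            matrix[nc, int(gt_cls[g])] += 1
    for p in range(len(pred_cls)):                   # unmatched prediction -> true background (FP)
        if p not in matched_pred:
            matrix[int(pred_cls[p]), nc] += 1
    return matrix

# matrix = np.zeros((nc + 1, nc + 1), dtype=int); update per image, then normalize columns to plot.
```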
this is awesome thanks @glenn-jocher!
@yasindagasan I'll leave this issue open as https://github.com/ultralytics/yolov5/pull/1474 only partially satisfies the feature additions.
After considering the results a bit, I think unfortunately the conclusions you can draw from the confusion matrices in object detection may be somewhat limited, as it seems that by far the largest cross-class confusion is simply class x to background, regardless of x.
Still, any extra information should help everyone understand their results a bit better :)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I was trying to obtain the confusion matrix using test.py after referring to this PR, and I'm using Colab. How can I obtain the matrix for my custom dataset? Do I have to run the test.py script with an explicit parameter?
Edit: if the matrix is produced automatically at the end of training as mentioned here, how can I save it in case I can't visualise it in Colab? Also, I would want to produce the matrix for the testing data that I pass to detect.py. I have the predictions and I have the ground truth; can someone guide me on how to do that?
@mansi-aggarwal-2504 test.py automatically generates confusion matrices. Results are logged to the directory indicated, i.e. runs/test/exp
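If the plot does not render inline in Colab, you can display the saved file directly. Assuming the default output name confusion_matrix.png in that directory (check your exp folder for the actual filename and run number), something like this should work:

```python
from IPython.display import Image, display
display(Image('runs/test/exp/confusion_matrix.png'))  # adjust the exp folder to your latest run
```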
Thank you @glenn-jocher for such a prompt response! I must have been overlooking it; I found it. This must be the matrix for the validation data, and I want to do the same thing for my test data with predictions and ground truth. Shall I make a yaml file and point it at the test dataset, or is there a more efficient way? Also, I want to understand how to interpret the matrix. I have a single class, i.e. flower. I understand the first column, but I don't get the second column, i.e. background FP mapped to flower with a value of 1.0. What does the second column mean?
@mansi-aggarwal-2504 you can evaluate any part of your dataset (train, val, test) with test.py using the --task flag:
https://github.com/ultralytics/yolov5/blob/7b36e38cf8f3d3c08e973b18913ae8e41ff970b2/test.py#L297
The matrix indicates that 100% of your background FPs are caused by the flower category.
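For example, with placeholder paths for the data yaml and weights, evaluating the test split defined in your dataset yaml would look something like:
python test.py --data custom.yaml --weights best.pt --task test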
@glenn-jocher thank you very much. I will use the --task flag.
Also:
"The matrix indicates that 100% of your background FPs are caused by the flower category."
Got it, thanks!
I set the --task to test and uploaded the ground truth of the test set. I received one confusion matrix, so is this the average over all the images in my test set?
Is there a way to get a separate matrix for each image?
@mansi-aggarwal-2504 that is correct, a set of images generates one confusion matrix.
Is there a way to get a separate matrix for each image?
@mansi-aggarwal-2504 the confusion matrix already applies to all images.
@glenn-jocher but the resultant matrix is effectively an average over all the images in the test set, right?
@mansi-aggarwal-2504 yes one confusion matrix is generated for the entire dataset.
Can someone explain this for me?
@Mohamed-Elredeny see https://en.wikipedia.org/wiki/Confusion_matrix
I noticed that when I run test.py, the total number of objects detected is 4718, i.e. TP + FP. But the total number of objects detected by detect.py was 4605. I also ran detect.py with the same conf_thres and iou_thres values as the test.py defaults, i.e. --iou-thres 0.6 --conf-thres 0.001, but the count is still different.
How should I change the parameters in detect.py to get the same results as test.py? I want the same object count and TP count as test.py.
EDIT: 300 flowers in all images when the params are kept the same as in test.py. I also tried running test.py with --iou-thres=0.45 --conf-thres=0.25, which are the defaults for detect.py, but there is still a difference in the number of objects detected.
@mansi-aggarwal-2504 see test.py and metrics.py for TP and FP computation.
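Roughly speaking, the TP flags there are computed per prediction: predictions are sorted by confidence, and each one counts as a TP only if it matches a not-yet-used ground truth of the same class with IoU above the threshold; everything else is an FP. A simplified sketch of that general approach (reusing box_iou from the earlier snippet, and not the exact metrics.py implementation):

```python
import numpy as np

def tp_vector(pred_boxes, pred_conf, pred_cls, gt_boxes, gt_cls, iou_thr=0.5):
    """Return a boolean TP flag per prediction, matching in descending confidence order."""
    order = np.argsort(-pred_conf)
    tp = np.zeros(len(pred_boxes), dtype=bool)
    used = set()
    iou = box_iou(pred_boxes, gt_boxes) if len(gt_boxes) else np.zeros((len(pred_boxes), 0))
    for i in order:
        cands = [g for g in range(len(gt_cls))
                 if g not in used and gt_cls[g] == pred_cls[i] and iou[i, g] >= iou_thr]
        if cands:
            g = max(cands, key=lambda j: iou[i, j])  # best available same-class ground truth
            tp[i] = True
            used.add(g)
    return tp  # FP count is (~tp).sum(); cumulative sums of TP/FP feed the PR curve
```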
I am getting a similar confusion matrix. I have the following questions: 1. I did not understand the fourth value, i.e. the bottom right. Can you please explain? 2. How can I decrease the third value (top right), i.e. the 1.0? 3. How can I increase the fourth value, i.e. the bottom right?
@priyankadank 👋 Hello! Thanks for asking about improving YOLOv5 🚀 training results. On your questions: the confusion matrix columns are normalized.
Most of the time good results can be obtained with no changes to the models or training settings, provided your dataset is sufficiently large and well labelled. If at first you don't get good results, there are steps you might be able to take to improve, but we always recommend users first train with all default settings before considering any changes. This helps establish a performance baseline and spot areas for improvement.
If you have questions about your training results we recommend you provide the maximum amount of information possible if you expect a helpful response, including results plots (train losses, val losses, P, R, mAP), PR curve, confusion matrix, training mosaics, test results and dataset statistics images such as labels.png. All of these are located in your project/name directory, typically yolov5/runs/train/exp.
We've put together a full guide for users looking to get the best results on their YOLOv5 trainings below.
View train_batch*.jpg on train start to verify your labels appear correct, i.e. see example mosaic.
Larger models like YOLOv5x and YOLOv5x6 will produce better results in nearly all cases, but have more parameters, require more CUDA memory to train, and are slower to run. For mobile deployments we recommend YOLOv5s/m, for cloud deployments we recommend YOLOv5l/x. See our README table for a full comparison of all models.
To start from pretrained weights, pass the name of the model to the --weights argument. Models download automatically from the latest YOLOv5 release.
python train.py --data custom.yaml --weights yolov5s.pt
                                             yolov5m.pt
                                             yolov5l.pt
                                             yolov5x.pt
                                             custom_pretrained.pt
To start from scratch, pass the model architecture yaml you are interested in, along with an empty --weights '' argument:
python train.py --data custom.yaml --weights '' --cfg yolov5s.yaml
                                                      yolov5m.yaml
                                                      yolov5l.yaml
                                                      yolov5x.yaml
Before modifying anything, first train with default settings to establish a performance baseline. A full list of train.py settings can be found in the train.py argparser.
COCO trains at a native resolution of --img 640, though due to the high amount of small objects in the dataset it can benefit from training at higher resolutions such as --img 1280. If there are many small objects then custom datasets will benefit from training at native or higher resolution. Best inference results are obtained at the same --img as the training was run at, i.e. if you train at --img 1280 you should also test and detect at --img 1280.
Use the largest --batch-size that your hardware allows for. Small batch sizes produce poor batchnorm statistics and should be avoided.
Reducing loss component gain hyperparameters like hyp['obj'] will help reduce overfitting in those specific loss components. For an automated method of optimizing these hyperparameters, see our Hyperparameter Evolution Tutorial.
If you'd like to know more, a good place to start is Karpathy's 'Recipe for Training Neural Networks', which has great ideas for training that apply broadly across all ML domains: http://karpathy.github.io/2019/04/25/recipe/
Good luck 🍀 and let us know if you have any other questions!
🚀 Feature
Motivation
I am using object detection on a custom dataset. We have labelled quite a number of images, and we are interested in detecting the same type of object, i.e. fractures on bones. We set some rules to assign classes to the objects based on how they look. However, some classes have quite a number of borderline examples that are easy for the model to confuse. I thought a confusion matrix would be pretty handy for assessing model performance and updating my labels. I would be very keen to have these features in the code if possible:
1. Confusion matrix: I would like to know what classes are getting confused so that I can maybe correct my labels or merge some of the confused classes as a single class.
2. Error analyser: It would also be good to further analyse my labels and obtain potential improvements (i.e. revising the bounding boxes to better define the object boundaries etc.). Some of the object boundaries are not clear and different people can label the boundaries differently.
3. Plotting objects in feature space for similarity and outlier analysis: I would also be very interested in knowing the problematic labels in the dataset. I was thinking maybe we can use the trained weights to extract features and do further dimensionality reduction using UMAP, coloured by class.
Let me know what you think. I am happy to discuss it further!