ultralytics / yolov5

YOLOv5 πŸš€ in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

mAP doesn't increase #8260

Closed · moahaimen closed this issue 2 years ago

moahaimen commented 2 years ago

Search before asking

Question

hi, I now made 5000 images, about 1000 for each class, and the mAP increased from 0.22 to 0.59 (58.7% mAP, 80.8% precision, 48.8% recall). I trained with

!python train.py --img 416 --batch 16 --epochs 1200 --data /content/yolov5/w_detection-16/data.yaml --weights yolov5n.pt --cache

but the mAP never increases above 0.58, and I need it to reach 0.97. I labelled all the images, I did everything you mentioned, and Roboflow helped me a lot, but I need to understand how to add background images to the dataset and how they affect training. Also, for weapon detection, which YOLOv5 model is better? I am using YOLOv5s and YOLOv5n.
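
(On the background question, a minimal sketch of the usual convention rather than an official recipe: in the YOLOv5 dataset layout, background images are simply images containing no objects and no annotations, so they can be added by copying them into the training images folder with an empty label file, or none at all. Folder names below are hypothetical.)

import shutil
from pathlib import Path

backgrounds = Path("backgrounds")            # hypothetical folder of images with no weapons in them
train_images = Path("dataset/images/train")
train_labels = Path("dataset/labels/train")
train_images.mkdir(parents=True, exist_ok=True)
train_labels.mkdir(parents=True, exist_ok=True)

for img in backgrounds.glob("*.jpg"):
    shutil.copy(img, train_images / img.name)
    # Nothing to annotate: an empty label file (or no label file) marks a pure negative
    (train_labels / f"{img.stem}.txt").touch()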

thank you

Additional

No response

github-actions[bot] commented 2 years ago

πŸ‘‹ Hello @moahaimen, thank you for your interest in YOLOv5 πŸš€! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a πŸ› Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.

Requirements

Python>=3.7.0 with all requirements.txt dependencies installed, including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install
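
A quick optional check (a sketch, not part of the official steps) that the environment satisfies the requirements above:

import sys
import torch

print(f"Python {sys.version.split()[0]}")               # should be >= 3.7
print(f"PyTorch {torch.__version__}")                    # should be >= 1.7
print(f"CUDA available: {torch.cuda.is_available()}")    # True if a GPU is usable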

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

VYRION-Ai commented 2 years ago

@moahaimen First, the distribution of images between the training and val folders is very important. Second, how many classes? Third, yolov5x is the best but slow. If you need help, just tell me.
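
For anyone looking for a concrete starting point, here is a minimal sketch of such a train/val split (hypothetical folder names, assuming one YOLO-format label .txt per image); an 80/20 split is a common default:

import random
import shutil
from pathlib import Path

random.seed(0)
src_images = Path("all/images")                 # hypothetical: every labelled image in one folder
src_labels = Path("all/labels")                 # matching YOLO .txt label files
images = sorted(src_images.glob("*.jpg"))
random.shuffle(images)

split = int(0.8 * len(images))                  # ~80% train / ~20% val
for subset, subset_images in (("train", images[:split]), ("val", images[split:])):
    img_dir = Path("dataset/images") / subset
    lbl_dir = Path("dataset/labels") / subset
    img_dir.mkdir(parents=True, exist_ok=True)
    lbl_dir.mkdir(parents=True, exist_ok=True)
    for img in subset_images:
        shutil.copy(img, img_dir / img.name)
        label = src_labels / f"{img.stem}.txt"
        if label.exists():                      # background images may have no label file
            shutil.copy(label, lbl_dir / label.name)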

moahaimen commented 2 years ago

Hi. First, I need help, and if you can help I would appreciate it. Second, I am working with Roboflow and it handles the distribution. Third, I made 13 classes at the beginning, but now it's only 6 classes.

moahaimen commented 2 years ago

@moahaimen First, the distribution of images between the training and val folders is very important. Second, how many classes? Third, yolov5x is the best but slow. If you need help, just tell me.

Hi again, I wish to contact you. I really need your help; please respond if you can. Thank you.

github-actions[bot] commented 2 years ago

πŸ‘‹ Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 πŸš€ and Vision AI ⭐!

hoangkhoiLE commented 2 years ago

Can you show me your dataset? Maybe I can help you get a good distribution of your dataset so it trains better, because I have already faced the same problem!

glenn-jocher commented 10 months ago

@hoangkhoiLE It's great to see you helping out! It's essential to have a good distribution of images between the training and validation sets for successful training. Additionally, your dataset's diversity and how well it generalizes to unseen data are significant factors.

For further insights, please refer to the Ultralytics Docs at https://docs.ultralytics.com/yolov5/ and feel free to reach out if you need additional assistance. Good luck with your training!

mochaab commented 9 months ago

Hi, please, I need your help.

I am detecting guidewire tips in fluoroscopic images. When I use detect.py on my test dataset, the result is impressive: it correctly detects most of the tips in all frames. However, when I validate or assess my custom-trained model, it returns around 0.4 for both precision and recall and 0.3 for mAP, which does not match the output of detect.py on the same dataset. Can you please help me figure out why?

I have another dataset taken from the lab. It is not fluoroscopic images, only images taken with a web camera, but I used it in my initial experiments. I did the same process as described above, and for mAP I got around 0.8.

Why is that?

glenn-jocher commented 9 months ago

@mochaab Thanks for reaching out! It's great to hear about the impressive detection results on your test dataset. The discrepancy you're observing during validation could be due to differences in the dataset distribution or the characteristics of the images in your test and validation sets. I recommend carefully evaluating the diversity and representativeness of your validation data.

As for the dataset from the lab, various factors such as lighting conditions, camera perspectives, or object appearances could contribute to the difference in mAP. It's essential to ensure that the validation dataset accurately reflects the test-time conditions for optimal performance evaluation.

Remember to consult the Ultralytics Docs at https://docs.ultralytics.com/yolov5/ for further support. Good luck, and feel free to reach out if you have more questions!

mochaab commented 9 months ago

Thank you for your response. I've been looking at the dataset I used for evaluation. Thanks for the idea.

One more question: I am using YOLOv5s, and in other forums I see that results.png contains an IoU metric of some kind, plus box and objectness values.

Why don't I have those in my results.png? Does it suffice to have only mAP50 as a metric for object detection (particularly for guidewire tips, the only class in my dataset)?

How is IoU computed in YOLOv5, and how is it related to mAP50? I already searched for why IoU does not appear in YOLOv5's results.png, but I could not find answers.

I would appreciate it so much if you could give me some insight about this. Thanks

glenn-jocher commented 9 months ago

@mochaab The results.png file in YOLOv5 doesn't plot IoU, individual bounding boxes, or an objectness score directly. YOLOv5 calculates IoU during validation and uses it to compute the mAP, which provides a comprehensive evaluation of detection performance. The mAP metric considers both precision and recall across varying IoU thresholds, offering a holistic assessment of detection accuracy.
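
For intuition, here is a small sketch (not the actual val.py code) of how the IoU between a predicted box and a ground-truth box decides whether that prediction counts as a true positive at the mAP50 threshold:

def box_iou_xyxy(a, b):
    # IoU of two boxes in (x1, y1, x2, y2) pixel coordinates
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

pred = (48, 52, 100, 110)          # hypothetical predicted box
gt = (50, 50, 100, 100)            # hypothetical ground-truth box
iou = box_iou_xyxy(pred, gt)       # ~0.77 here
print(iou >= 0.5)                  # True: counted as a true positive for mAP50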

To delve deeper into the IoU and mAP computation within YOLOv5, I recommend referring to the Ultralytics Docs at https://docs.ultralytics.com/yolov5/. These resources can provide you with a better understanding of the evaluation process and how to interpret mAP50 results for your specific use case.

Should you have further questions, feel free to ask. Best of luck with your project!

mochaab commented 9 months ago

Hi @glenn-jocher, I appreciate you responding. I did what you suggested and looked closely at my dataset. However, I still have some issues even after testing some combinations. In this comment you can find more details; it would be very helpful if you could give me some more insight and clarification.

My issue: the validation results on my test data are misleading. While val.py returns very low mAP, recall and precision, the labels generated by val.py and the detections from detect.py are almost perfect when checked manually.

β€’ Context: I am working on guidewire/catheter "tip" detection. So far the images, videos and real-time simulations look promising (for dataset 1).
β€’ I have two datasets:
β—¦ Dataset 1: images taken in the lab (web camera, phantom vessel and guidewire), resolution 640x480 or vice versa. Example images: a background sample and an image with a guidewire tip (BG_CTGW_p3_bg_hf_vf_randrot_87, GWS_p4_colch_3). Overall I have 1663 images for validation, 9117 for training and 668 for testing.
β—¦ Dataset 2: images taken in the clinic (x-ray, phantom vessel and guidewire), resolution 480x410 or vice versa. Example images: a background sample and an image with a tip (BG_SR_p1_crop_hf_104, SR_p2_og_colch_40). Overall I have 1023 images for validation, 6991 for training and 1056 for testing (but I reduced the test set to 100 images, 30 background and 70 with tips, to simplify the validation data).
β€’ Validation results and tests:
β—¦ Validation on test dataset 1 shows promising results: mAP50 0.76, recall 0.764 and precision 0.854. The tests I did with detect.py show good results as well, which backs up those numbers (GW_p2_hf_28, GW_p2_hf_colorch_32).
β—¦ Validation on test dataset 2 (clinical x-ray images), after I reduced the amount of data and validated only on images with a similar guidewire model, gives 0.224 precision, 0.225 recall and 0.161 mAP50. However, when I run detect.py on the same data and check it manually (100 images in total), 100% of the background images are correctly classified as background and 100% of the images with tips are correctly detected, with the tip enclosed by a bounding box and good confidence scores (GW_p2_og_8, GW_p2_og_6, GW_p2_og_12).

As with validation, I use conf_thres=0.3 and iou_thres=0.45 for detection.

I also compared the ground-truth bounding boxes to the predictions from detect.py, as well as the ground truth to the validated images, and the results do not look bad enough to explain the scores.

My question is: why are my validation scores for dataset 2 so low while the detections are really good? Does it have something to do with the x-ray characteristics of the images? How can I explain it, or where can I look? Most of the preparation steps for both datasets are similar, so I wonder why the accuracy is so low for the second dataset while the first one has high mAP, recall and precision.

I am hoping that you could help me.

Thanks a lot, @glenn-jocher!

mochaab commented 9 months ago

Just an update: I looked closely at the predicted bounding boxes and the ground truth, and it looks like most of the tips in the predictions sit at a corner of the bounding box, while the tips in the labels are right in the middle. So I decreased iouv in val.py from 0.5 to 0.2 and I got good scores. Now my question is: what is the relationship between --iou-thres and the IoU in mAP50 (IoU=0.5)? Even when I decrease --iou-thres I do not get a high mAP, but if I decrease iouv (hardcoded in val.py) then the mAP increases.

glenn-jocher commented 9 months ago

Hello @mochaab, I'm glad to hear you've made some progress by adjusting the IoU threshold. Note that the --iou-thres parameter in val.py is the IoU threshold used for non-maximum suppression (NMS), i.e. it controls which overlapping predictions are suppressed before evaluation; it is not the threshold used to decide whether a detection counts as a true positive. That matching step uses the iouv thresholds, which is why lowering --iou-thres has little effect on your mAP while lowering iouv changes it directly.

On the other hand, mAP50 refers to the mean Average Precision at an IoU threshold of 0.5. Changing the iouv parameter directly affects the computation of mAP at different IoU thresholds. If you lower the iouv value, you're effectively evaluating the model at a less strict IoU threshold, which can artificially inflate the mAP score.
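
As a rough sketch of that relationship (the iouv below mirrors the torch.linspace(0.5, 0.95, 10) vector hard-coded in val.py, which the matching in its process_batch function uses; exact details may differ between versions):

import torch

iouv = torch.linspace(0.5, 0.95, 10)   # 0.50, 0.55, ..., 0.95
iou_with_gt = 0.45                     # hypothetical IoU of one matched detection

correct = iou_with_gt >= iouv          # one boolean per threshold
print(correct)                         # all False: misses even the 0.5 (mAP50) threshold
# mAP50 is the AP built from the first threshold (0.5); mAP50-95 averages AP over
# all ten. --iou-thres, by contrast, is only the NMS threshold applied to the
# predictions before this matching step, so lowering it barely moves mAP50.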

It's important to maintain a balance between achieving high mAP scores and ensuring that the model performs well at standard IoU thresholds, as this is a more realistic measure of performance. If your model requires a lower IoU threshold to achieve good performance, it might indicate that the model is not predicting the bounding boxes as accurately as it should, and further tuning or dataset adjustments may be necessary.

For a more detailed explanation of these concepts, please refer to the Ultralytics Docs at https://docs.ultralytics.com/yolov5/. Keep experimenting with the thresholds and the dataset, and best of luck with your project!