Closed PeterKBailey closed 1 month ago
👋 Hello @PeterKBailey, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.
If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.
If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.
Python>=3.8.0 with all requirements.txt installed including PyTorch>=1.8. To get started:
git clone https://github.com/ultralytics/yolov5 # clone
cd yolov5
pip install -r requirements.txt # install
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.
We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀!
Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.
Check out our YOLOv8 Docs for details and get started with:
pip install ultralytics
Hello @PeterKBailey,
Thank you for reaching out and providing a detailed account of the issue you're encountering. It's great that you're taking the time to experiment with the YOLOv5 models and datasets.
From your description, it seems like the model is training well but you're experiencing a discrepancy between validation and detection results. This issue is typically related to how the model generalizes to new data or how detection settings are applied.
A few things to consider:
- Ensure your `--source` in `detect.py` is representative of the data the model was trained on. A dramatic shift in data distribution between training and detection can lead to poor performance.
- Given that you've adjusted `conf-thres` and `iou-thres` without success, the issue might not lie with the confidence or IOU thresholds. However, it's good to note that detecting with a very low `conf-thres` can indeed produce a lot of spurious predictions.
- Considering you've already tried training with more images and different model sizes, the issue seems to lie more with the detection stage than with training. I would recommend revisiting the preprocessing steps in `detect.py` to ensure they align with how your validation set images are processed.
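On the thresholds point: `--conf-thres` filters boxes by confidence, while `--iou-thres` is the overlap cutoff used by non-maximum suppression. A minimal IoU sketch for intuition (illustrative only, not YOLOv5's vectorized implementation):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes.
    During NMS, a lower-scoring box is suppressed when its IoU
    with an already-kept box exceeds --iou-thres."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# two 10x10 boxes with 50% horizontal overlap
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.3333...
```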
Unfortunately, without seeing the exact images or data you're working with in `detect.py`, it's challenging to provide a more precise diagnosis. I encourage you to review the preprocessing part of the detection pipeline and ensure it closely matches that of your training and validation pipeline.
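For context on what that preprocessing does: both `val.py` and `detect.py` letterbox each image, scaling the long side to `--img` and padding the short side. A rough sketch of the shape computation (a simplification, assuming the default stride of 32 and `detect.py`'s rectangular "auto" padding):

```python
def letterbox_shape(h, w, img_size=640, stride=32):
    """Approximate the inference shape: scale so the longest side
    equals img_size, then pad the short side up to a stride multiple."""
    r = img_size / max(h, w)                 # scale ratio
    h2, w2 = round(h * r), round(w * r)      # resized, unpadded shape
    pad_h = (stride - h2 % stride) % stride  # minimal padding to reach
    pad_w = (stride - w2 % stride) % stride  # a multiple of the stride
    return h2 + pad_h, w2 + pad_w

# e.g. a 1920x1080 source runs at 384x640, the shape detect.py logs
print(letterbox_shape(1080, 1920))  # (384, 640)
```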
If you continue to face difficulties, please provide more details about the step-by-step preprocessing and detection commands used, along with any specific error messages or unexpected output details. This additional information would help in pinpointing the exact cause of the discrepancy.
Keep experimenting and asking questions; that's how we all learn and improve. Good luck! 🚀
Hi @glenn-jocher,
Thanks for your reply! I have a few follow-up questions, and I'll also redo the training and document what I'm doing as I go:
So first to try and respond to your suggestions:
> Ensure your `--source` in `detect.py` is representative of the data

I am using one of my training images as the image being detected on; I wanted to see that the model could overfit my dataset.

> Given validation works but detection does not, double-check the preprocessing steps in detection.

Sorry, I'm not sure if you mean I should look into how the source code operates, or whether I need to do my own preprocessing when using detect that I don't need to do with val?
So when it comes to preprocessing, I'm not doing anything myself, nor am I using Roboflow or any other pipeline. My dataset is a set of jpg images with varying resolutions (ex: 5312x2988, 4032x3024, 4160x3120, 1920x1080). I have 114 such images in my images/training directory. I have 74 different images in my images/validation directory.
Two example lines from one of my label .txt files:
0 0.6611328125 0.5291748046875 0.01123046875 0.019775390625
89 0.6590576171875 0.486328125 0.030029296875 0.0517578125
To verify that my .txt files are correct I visualize on an example:
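(Those label lines are in YOLO's normalized format: class, x_center, y_center, width, height, all in the 0-1 range. A small sketch of the conversion such a visualization needs; the helper name is made up and not part of YOLOv5:)

```python
def yolo_to_pixels(line, img_w, img_h):
    """Map one YOLO label line 'class cx cy w h' (normalized 0-1)
    to a pixel-space (class, x1, y1, x2, y2) corner box."""
    cls, cx, cy, w, h = line.split()
    cx, cy, w, h = (float(v) for v in (cx, cy, w, h))
    x1, y1 = (cx - w / 2) * img_w, (cy - h / 2) * img_h
    x2, y2 = (cx + w / 2) * img_w, (cy + h / 2) * img_h
    return int(cls), x1, y1, x2, y2

print(yolo_to_pixels("0 0.5 0.5 0.25 0.25", 1920, 1080))
# (0, 720.0, 405.0, 1200.0, 675.0)
```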
python yolov5/train.py --img 640 --batch 16 --epochs 3 --data dataset.yaml --weights '' --cfg yolov5n.yaml
train: weights=, cfg=yolov5n.yaml, data=dataset.yaml, hyp=yolov5\data\hyps\hyp.scratch-low.yaml, epochs=3, batch_size=16, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, evolve_population=yolov5\data\hyps, resume_evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=yolov5\runs\train, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest, ndjson_console=False, ndjson_file=False
github: up to date with https://github.com/ultralytics/yolov5
YOLOv5 v7.0-295-gac6c4383 Python-3.11.6 torch-2.2.1+cpu CPU
hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
With these results:
What I want to accomplish here is to see that, at the very least, the model works on the images it was trained on.
I don't have a command to share, I did this in my file explorer :)
python ./yolov5/val.py --weights ./yolov5/runs/train/exp/weights/best.pt --img 640 --data ./dataset.yaml
So everything is looking good up until now!! 😄
Finally, I feed that same image to detection (keeping in mind that this image was used in training, so I expect the same or similar results as val):
python ./yolov5/detect.py --weights ./yolov5/runs/train/exp/weights/best.pt --img 640 --data dataset.yaml --source ./images/training/0VE6VzyUjItYMGIsRKwJBg.jpg
detect: weights=['./yolov5/runs/train/exp/weights/best.pt'], source=./images/training/0VE6VzyUjItYMGIsRKwJBg.jpg, data=dataset.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_csv=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=yolov5\runs\detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
YOLOv5 v7.0-295-gac6c4383 Python-3.11.6 torch-2.2.1+cpu CPU
Fusing layers...
YOLOv5n summary: 157 layers, 2301718 parameters, 0 gradients, 5.8 GFLOPs
image 1/1 C:\Users\Peter\Downloads\out\data\images\training\0VE6VzyUjItYMGIsRKwJBg.jpg: 384x640 (no detections), 149.4ms
Speed: 1.0ms pre-process, 149.4ms inference, 1.0ms NMS per image at shape (1, 3, 640, 640)
Results saved to yolov5\runs\detect\exp
So we can see no detections as stated in the output.
python ./yolov5/detect.py --weights ./yolov5/runs/train/exp/weights/best.pt --img 640 --data dataset.yaml --source ./images/training/0VE6VzyUjItYMGIsRKwJBg.jpg --conf-thres 0.001 --iou-thres 0.6 --max-det 300
no detections
(no point in reattaching the box-less image)
detect: weights=['./yolov5/runs/train/exp/weights/best.pt'], source=./images/training/0VE6VzyUjItYMGIsRKwJBg.jpg, data=dataset.yaml, imgsz=[640, 640], conf_thres=1e-05, iou_thres=0.6, max_det=300, device=, view_img=False, save_txt=False, save_csv=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=yolov5\runs\detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
YOLOv5 v7.0-295-gac6c4383 Python-3.11.6 torch-2.2.1+cpu CPU
Fusing layers...
YOLOv5n summary: 157 layers, 2301718 parameters, 0 gradients, 5.8 GFLOPs
image 1/1 C:\Users\Peter\Downloads\out\data\images\training\0VE6VzyUjItYMGIsRKwJBg.jpg: 384x640 60 regulatory--no-stopping--g15s, 240 regulatory--bicycles-only--g3s, 156.5ms
Speed: 1.0ms pre-process, 156.5ms inference, 18.2ms NMS per image at shape (1, 3, 640, 640)
Results saved to yolov5\runs\detect\exp3
This is a very long post; I tried to be precise and give the relevant information. If there is anything more I can provide, please let me know; I would really like to get this solved.
Thank you for your help, and thank you in advance if you can help any further!!
Hello @PeterKBailey,
Thank you for the detailed follow-up and the effort you've made to troubleshoot this issue. It appears you've done a thorough job testing various configurations, which is very helpful. Your descriptions and steps are clear and offer good insight into the problem. Let's address your concerns:
Regarding preprocessing in `detect.py` vs. `val.py`: the preprocessing should be handled automatically by both scripts. You shouldn't need to manually adjust preprocessing steps unless you're troubleshooting or experimenting with different preprocessing techniques. The fact that validation works but detection does not, with the settings you've described, suggests the issue is not related to the preprocessing.
Using Training Images for Detection: Using training images for detection to test overfitting is a valid approach. However, seeing no detections despite this can sometimes be tied to how images are processed or the confidence thresholds (which you've already adjusted).
Next Steps: Given that you've tried various `conf-thres` values and only get predictions at extremely low confidence, the model appears to be very uncertain about its predictions. This can happen if the model hasn't learned the features well enough, despite seeing similar images during training. Since you used a very low number of epochs (3 in your initial command), this might not be sufficient for the model to learn effectively, even on a small dataset.
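One detail worth knowing here: `val.py` computes mAP at a very low default `conf-thres` of 0.001, while `detect.py` defaults to 0.25 (as your log shows), so an under-confident model can look fine in validation metrics yet show nothing in detection. A toy illustration with hypothetical scores:

```python
# Hypothetical confidences from an under-trained model: nothing
# clears detect.py's default 0.25, but most clear val.py's 0.001.
scores = [0.004, 0.012, 0.0009, 0.19, 0.031]

def kept(scores, conf_thres):
    """Boxes surviving the confidence filter, before NMS."""
    return [s for s in scores if s > conf_thres]

print(len(kept(scores, 0.25)))   # 0 -> "(no detections)"
print(len(kept(scores, 0.001)))  # 4 -> many low-quality boxes
```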
Considerations:
I noticed you've done great work ensuring the dataset integrity and experimenting with various configurations. Continuing to adjust the training length and possibly experimenting with the learning rate may provide further insights. These steps are iterative and experimental in nature. Your dedication to resolving this is commendable!
Keep up the great work, and don't hesitate to reach out if you have more questions or updates based on these suggestions. 🚀
Hello @glenn-jocher,
Thank you for your continuing advice and support! So following what you said I tried two new things:
It was on the same small training dataset I mentioned before.
python yolov5/train.py --img 640 --batch 16 --epochs 64 --data dataset.yaml --weights '' --cfg yolov5n.yaml
I got the following results image:
python ./yolov5/val.py --weights ./yolov5/runs/train/exp2/weights/best.pt --img 640 --data ./dataset.yaml
I get the following image:
Which looks good like the other models I've shown before.
python ./yolov5/detect.py --weights ./yolov5/runs/train/exp2/weights/best.pt --img 640 --source ./images/training/0VE6VzyUjItYMGIsRKwJBg.jpg --data ./dataset.yaml --conf-thres 0.1 --iou-thres 0.6 --max-det 300
Got (no detections).
python ./yolov5/detect.py --weights ./yolov5/runs/train/exp2/weights/best.pt --img 640 --source ./images/training/0VE6VzyUjItYMGIsRKwJBg.jpg --data ./dataset.yaml --conf-thres 0.001 --iou-thres 0.6 --max-det 300
Still using that same small training set.
python yolov5/train.py --img 640 --batch 16 --epochs 3 --data dataset.yaml --weights '' --cfg yolov5l.yaml
Which gave me the following results image:
python ./yolov5/val.py --weights ./yolov5/runs/train/exp3/weights/best.pt --img 640 --data ./dataset.yaml
Which produced:
(which looks good too).
python ./yolov5/detect.py --weights ./yolov5/runs/train/exp3/weights/best.pt --img 640 --source ./images/training/0VE6VzyUjItYMGIsRKwJBg.jpg --data ./dataset.yaml --conf-thres 0.1 --iou-thres 0.6 --max-det 300
(no detections)
python ./yolov5/detect.py --weights ./yolov5/runs/train/exp3/weights/best.pt --img 640 --source ./images/training/0VE6VzyUjItYMGIsRKwJBg.jpg --data ./dataset.yaml --conf-thres 0.00001 --iou-thres 0.6 --max-det 300
Which produced more spurious boxes unfortunately.
I did my best to follow your suggestions but so far I am still not getting a result. I am very confused as to why this is the case, what is validation doing so differently from detect? Are my images too big / do I need to use a different --img parameter?
I can see that these two trained models are clearly different, based only on detect and on how/what boxes they place at low enough confidence. But they both perform correctly in validation, and I'm left as puzzled as ever, haha! I suppose I could try training the yolov5l model for 64 or more epochs, but given the cost I am hesitant to do that before consulting you for other options.
Do you have any more suggestions I can try? Thank you again for your time and efforts!!
Hello again @PeterKBailey,
It's great to see your persistence and thorough experimentation with both the small and large models over different epoch lengths. Your efforts are truly commendable. 🌟 It's indeed puzzling that despite the models producing good validation results, the detection phase still struggles to produce meaningful detections without resorting to extremely low confidence thresholds.
Given the scenario you've described, a few thoughts come to mind:
Model's Generalization Ability: Even though validation results look good, it's possible that the model's ability to generalize to unseen data or slightly different conditions (like lighting, angle, etc., even in training images) isn't robust enough. This could be a factor if the dataset is small or not diverse enough.
Dataset Diversity and Size: If you haven't already, consider expanding your dataset with more varied examples. Sometimes, more data or artificially augmenting existing data (e.g., using different lighting conditions, rotations) can help the model generalize better.
Post-Training Analysis: It can be beneficial to analyze which images or types of images your model struggles with. This can provide insights into potential biases or weaknesses in the dataset. For instance, a confusion matrix can help understand if some classes are being consistently misclassified.
Hyperparameters Tuning: Besides the number of epochs and the model size, other hyperparameters might influence the detection phase. You could experiment with different learning rates or even the augmentation strategies during training.
Detection Script Check: Ensure the `detect.py` script processes images in the same manner as the validation phase. Although it should be consistent, it's worth verifying that the same transformations and preprocessing steps are applied.
Image Size for Detection: You mentioned the size of your images and questioned the use of the `--img` parameter. It's worth experimenting with this parameter further. Sometimes, using a larger `--img` size for detection (consistent with the original image resolution) might help, especially if your dataset consists of high-resolution images.
It's not uncommon to face challenges like these, especially when fine-tuning models for specific datasets. Before trying a long and potentially costly training run with YOLOv5l for 64 epochs, my advice would be to explore the dataset's diversity and size, along with the model's generalization capabilities and hyperparameter tweaks. Also, consider checking `detect.py` for any discrepancies in image processing compared to the training/validation phases.
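As a concrete version of the post-training analysis idea: tallying (ground-truth, prediction) pairs per object gives a crude confusion-matrix view of which classes are missed or swapped. The class names below are made up for illustration:

```python
from collections import Counter

# Hypothetical matches between ground-truth and predicted classes;
# None marks a ground-truth object with no matching prediction.
pairs = [("stop", "stop"), ("stop", None),
         ("yield", "stop"), ("yield", "yield")]
confusion = Counter(pairs)

print(confusion[("stop", None)])     # 1 -> one missed "stop" sign
print(confusion[("yield", "stop")])  # 1 -> one "yield" read as "stop"
```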
Your dedication to solving this is impressive, and each step brings more valuable insights. Keep exploring, and feel free to share any further observations or results. 🚀
👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLO 🚀 and Vision AI ⭐
Discussed in https://github.com/ultralytics/yolov5/discussions/12861
I'm sorry to raise this as an issue, but I'm not sure what else to try; please let me know if this belongs somewhere else!! I just clicked on the option given by GitHub.
[I am using this dataset: https://www.mapillary.com/dataset/trafficsign]