mabergerx closed this issue 3 years ago.
👋 Hello @mabergerx, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.
If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.
If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.
For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.
Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install, run:
$ pip install -r requirements.txt
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.
Forgot to add: when using CLI inference through detect.py, I see the following:
Fusing layers...
Model Summary: 392 layers, 46611336 parameters, 0 gradients, 114.1 GFLOPS
While loading the model through PyTorch Hub (with autoShape), I get the following:
Model Summary: 499 layers, 46642120 parameters, 46642120 gradients, 114.3 GFLOPS
What exactly is happening with the layer fusion? Might that be the explanation?
@mabergerx detect.py and PyTorch Hub are different inference pathways. Inference uses the same model (and the same forward method), though the pre- and post-processing differ. This should give near-identical (but not mathematically equal) results between the two, provided you use the same settings (confidence threshold, IoU threshold, img-size, etc.).
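To make the point about matching settings concrete, here is a simplified, class-agnostic sketch of confidence filtering followed by greedy non-maximum suppression (NMS) in plain Python. It is not YOLOv5's post-processing code (which works on tensors and separates classes); the helpers `box_iou` and `filter_and_nms` and the toy boxes are hypothetical. It only shows that changing `conf_thres` or `iou_thres` changes which boxes survive.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(detections, conf_thres=0.25, iou_thres=0.45):
    """Keep detections scoring above conf_thres, then greedily drop any box that
    overlaps an already-kept, higher-scoring box by more than iou_thres."""
    candidates = sorted(
        (d for d in detections if d[1] > conf_thres),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        if all(box_iou(box, kept_box) <= iou_thres for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Toy detections: (box, confidence)
dets = [
    ((0, 0, 10, 10), 0.9),
    ((1, 1, 11, 11), 0.6),   # overlaps the first box with an IoU of about 0.68
    ((20, 20, 30, 30), 0.3),
    ((40, 40, 50, 50), 0.2),
]
print(len(filter_and_nms(dets)))                                 # 2
print(len(filter_and_nms(dets, conf_thres=0.1, iou_thres=0.7)))  # 4
```

With the thresholds from the Namespace printout below (conf_thres=0.25, iou_thres=0.45), the second box is suppressed as a near-duplicate and the 0.2 box is filtered out; looser thresholds keep all four.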
I don't know of any reason or precedent for different layer counts and FLOPS; this would normally indicate two different models. I will do a full comparison here to verify.
python detect.py --source data/images --weights yolov5s.pt --conf 0.25
Namespace(agnostic_nms=False, augment=False, classes=None, conf_thres=0.25, device='', exist_ok=False, img_size=640, iou_thres=0.45, name='exp', project='runs/detect', save_conf=False, save_txt=False, source='data/images/', update=False, view_img=False, weights=['yolov5s.pt'])
YOLOv5 v4.0-96-g83dc1b4 torch 1.7.0+cu101 CUDA:0 (Tesla V100-SXM2-16GB, 16160.5MB)
Fusing layers...
Model Summary: 224 layers, 7266973 parameters, 0 gradients, 17.0 GFLOPS
image 1/2 /content/yolov5/data/images/bus.jpg: 640x480 4 persons, 1 bus, Done. (0.010s)
image 2/2 /content/yolov5/data/images/zidane.jpg: 384x640 2 persons, 1 tie, Done. (0.011s)
Results saved to runs/detect/exp2
Done. (0.103s)
Input:
import torch
# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
# Images
dir = 'https://github.com/ultralytics/yolov5/raw/master/data/images/'
imgs = [dir + f for f in ('zidane.jpg', 'bus.jpg')] # batched list of images
# Inference
results = model(imgs)
results.print()
results.save()
Output:
Downloading: "https://github.com/ultralytics/yolov5/archive/master.zip" to /root/.cache/torch/hub/master.zip
from n params module arguments
0 -1 1 3520 models.common.Focus [3, 32, 3]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 18816 models.common.C3 [64, 64, 1]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 1 156928 models.common.C3 [128, 128, 3]
5 -1 1 295424 models.common.Conv [128, 256, 3, 2]
6 -1 1 625152 models.common.C3 [256, 256, 3]
7 -1 1 1180672 models.common.Conv [256, 512, 3, 2]
8 -1 1 656896 models.common.SPP [512, 512, [5, 9, 13]]
9 -1 1 1182720 models.common.C3 [512, 512, 1, False]
10 -1 1 131584 models.common.Conv [512, 256, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 1 361984 models.common.C3 [512, 256, 1, False]
14 -1 1 33024 models.common.Conv [256, 128, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 1 90880 models.common.C3 [256, 128, 1, False]
18 -1 1 147712 models.common.Conv [128, 128, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 1 296448 models.common.C3 [256, 256, 1, False]
21 -1 1 590336 models.common.Conv [256, 256, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 1 1182720 models.common.C3 [512, 512, 1, False]
24 [17, 20, 23] 1 229245 models.yolo.Detect [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 283 layers, 7276605 parameters, 7276605 gradients
Downloading https://github.com/ultralytics/yolov5/releases/download/v4.0/yolov5s.pt to yolov5s.pt...
100%
14.1M/14.1M [00:01<00:00, 9.47MB/s]
Adding autoShape...
image 1/2: 720x1280 2 persons, 1 tie
image 2/2: 1080x810 4 persons, 1 bus
Saving results/zidane.jpg, results/bus.jpg, done.
@mabergerx everything looks good!
The PyTorch Hub model summary is displayed before layer fusion. Fusing reduces the layer and parameter count.
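Layer fusion in this sense means folding each BatchNorm layer into the convolution in front of it, which works at inference time because eval-mode BatchNorm is a fixed per-channel affine transform. Below is a plain-Python sketch of that standard folding for one pixel through a 1x1 convolution; it is not YOLOv5's fusing code, and the helper names are hypothetical. It assumes the convolution has no bias of its own, which is consistent with row 1 of the Hub summary above: 64*32*3*3 conv weights plus 2*64 BatchNorm weights and biases = 18560. Folding removes the BatchNorm module (2 learnable parameters per output channel) and adds a conv bias (1 per output channel), so, if the summary counts only learnable parameters, each fused layer loses one parameter per output channel and one module from the layer count.

```python
import math


def conv1x1(x, weight, bias):
    """1x1 convolution at one pixel: weight[out][in], bias[out]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def batchnorm_eval(y, gamma, beta, mean, var, eps=1e-5):
    """Eval-mode BatchNorm: a fixed per-channel affine transform."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]


def fuse_conv_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BN into the conv: W' = s * W, b' = s * (b - mean) + beta, s = gamma / sqrt(var + eps)."""
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    fused_w = [[w * s for w in row] for row, s in zip(weight, scale)]
    fused_b = [s * (b - m) + bt for b, m, s, bt in zip(bias, mean, scale, beta)]
    return fused_w, fused_b


def param_counts(c_in, c_out, k):
    """Learnable parameters of a bias-free k x k conv + BN, as (unfused, fused)."""
    conv = c_out * c_in * k * k
    return conv + 2 * c_out, conv + c_out


# Toy layer: 2 input channels, 2 output channels, conv without its own bias
weight = [[0.5, -1.0], [2.0, 0.25]]
bias = [0.0, 0.0]
gamma, beta = [1.5, 0.8], [0.1, -0.2]
mean, var = [0.3, -0.4], [2.0, 0.5]

x = [1.0, 2.0]
reference = batchnorm_eval(conv1x1(x, weight, bias), gamma, beta, mean, var)
fw, fb = fuse_conv_bn(weight, bias, gamma, beta, mean, var)
fused = conv1x1(x, fw, fb)
print(all(abs(a - b) < 1e-9 for a, b in zip(reference, fused)))  # True
print(param_counts(32, 64, 3))  # (18560, 18496)
```

The fused conv computes the same function (up to floating-point rounding) with one module instead of two, which is why the fused summary shows fewer layers and parameters without a change in predictions.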
Hmm, interesting. I will also test this with yolov5s.pt on my machine as a sanity check, but do you have any intuition about why inference results differ between predicting on multiple images versus just one using PyTorch Hub?
@mabergerx the Hub model has a smart batch constructor that merges differently sized images into a single-sized batch, adding the minimum padding needed so that all images in the batch share the same height and width, and both height and width are multiples of the model's maximum stride (typically 32).
This means that the padding on individual images may vary as a function of the other images in the batch.
In short, single- and multi-image batches may differ.
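A simplified sketch of the batch-shape logic described above, assuming each image is first scaled so its long side equals `size`, and the batch then takes the largest scaled height and width, rounded up to a multiple of `stride`. This is a reading of the description, not the Hub code itself, and `batch_shape` is a hypothetical helper. The shapes are the ones printed earlier in the thread: zidane.jpg is 720x1280 and bus.jpg is 1080x810 (height x width).

```python
def batch_shape(shapes, size=640, stride=32):
    """Common (height, width) for a batch of (height, width) image shapes.

    Each image is scaled so its long side equals `size`; the batch uses the
    largest scaled height and width, rounded up to a multiple of `stride`.
    Integer ceiling division avoids floating-point rounding surprises.
    """
    heights, widths = [], []
    for h, w in shapes:
        long_side = max(h, w)
        # ceil(dim * size / long_side / stride) * stride, in exact integer arithmetic
        heights.append(-(-h * size // (long_side * stride)) * stride)
        widths.append(-(-w * size // (long_side * stride)) * stride)
    return max(heights), max(widths)


zidane, bus = (720, 1280), (1080, 810)
print(batch_shape([bus]))          # (640, 480): bus alone needs no extra padding
print(batch_shape([zidane, bus]))  # (640, 640): bus content is 480 wide, padded to 640
```

Under these assumptions, bus.jpg is seen at 640x480 on its own but inside a 640x640 canvas when batched with zidane.jpg, so the network receives different inputs for the same image.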
@glenn-jocher I see, thanks for the explanation. It seems like the size parameter in the Hub call doesn't have any effect, then?
Input:
results = model(imgs, size=640)
results.print()
Output (note the resolution):
image 1/1: 1088x832 4 persons, 1 bus
For another test, I first resized the picture and saved it:
from PIL import Image
# PIL's resize takes (width, height)
image_ = Image.open("data/images/bus.jpg").resize((832, 1088))
image_.save("data/images/resized_bus.jpg")
The inference results are then the following:
detect.py
Input:
! python detect.py --weights yolov5l.pt --conf 0.25 --img 1088 --source "data/images/"
Output (note the same resolution for bus and resized_bus, but a different prediction):
Namespace(agnostic_nms=False, augment=False, classes=None, conf_thres=0.25, device='', exist_ok=False, img_size=1088, iou_thres=0.45, name='exp', project='runs/detect', save_conf=False, save_txt=False, source='data/images/', update=False, view_img=False, weights=['yolov5l.pt'])
YOLOv5 v4.0-93-g95aefea torch 1.7.1 CUDA:0 (Tesla P100-PCIE-16GB, 16280.875MB)
Fusing layers...
Model Summary: 392 layers, 47025981 parameters, 0 gradients, 115.4 GFLOPS
image 1/3 /home/jupyter/yolov5/data/images/bus.jpg: 1088x832 4 persons, 1 bicycle, 1 bus, 1 tie, Done. (0.048s)
image 2/3 /home/jupyter/yolov5/data/images/resized_bus.jpg: 1088x832 4 persons, 1 bicycle, 1 bus, Done. (0.047s)
image 3/3 /home/jupyter/yolov5/data/images/zidane.jpg: 640x1088 2 persons, 2 ties, Done. (0.042s)
Results saved to runs/detect/exp39
Done. (0.248s)
Input:
import torch
model = torch.hub.load('ultralytics/yolov5', 'yolov5l', pretrained=True)
imgs = [
'data/images/resized_bus.jpg', # filename
]
results = model(imgs)
results.print()
Output (note the different prediction despite the same image size and a single-image batch):
image 1/1: 1088x832 4 persons, 1 bus
Any intuition on why this would be happening beyond the post- and pre-processing differences?
Thank you very much!
EDIT: We found that the image resolution printed by PyTorch Hub was confusing. After testing with an explicit size setting, we found that the inference results are consistent.
@mabergerx the size argument defines the inference size for the long side of the batch.
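On the detect.py side, the log above shows bus.jpg and resized_bus.jpg both running at 1088x832. A rough sketch of a letterbox shape computation reproduces all three shapes in that log, assuming the image is scaled so it fits inside `new_size` x `new_size` (upscaling allowed) and each side is then padded only up to the next multiple of `stride`. The `letterbox_shape` helper is hypothetical and models only shapes, not the actual resampling or padding pixels.

```python
def letterbox_shape(h, w, new_size=640, stride=32):
    """Return ((height, width), scale, (pad_h, pad_w)) for a minimal-padding letterbox."""
    r = min(new_size / h, new_size / w)  # fit the long side to new_size
    unpad_h, unpad_w = int(round(h * r)), int(round(w * r))
    pad_h = (new_size - unpad_h) % stride  # pad only to the next stride multiple
    pad_w = (new_size - unpad_w) % stride
    return (unpad_h + pad_h, unpad_w + pad_w), r, (pad_h, pad_w)


for name, (h, w) in [("bus.jpg", (1080, 810)),
                     ("resized_bus.jpg", (1088, 832)),
                     ("zidane.jpg", (720, 1280))]:
    shape, scale, pad = letterbox_shape(h, w, new_size=1088)
    print(name, shape, round(scale, 4), pad)
# bus.jpg (1088, 832) 1.0074 (0, 16)
# resized_bus.jpg (1088, 832) 1.0 (0, 0)
# zidane.jpg (640, 1088) 0.85 (28, 0)
```

Under these assumptions, bus.jpg is upscaled by about 0.7% and padded by 16 px in width, while resized_bus.jpg enters with no scaling or padding. The PIL resize to 832x1088 (width x height) also changed the aspect ratio from 0.75 to about 0.765, so the two inputs share a shape but not their pixels, and slightly different predictions are plausible.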
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
❔Question
Hi, first thank you for this YOLO implementation, looks great!
For an experiment, I retrained the model from the yolov5l.pt checkpoint by following the Train Custom Data tutorial, on a relatively small amount of data with just a few classes. When I use the CLI for inference on the example images:
! python detect.py --weights ./runs/train/exp6/weights/best.pt --img 640 --conf 0.25 --source "data/images"
a class is (wrongly) predicted on the bus with a low probability, which is most likely due to the small amount of training data.
Now, when predicting on the same two images by loading the custom model through PyTorch Hub:
or through the yolov5 package on PyPI:
In both cases, no weapon class is predicted on the bus image, which contradicts the inference results from the detect.py CLI. Even more interesting, when I run inference on just the single bus image using the above methods, I do get the weapon class predicted! So the results differ depending on the number of images given to the model. Why is that happening?
And in general, what is the difference between running inference with detect.py on the command line and loading the model with PyTorch? We suspect it may have something to do with BatchNormalization.
Thank you in advance for looking into it!
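On the BatchNormalization suspicion, a minimal plain-Python sketch of why batch size alone should not change BatchNorm's output, assuming both pathways run the model in eval mode (the usual setup for inference). In training mode BatchNorm normalizes with the current batch's mean and variance, so one image's output depends on its batchmates; in eval mode it uses fixed running statistics, so it does not. The helpers `bn_train` and `bn_eval` are hypothetical and handle one scalar activation per image for a single channel.

```python
import math


def bn_train(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode BatchNorm for one channel: uses the current batch's statistics."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


def bn_eval(batch, running_mean, running_var, gamma=1.0, beta=0.0, eps=1e-5):
    """Eval-mode BatchNorm for one channel: uses fixed running statistics."""
    return [gamma * (x - running_mean) / math.sqrt(running_var + eps) + beta for x in batch]


image_a, image_b = 2.0, 7.0  # one activation value per image, same channel

# Eval mode: image_a's output is identical alone or batched with image_b
print(bn_eval([image_a], 1.0, 4.0)[0] == bn_eval([image_a, image_b], 1.0, 4.0)[0])  # True
# Training mode: image_a's output changes when image_b joins the batch
print(bn_train([image_a])[0] == bn_train([image_a, image_b])[0])  # False
```

In detect.py the BatchNorm layers are folded into the convolutions anyway (the "Fusing layers..." step in the logs), so, under the eval-mode assumption, differences between single- and multi-image Hub results are more plausibly explained by batch padding than by BatchNorm.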