Closed: Kieran31 closed this issue 2 years ago
Hello @Kieran31, thank you for your interest in YOLOv5! Please visit our Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.
If this is a Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.
If this is a custom training Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.
For business inquiries or professional support requests please visit https://ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.
Python>=3.6.0 with all requirements.txt installed including PyTorch>=1.7. To get started:
$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip install -r requirements.txt
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.
@Kieran31 see PyTorch Hub tutorial for full inference examples on trained custom models.
This example loads a pretrained YOLOv5s model from PyTorch Hub as model and passes an image for inference. 'yolov5s' is the lightest and fastest YOLOv5 model. For details on all available models please see the README.
import torch
# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
# Image
img = 'https://ultralytics.com/images/zidane.jpg'
# Inference
results = model(img)
results.pandas().xyxy[0]
# xmin ymin xmax ymax confidence class name
# 0 749.50 43.50 1148.0 704.5 0.874023 0 person
# 1 433.50 433.50 517.5 714.5 0.687988 27 tie
# 2 114.75 195.75 1095.0 708.0 0.624512 0 person
# 3 986.00 304.00 1028.0 420.0 0.286865 27 tie
@glenn-jocher thanks.
What does size do here? Does it chop a height=640, width=1280, RGB image into two 640x640 images? https://github.com/ultralytics/yolov5/blob/30e4c4f09297b67afedf8b2bcd851833ddc9dead/models/common.py#L243-L252
I have a question about the same concept.
I'm trying to convert to TorchScript, using:
python export.py --weights yolov5x.pt --img 640 --batch 1 --include torchscript
and I printed the output at line 298 of export.py:
for _ in range(2):
    y = model(im)
print(y[0].shape)
I'm getting [1, 25556, 85]. Now why is it 85?! It's supposed to be 6, isn't it?
When using:
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
I'm getting 6 in dim=-1
Thanks!!
@Kieran31 size defines inference size (long side). Resizing and padding are handled by the letterbox() function.
@jbattab COCO models have 80 classes + 4 box + 1 objectness outputs at each anchor, and there are 25k anchors per image in your example.
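For concreteness, a quick back-of-the-envelope check of those numbers (a sketch assuming the default 640x640 input and strides 8/16/32; the exact anchor count depends on the input resolution, which is presumably why you see 25556 rather than 25200):

nc = 80                                                 # COCO classes
print(4 + 1 + nc)                                       # 85 channels: box (xywh) + objectness + class scores

anchors_per_cell = 3
grid_cells = sum((640 // s) ** 2 for s in (8, 16, 32))  # 6400 + 1600 + 400 cells across the three scales
print(anchors_per_cell * grid_cells)                    # 25200 anchors for a 640x640 input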
@jbattab you might want to start at the beginning and read the YOLO papers, which explain everything well: https://pjreddie.com/publications/
@glenn-jocher thank you. I figured out that the letterbox() function scales down the image by a ratio so that it can be fed into the model. But will this cause the original objects to become too small to be detected?
Also, line 250 indicates the input image can be a tensor.
https://github.com/ultralytics/yolov5/blob/30e4c4f09297b67afedf8b2bcd851833ddc9dead/models/common.py#L243-L252
However, the output of output = model(torch.zeros(8,3,512,512)) is not a models.common.Detections but a tuple. This is what I raised in my first question. Could you please explain it?
- output is a tuple of length 2.
- output[0] is a tensor of size [8, 16128, 6].
- output[1] is a list of length 3.
- output[1][0] is a tensor of size [8, 3, 64, 64, 6]
- output[1][1] is a tensor of size [8, 3, 32, 32, 6]
- output[1][2] is a tensor of size [8, 3, 16, 16, 6]
@Kieran31 all pytorch models input and output torch tensors. YOLOv5 PyTorch Hub models are AutoShape() classes that wrap a pytorch model and handle inputs and outputs.
It's up to you to determine an appropriate --img-size suitable for your deployment requirements.
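A minimal sketch of the two code paths (assuming a hub-loaded yolov5s; the tensor branch skips AutoShape's pre/post-processing and returns the raw model output rather than a Detections object):

import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# File/URL/PIL/numpy inputs go through AutoShape pre- and post-processing
det = model('https://ultralytics.com/images/zidane.jpg')  # models.common.Detections
print(det.pandas().xyxy[0])

# A torch tensor is assumed to be already letterboxed/normalized, so AutoShape
# forwards it straight to the underlying model and returns raw tensors
raw = model(torch.zeros(1, 3, 640, 640))                  # tuple, not Detections
print(type(raw))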
Let me ask it in a different way. The output at line 298 of export.py:
for _ in range(2):
    y = model(im)
print(y[0].shape)
is just a tuple. How can I make it a models.common.Detections type?
@glenn-jocher sorry, maybe I'm not clear enough.
My question is: when the input is an image file, like data/images/zidane.jpg, the output is a models.common.Detections. But when the input is a tensor, like torch.zeros(8,3,512,512), the output is a tuple instead of a models.common.Detections.
I tried all input types: filename, URI, OpenCV, PIL, np, multiple. All of them give a models.common.Detections except for a tensor, which returns a tuple. Is this a bug?
https://github.com/ultralytics/yolov5/blob/30e4c4f09297b67afedf8b2bcd851833ddc9dead/models/common.py#L243-L252
@Kieran31 yes this is the default behavior. This allows AutoShape models to be used in val.py and detect.py type workflows where more traditional pytorch dataloaders are used that already preprocess the inputs (letterboxing, resizing, etc.)
@jbattab see PyTorch Hub tutorial:
@glenn-jocher Thanks for your explanation.
Say my input is a tensor, but I still want to get a models.common.Detections
so that I can do results.xyxy[0]
. Without converting tensor to numpy, while still keeping the shape BCHW, what can I do on the returned tuple on GPU?
Looks like only hiding these 3 lines doesn't work.
https://github.com/ultralytics/yolov5/blob/30e4c4f09297b67afedf8b2bcd851833ddc9dead/models/common.py#L255-L257
I don't find the solution in the PyTorch Hub tutorial. If there is one, I appreciate you pointing it to me.
@Kieran31 torch inputs create torch outputs because in a traditional torch workflow the dataloader has already padded and collated all images into a batch, and the batch itself does not supply sufficient information to invert these letterboxing operations.
Basically you would be attempting to run postprocessing without running preprocessing, which is impossible because postprocessing depends on info generated by preprocessing.
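To illustrate, a rough sketch of the usual post-processing path in this repo (assuming im is the letterboxed input tensor and img0 the original image; scale_coords was renamed scale_boxes in later versions):

from utils.general import non_max_suppression, scale_coords  # yolov5 repo utilities

pred = model(im)[0]                                  # raw [batch, anchors, 5 + nc] tensor
pred = non_max_suppression(pred, conf_thres=0.25, iou_thres=0.45)
for det in pred:                                     # one [n, 6] tensor per image
    if len(det):
        # Mapping boxes back to the original image needs the original shape (img0.shape),
        # which a bare collated tensor batch no longer carries.
        det[:, :4] = scale_coords(im.shape[2:], det[:, :4], img0.shape).round()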
@glenn-jocher
I don't quite understand what you mean. Do you mean that if I use the create_dataloader in utils.datasets rather than a traditional dataloader, then I can invert?
Because I found:
https://github.com/ultralytics/yolov5/blob/a4fece8c1480aed46a38a6344b403d79c81bd751/val.py#L173-L185
https://github.com/ultralytics/yolov5/blob/a4fece8c1480aed46a38a6344b403d79c81bd751/detect.py#L149-L151
https://github.com/ultralytics/yolov5/blob/a4fece8c1480aed46a38a6344b403d79c81bd751/detect.py#L182-L183
As shown in line 174 of val.py and line 151 of detect.py, the model output for a torch tensor is a tuple: output[0] is for NMS, output[1] is for loss calculation. So if I want to restore the predicted xywh, do I just need to pass the whole output[0] to non_max_suppression?
Also, output[0] is a tensor of shape [8, 16128, 6]. The first dimension, 8, is the batch size. The third dimension I've figured out is [x, y, x, y, confidence, class]. What is the second dimension? Is it how many objects are detected before NMS?
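(For what it's worth, the 16128 follows directly from the output[1] shapes above: 3 anchor boxes at every cell of the 64x64, 32x32 and 16x16 grids produced for a 512x512 input.)

cells = 64 * 64 + 32 * 32 + 16 * 16   # 4096 + 1024 + 256 = 5376
print(3 * cells)                      # 16128 candidate boxes per image before NMS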
Any update on this? I was trying to compile YOLOv5 with Neuron (https://github.com/aws/aws-neuron-sdk/issues/253) but the compiled model returns output similar to yours. I tried to turn the model into an AutoShape object (https://github.com/ultralytics/yolov5/blob/30e4c4f09297b67afedf8b2bcd851833ddc9dead/models/common.py#L243-L252) but still get the same output. Is there any way to process this output, or any way to use AutoShape for the Neuron-compiled model?
- output is a tuple of length 2.
- output[0] is a tensor of size [8, 16128, 6].
- output[1] is a list of length 3.
- output[1][0] is a tensor of size [8, 3, 64, 64, 6]
- output[1][1] is a tensor of size [8, 3, 32, 32, 6]
- output[1][2] is a tensor of size [8, 3, 16, 16, 6]
@minhtcai in general we don't apply AutoShape to any export format. We worked with the AWS Inferentia team to ensure YOLOv5 compatibility in https://github.com/ultralytics/yolov5/pull/2953 but I haven't actually used it myself so I can't provide much info here.
@minhtcai I don't know about Neuron, but what I did with the output is pass output[0] to NMS; output[1] is for loss calculation.
https://github.com/ultralytics/yolov5/blob/8df64a912274ea3a82df2f96f0e3c3ab95713502/val.py#L177-L183
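A minimal sketch of that step (assuming a model returning the tuple described above and the repo's utils.general.non_max_suppression):

import torch
from utils.general import non_max_suppression

out = model(torch.zeros(8, 3, 512, 512))              # tuple: (inference output, per-scale outputs)
pred = non_max_suppression(out[0], conf_thres=0.25, iou_thres=0.45)
# pred is a list with one [n, 6] tensor per image: x1, y1, x2, y2, confidence, class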
Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLOv5 and Vision AI!
Hi, I hope that it is not too late to ask a further question about this topic. Which function in the repository converts the output tensors torch.Size([1, 3, 48, 80, 85]), torch.Size([1, 3, 24, 40, 85]), torch.Size([1, 3, 12, 20, 85]) to torch.Size([1, 15120, 85])?
@hamedmh Not sure I follow; as far as I know these are two parallel things. https://github.com/ultralytics/yolov5/blob/8df64a912274ea3a82df2f96f0e3c3ab95713502/val.py#L177-L183
out, train_out = model(im)
out is torch.Size([1, 15120, 85]); train_out is a list of 3 tensors of sizes torch.Size([1, 3, 48, 80, 85]), torch.Size([1, 3, 24, 40, 85]) and torch.Size([1, 3, 12, 20, 85]).
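Shape-wise, out is just the three train_out tensors flattened and concatenated; as far as I can tell this happens inside the Detect head (models/yolo.py), which at inference time also applies sigmoid and grid/anchor decoding before concatenating, so the raw values differ even though the element counts match. A quick sketch of the shape bookkeeping only:

import torch

train_out = [torch.zeros(1, 3, 48, 80, 85),
             torch.zeros(1, 3, 24, 40, 85),
             torch.zeros(1, 3, 12, 20, 85)]

flat = torch.cat([t.view(1, -1, 85) for t in train_out], dim=1)
print(flat.shape)   # torch.Size([1, 15120, 85])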
@Kieran31 Thank you for the answer. I noticed that the total number of elements of the three train_out tensors is the same as out. Do they represent the same information about the same bounding boxes? How, or for what purpose, do we use the three train_out tensors?
@hamedmh
For your second question, all I know is that train_out is used for computing the loss; see the quote below.
https://github.com/ultralytics/yolov5/blob/8df64a912274ea3a82df2f96f0e3c3ab95713502/val.py#L177-L183
For the first one, I don't know. I'm not an Ultralytics member, and this issue has been closed, so I'm not sure the Ultralytics team will receive your questions here. My suggestion would be to open a new issue and/or read the YOLOv5 paper.
@Kieran31 Thank you for the explanation.
What about an exported model?
I fine-tuned yolov5s and exported it for mobile (TorchScript). How do I use the model on an iOS device if I don't have access to all the utility methods for image preprocessing?
Hi @mladen-korunoski,
Actually the main ops used in the pre-processing are interpolation and padding, and torch provides both of these, so I guess you can just use TorchScript to implement the pre-processing; check the following as an example.
https://github.com/zhiqwang/yolov5-rt-stack/blob/d2db932/yolort/models/transform.py#L255-L307
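For example, a minimal letterbox-style preprocessing sketch using only torch ops (the function name is made up here, and it pads to a full square rather than to stride multiples like the repo's letterbox(), so treat it as a starting point):

import torch
import torch.nn.functional as F

def letterbox_tensor(img: torch.Tensor, new_size: int = 640, pad_value: float = 114 / 255):
    # img: float CHW tensor in [0, 1]; returns the padded square image, scale ratio and padding offsets
    c, h, w = img.shape
    r = new_size / max(h, w)                       # scale ratio
    nh, nw = int(round(h * r)), int(round(w * r))  # resized height/width
    img = F.interpolate(img[None], size=(nh, nw), mode='bilinear', align_corners=False)[0]
    pad_h, pad_w = new_size - nh, new_size - nw
    top, left = pad_h // 2, pad_w // 2
    img = F.pad(img, (left, pad_w - left, top, pad_h - top), value=pad_value)
    return img, r, (left, top)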
Hi!
I was able to convert the YOLOv5 model to Neuron with the following code:
import torch
import torch_neuron
from torchvision import models

model = torch.hub.load('yolo5',
                       'custom',
                       path='yolov5.pt',
                       source='local',
                       force_reload=True)  # local repo

fake_image = torch.zeros([1, 3, 640, 640], dtype=torch.float32)
# fake_image = (torch.rand(3), torch.rand(3))

try:
    torch.neuron.analyze_model(model, example_inputs=[fake_image])
except Exception:
    torch.neuron.analyze_model(model, example_inputs=[fake_image])

model_neuron = torch.neuron.trace(model,
                                  example_inputs=[fake_image])

## Export to saved model
model_neuron.save("model_converted.pt")
Now that I am trying to test and compare, I'm getting tensor outputs that differ from YOLOv5, as follows:
Neuron YOLOv5 model:
[tensor([[-0.0356, 0.1790, 0.7456, 0.6292, 0.9359, 13.0000],
[ 0.5830, 0.1404, 1.1279, 0.6628, 0.9359, 13.0000],
[ 0.0823, 0.6350, 0.6272, 1.1599, 0.9315, 13.0000],
[-0.1443, 0.1416, 0.2542, 0.5107, 0.9224, 13.0000],
[ 0.3516, 0.6426, 0.7500, 1.0137, 0.9188, 13.0000],
[ 0.3555, 0.1436, 0.7539, 0.5127, 0.9147, 13.0000]])]
Yolov5 (this one):
[tensor([[334.57495, 176.98302, 407.46155, 213.81169, 0.93721, 13.00000]])]
Inference script:
import cv2
import numpy as np
import torch
from utils.general import non_max_suppression  # same NMS helper as yolov5 detect.py

im = cv2.imread('test_img.jpg')
img0 = im.copy()
im = cv2.resize(im, (640, 640), interpolation=cv2.INTER_AREA)
# Convert
im = im.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
im = np.ascontiguousarray(im)
# Convert into torch
im = torch.from_numpy(im)
im = im.float()  # uint8 to fp16/32
im /= 255  # 0 - 255 to 0.0 - 1.0
if len(im.shape) == 3:
    im = im[None]  # expand for batch dim

# Load the compiled model
model = torch.jit.load('model_converted.pt')

# Inference
pred = model(im)
pred = non_max_suppression(pred)  # nms function used same as yolov5 detect.py

# Process predictions
for i, det in enumerate(pred):  # per image
    im0 = img0.copy()
    color = (30, 30, 30)
    txt_color = (255, 255, 255)
    h_size, w_size = im.shape[-2:]
    print(h_size, w_size)
    lw = max(round(sum(im.shape) / 2 * 0.003), 2)
    if len(det):
        # Write results
        for *xyxy, conf, cls in reversed(det):
            c = int(cls)  # integer class
            label = f'{CLASSES[c]} {conf:.2f}'
            print(label)
            box = xyxy
            p1, p2 = (int(box[0] * w_size), int(box[1] * h_size)), (int(box[2] * w_size), int(box[3] * h_size))
            cv2.rectangle(im0, p1, p2, color, thickness=lw, lineType=cv2.LINE_AA)
            tf = max(lw - 1, 1)  # font thickness
            w, h = cv2.getTextSize(label, 0, fontScale=lw / 3, thickness=tf)[0]  # text width, height
            outside = p1[1] - h - 3 >= 0  # label fits outside box
            p2 = p1[0] + w, p1[1] - h - 3 if outside else p1[1] + h + 3
            cv2.rectangle(im0, p1, p2, color, -1, cv2.LINE_AA)  # filled
            cv2.putText(im0,
                        label, (p1[0], p1[1] - 2 if outside else p1[1] + h + 2),
                        0,
                        lw / 3,
                        txt_color,
                        thickness=tf,
                        lineType=cv2.LINE_AA)

# Save results (image with detections)
status = cv2.imwrite('out.jpg', im0)
Is there something wrong with converting the model or running inference? The labels and also the accuracy seem to be as expected, but the tensor values are not.
I followed @jluntamazon's pull request but I am not able to see the difference. # https://github.com/ultralytics/yolov5/pull/2953
@Kieran31 Hi, it might be late to ask, but I have a question for you. I am new to object detection and I'm still confused about computing the loss from train_out. I understand that the model predicts many boxes before they are passed to NMS, and train_out is not passed through NMS yet. How does the model know which boxes to compare with the target boxes?
Can you please explain how I can get the class of the objects present in each anchor?
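In case it helps, a small sketch of pulling a per-anchor class index out of a raw prediction tensor (the tensor below is a random stand-in; the repo's non_max_suppression does essentially the same combination of objectness and class scores):

import torch

pred = torch.rand(1, 25200, 85)      # stand-in for a raw [batch, anchors, 5 + nc] output
obj = pred[..., 4:5]                 # objectness per anchor
cls = pred[..., 5:]                  # per-class scores
conf, cls_id = (obj * cls).max(-1)   # combined confidence and class index per anchor
print(conf.shape, cls_id.shape)      # torch.Size([1, 25200]) each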
Hey team, I have a question about the output shape of my model. After training for expiry date detection with YOLOv5, I got an output like this:
| index | xmin | ymin | xmax | ymax | confidence | class | name | path |
|---|---|---|---|---|---|---|---|---|
| 0 | 351.337006 | 470.231140 | 435.794891 | 484.624939 | 0.527743 | 1.0 | exp-date | |
| 0 | 138.336823 | 383.291962 | 233.642303 | 407.610565 | 0.511508 | 1.0 | exp-date | |
And I converted the .pt file for use in CoreML, and it says that YOLOv5 gives an output shape of (1, 25200, 8). The input image size is 640x640. How should I interpret the output? If I print the first 8 elements of the output array, it shows me this:
Float32 1 × 25200 × 8 array (prediction)
8.929688 (prediction[0])
7.6875 (prediction[1])
16.625 (prediction[2])
18.79688 (prediction[3])
0 (prediction[4])
0.01025391 (prediction[5])
0.9780273 (prediction[6])
0.01074219 (prediction[7])
Could someone give an explanation? Thanks
@doppelvincent hi there! The model's output tensor shape is [1, 25200, 8], representing the predicted bounding boxes and their attributes. It contains 25200 entries that correspond to bounding box predictions. Each prediction is composed of 8 values: [x_center, y_center, width, height, objectness, class_0_confidence, class_1_confidence, class_2_confidence]. In your output, the values seem to be in the correct order and format.
These values represent the predicted bounding box attributes, such as its center coordinates, width, height, objectness score, and class confidences. You can extract and interpret these values for each bounding box to understand the model's predictions.
If you need further assistance in interpreting the output or in integrating it into CoreML, feel free to ask. Good luck with your expiry date detection project!
@glenn-jocher hello! When I use yolov5-7.0 and add --train in export.py, the first ONNX export (python export.py --train --simplify) gives me three outputs; the second ONNX export (python export.py --simplify) gives me one output.
And now I use the ONNX files to test the accuracy with val.py. The first ONNX gets an error:
(NN) D:\python_work\yolov5>python val.py --device 0 --name train_mode --dnn
val: data=E:\downloads\compress\datasets\train_data\train_data.yaml, weights=runs\train\WI_PRW_SSW_SSM_20231127\weights\best_train.onnx, batch_size=16, imgsz=640, conf_thres=0.001, iou_thres=0.6, max_det=300, task=val, device=0, workers=0, single_cls=False, augment=False, verbose=False, save_txt=False, save
_hybrid=False, save_conf=False, save_json=False, project=runs\val, name=train_mode, exist_ok=False, half=False, dnn=True
YOLOv5 v7.0-240-g84ec8b5 Python-3.8.18 torch-1.9.1+cu111 CUDA:0 (GeForce RTX 2060, 6144MiB)
Loading runs\train\WI_PRW_SSW_SSM_20231127\weights\best_train.onnx for ONNX OpenCV DNN inference...
Forcing --batch-size 1 square inference (1,3,640,640) for non-PyTorch models
val: Scanning E:\downloads\compress\datasets\train_data\labels\val.cache... 2575 images, 0 backgrounds, 0 corrupt: 100%|██████████| 2575/2575 [00:00<?, ?it/s]
Class Images Instances P R mAP50 mAP50-95: 0%| | 1/2575 [00:00<21:52, 1.96it/s]
Exception in thread Thread-3:
Traceback (most recent call last):
self._target(*self._args, **self._kwargs)
File "D:\python_work\yolov5\utils\plots.py", line 175, in plot_images
annotator.box_label(box, label, color=color)
File "D:\Anaconda3\envs\NN\lib\site-packages\ultralytics\utils\plotting.py", line 108, in box_label
self.draw.rectangle(box, width=self.lw, outline=color) # box
File "D:\Anaconda3\envs\NN\lib\site-packages\PIL\ImageDraw.py", line 294, in rectangle
self.draw.draw_rectangle(xy, ink, 0, width)
ValueError: x1 must be greater than or equal to x0
Class Images Instances P R mAP50 mAP50-95: 0%| | 3/2575 [00:01<18:57, 2.26it/s]
Exception in thread Thread-7:
Traceback (most recent call last):
File "D:\Anaconda3\envs\NN\lib\threading.py", line 932, in _bootstrap_inner
self.run()
File "D:\Anaconda3\envs\NN\lib\threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "D:\python_work\yolov5\utils\plots.py", line 175, in plot_images
annotator.box_label(box, label, color=color)
File "D:\Anaconda3\envs\NN\lib\site-packages\ultralytics\utils\plotting.py", line 108, in box_label
self.draw.rectangle(box, width=self.lw, outline=color) # box
File "D:\Anaconda3\envs\NN\lib\site-packages\PIL\ImageDraw.py", line 294, in rectangle
self.draw.draw_rectangle(xy, ink, 0, width)
ValueError: x1 must be greater than or equal to x0
Class Images Instances P R mAP50 mAP50-95: 100%|██████████| 2575/2575 [12:33<00:00, 3.42it/s]
all 2575 30443 0 0 0 0
Speed: 0.4ms pre-process, 272.5ms inference, 0.9ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs\val\train_mode
The second ONNX succeeds:
val: data=E:\downloads\compress\datasets\train_data\train_data.yaml, weights=runs\train\WI_PRW_SSW_SSM_20231127\weights\best.onnx, batch_size=16, imgsz=640, conf_thres=0.001, iou_thres=0.6, max_det=300, task=val, device=0, workers=0, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=runs\val, name=train_mode, exist_ok=False, half=False, dnn=False
YOLOv5 v7.0-240-g84ec8b5 Python-3.8.18 torch-1.9.1+cu111 CUDA:0 (GeForce RTX 2060, 6144MiB)
Loading runs\train\WI_PRW_SSW_SSM_20231127\weights\best.onnx for ONNX Runtime inference...
Forcing --batch-size 1 square inference (1,3,640,640) for non-PyTorch models
val: Scanning E:\downloads\compress\datasets\train_data\labels\val.cache... 2575 images, 0 backgrounds, 0 corrupt: 100%|██████████| 2575/2575 [00:00<?, ?it/s]
Class Images Instances P R mAP50 mAP50-95: 100%|██████████| 2575/2575 [01:30<00:00, 28.57it/s]
all 2575 30443 0.807 0.719 0.771 0.51
face 2575 6954 0.835 0.687 0.743 0.352
person 2575 19192 0.814 0.769 0.795 0.471
car 2575 4012 0.868 0.833 0.888 0.671
bus 2575 187 0.799 0.791 0.835 0.616
truck 2575 98 0.717 0.517 0.597 0.439
Speed: 0.4ms pre-process, 12.3ms inference, 0.9ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs\val\train_mode2
How can I solve this?
@glenn-jocher I also get the same issue when using yolov5-6.2. And how can I get one output, by reshaping and concatenating the three outputs from the first exported ONNX? Here is the .pt file: best.zip
@dengxiongshi Thanks for reaching out. It looks like you're encountering issues with using the ONNX model for validation after the export process. To best troubleshoot this, I recommend the following steps:
1. Verify the correct export procedure: Ensure that the ONNX export process is performed correctly with the necessary flags, including --train when applicable, and that the environment and dependencies are set up properly.
2. Model consistency: Check that the versions and configurations of the YOLOv5 code, export script, and third-party libraries used for conversion are consistent and compatible.
3. Inference environment: Confirm that the ONNX Runtime and related dependencies used during validation are properly set up and are compatible with the exported model.
Regarding your query about reshaping and concatenating the three outputs from the first export into a single output, the process may involve reshaping the outputs to ensure compatibility and then concatenating them along the appropriate dimension.
If the issue persists, I recommend posting your detailed question on the YOLOv5 GitHub repository: https://github.com/ultralytics/yolov5. The community and the Ultralytics team will be better equipped to assist with debugging and resolving the issues you're facing.
Let me know if I can help you further with any of these steps!
Hi @glenn-jocher. For the three steps, I have checked the corresponding environment and dependencies and there is no problem. I submitted an issue here.
@dengxiongshi great to hear that you've checked the environment and dependencies thoroughly. I see that you've also raised an issue on the YOLOv5 GitHub repository. Our team will assist you there to address the ONNX export and validation concerns effectively. Feel free to reach out if you have any further queries or need additional assistance. Good luck with resolving the issue!
I found out that the 85 means self.reg_max * 4 + self.nc!
It is not from MS COCO.
https://github.com/ultralytics/ultralytics/blob/2f11ab5e6f26885640e9ff6b9ebec165c3bf82b3/ultralytics/utils/loss.py#L197
In my case, I set 21 classes for my custom dataset.
I wonder if I understood correctly. @glenn-jocher
@mandal4 hello! It looks like you've figured out the output dimensions correctly. The 85 in the output tensor [1, 25556, 85] corresponds to the number of classes plus the bounding box coordinates and the objectness score for each prediction. In YOLOv5, the output tensor typically has the shape [batch_size, number_of_anchors, 4 + 1 + number_of_classes], where number_of_classes is the number of classes the model is trained to detect. In your case, with 21 classes, the output would be 4 (bbox) + 1 (objectness) + 21 (classes) = 26. However, you're seeing 85 because self.reg_max * 4 + self.nc indicates that there is additional logic applied to the bounding box coordinates, possibly related to anchor scaling or other model-specific details.
If you have any further questions or need clarification, feel free to ask. Good job on diving into the code to understand the model's output!
@glenn-jocher hello! I agree with your answer about the output tensor [1, 25556, 85], but I still have a question. As you said, the last 80 values are the probabilities of each class. But I find that sum(pred[5:]) != 1. I wonder why this happens? How do you get these scores, with Softmax() or something else?
@Kegard hello again! The values you're seeing in the output tensor are raw logits, not probabilities. They do not sum to 1 because they have not been passed through a softmax function. In YOLOv5, during inference, these logits are typically passed through a sigmoid function to convert them to objectness scores and class confidences, which are separate from each other.
The objectness score indicates the likelihood that the bounding box contains any object, while the class confidences represent the likelihood of each class being present in the bounding box. These confidences are not mutually exclusive and are not meant to sum to 1 across all classes. Instead, each class confidence is independent and represents the model's confidence that a particular class is detected within the bounding box.
If you want to convert the raw logits to probabilities that sum to 1 for the class predictions, you would apply a softmax function to the class logits. However, this is not the standard practice for YOLO models, as they treat object detection as a multi-label classification problem, where each bounding box can potentially belong to multiple classes with independent probabilities.
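A toy illustration of the difference:

import torch

logits = torch.tensor([2.0, -1.0, 0.5])    # example class logits for one box
print(torch.sigmoid(logits).sum())         # independent confidences; sum is about 1.77, not 1
print(torch.softmax(logits, dim=0).sum())  # mutually exclusive probabilities; sums to 1.0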
I hope this clarifies your question! If you need further assistance, feel free to ask.
Hi @glenn-jocher and everyone,
I'm trying to deal with an exported yolov5n.tflite and inference servers. The output I receive from processing an image has a shape of [1, 25200, 85]. This is a sample of the output:
[[[  2   3   7 ...   0   0   0]
  [  2   3   7 ...   0   0   0]
  [  2   3   7 ...   0   0   0]
  ...
  [230 232  19 ...   2   0   0]
  [228 233  26 ...   2   0   0]
  [228 232  47 ...   2   0   1]]]
About the dimensions, I already understood that:
I am reading all the posts related to this topic, but I'm still not able to turn that output into something I can use and understand, similar to the output obtained when just using YOLOv8 through the ultralytics module (this may be due to my lack of knowledge, since I am just getting started with these topics).
As I said, I'm still reading all the previous information related to this, but any help about which steps I should follow would be appreciated.
Thank you in advance!
Hi there!
It sounds like you're on the right track with understanding the output of your YOLOv5n.tflite model. To make sense of these outputs and convert them into a more usable form (bounding boxes, class IDs, and scores), you'll typically need to apply some post-processing steps. Here's a brief overview:
1. Apply a sigmoid function to the objectness scores and class predictions to convert logits to probabilities.
2. Filter out predictions with objectness scores below a certain threshold to reduce the number of detections, as many will be low confidence.
3. Apply Non-Max Suppression (NMS): since your model may predict multiple overlapping boxes for a single object, NMS helps in selecting the most probable bounding box while discarding the rest.
In pseudo-code, your process might look something like this:
# Assuming outputs is your model output with shape [1, 25200, 85]
# Sigmoid the objectness score and class predictions
outputs[..., :4] = torch.sigmoid(outputs[..., :4]) # Adjust bounding boxes
outputs[..., 4:] = torch.sigmoid(outputs[..., 4:]) # Objectness and class preds
# Apply a threshold to filter out low-confidence predictions
conf_threshold = 0.25
mask = outputs[..., 4] > conf_threshold
outputs = outputs[mask]
# Apply NMS
nms_threshold = 0.45
boxes, scores, classes = nms(outputs, nms_threshold)
# boxes, scores, and classes are your final, usable outputs
Keep in mind, the nms function and thresholds used here are just placeholders. You'll need to adapt this pseudo-code to fit your exact needs and also implement or use an existing NMS function suitable for your framework (TensorFlow, PyTorch, etc.).
This process should help you glean more actionable insights from your model's predictions. Keep experimenting and studying; you're doing great so far!
Feel free to ask if you have more questions. Happy coding!
Hi @glenn-jocher ,
Thank you so much for your response. I've been trying to apply the steps you proposed, but I'm still obtaining an output that makes no sense. After applying the sigmoid, most of the objectness values are almost 1.0, which makes no sense. I'm thinking maybe it is related to the fact that the output of the net is quantized, so I probably should dequantize it first (I'm working with an inference server running on an embedded arm64 system).
Does that make sense? Do you know how I could dequantize the output of the net before applying the sigmoid?
Thank you!
Hi there!
Yes, it absolutely makes sense that if you're working with a quantized model, the outputs could be in a quantized format. Before applying sigmoid functions or any further processing, you would indeed need to dequantize these outputs to floating-point values, which can significantly affect your post-processing steps.
The approach to dequantize depends on the framework you're using. Generally, if the model was quantized using TensorFlow Lite, the .tflite model file often contains the scale and zero-point for each tensor, which you can use to convert the quantized values back to floating-point numbers.
Here's a simplified example in Python for dequantization:
def dequantize(quantized_value, scale, zero_point):
# Convert quantized value to a floating point
return scale * (quantized_value - zero_point)
You'd need to apply this function to your model outputs, using the appropriate scale and zero_point values for each output tensor, before proceeding with sigmoid or other post-processing steps.
Keep in mind, the details may vary depending on your precise setup and framework. If you're using a different environment or library, they might provide built-in methods to handle dequantization more seamlessly.
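If you are loading the .tflite with the TensorFlow Lite interpreter, the scale and zero point are usually exposed per tensor; a rough sketch (the model filename is hypothetical):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='yolov5n-int8.tflite')  # hypothetical filename
interpreter.allocate_tensors()

out_detail = interpreter.get_output_details()[0]
scale, zero_point = out_detail['quantization']            # (scale, zero_point) for this output tensor

# ... set the input with interpreter.set_tensor(...) and call interpreter.invoke() ...

quantized = interpreter.get_tensor(out_detail['index'])   # raw int8/uint8 output
dequantized = scale * (quantized.astype(np.float32) - zero_point)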
Hope this helps you move forward! Let me know if you have any more questions. Happy coding!
When I convert the YOLO v8 model weights to int16 and validate, I'm getting 0 accuracy, but with float32 model weights, I'm getting 0.87 accuracy.
Hello @madasuvenky,
Thank you for reaching out and providing details about your issue. It sounds like you're experiencing a significant drop in accuracy when converting your YOLOv8 model weights to int16. This is indeed unusual and suggests there might be an issue with the quantization process.
To help us investigate further, could you please provide a minimum reproducible code example? This will allow us to better understand the steps you're taking and identify any potential issues. You can find guidelines on creating a minimum reproducible example here. Ensuring we can reproduce the bug is crucial for us to provide an effective solution.
Additionally, please make sure you are using the latest versions of torch and the YOLOv5 repository. Sometimes, updates can resolve unexpected issues.
Quantization can be tricky, especially when dealing with different data types. If you haven't already, you might want to check the scale and zero-point values used during the quantization process, as incorrect values can lead to significant accuracy drops.
Here's a brief example of how you might dequantize your model outputs if you're using TensorFlow Lite:
def dequantize(quantized_value, scale, zero_point):
return scale * (quantized_value - zero_point)
# Example usage
quantized_output = ... # Your quantized model output
scale = ... # Scale factor from your model
zero_point = ... # Zero point from your model
dequantized_output = dequantize(quantized_output, scale, zero_point)
Feel free to share more details or any specific error messages you're encountering. We're here to help!
Question
Hi team, I trained the model on 512x512 images. Now I want to do detection on a huge image, for example 5000x5000. So I chopped the huge image into 512x512 tiles with a tiler and created a dataloader with batch size = 8.
Say my input_batch is of shape [8, 3, 512, 512]. Now I have difficulty understanding the model output. Can someone help me interpret these?
- output is a tuple of length 2.
- output[0] is a tensor of size [8, 16128, 6].
- output[1] is a list of length 3.
- output[1][0] is a tensor of size [8, 3, 64, 64, 6]
- output[1][1] is a tensor of size [8, 3, 32, 32, 6]
- output[1][2] is a tensor of size [8, 3, 16, 16, 6]
Additional context
I didn't find a tool for integrating detection results from multiple images. If this repo does have one, please tell me. Thanks very much.