Closed: lzy-a closed this issue 2 years ago
👋 Hello @lzy-a, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.
If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.
If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.
For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.
Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:
git clone https://github.com/ultralytics/yolov5 # clone
cd yolov5
pip install -r requirements.txt # install
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.
@lzy-a P2 model for small object detection is available here. No modifications are necessary to use it: https://github.com/ultralytics/yolov5/blob/master/models/hub/yolov5-p2.yaml
Thank you very much.
It looks like the only difference between my model and that one is that I have added two SE modules.
Adding the two SE modules alone improves results, but then adding a new detection layer on top of that makes them worse.
I have to think about it.
@lzy-a hmm that's interesting. If the two SE modules independently improve the model then maybe that's an improvement by itself. Do you have a fork showing the SE results?
Note that P2 improvements mainly help very small objects, but introduce higher inference time.
I'll try to make a fork to show it tomorrow.
@lzy-a great!
@lzy-a hmm that's interesting. If the two SE modules independently improve the model then maybe that's an improvement by itself. Do you have a fork showing the SE results?
Note that P2 improvements mainly help very small objects, but introduce higher inference time. I get it.
[-1, 1, C3, [128, False]], # 21 (P2/4-x-small)
I set the number of this layer to 3, which screwed everything up.
Can you share your yaml file for small object detection? I would be very grateful~
@mx2013713828 the -p2 models are designed for very small object detection: https://github.com/ultralytics/yolov5/blob/master/models/hub/yolov5-p2.yaml
👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!
Can anyone help me? I have extracted the features from the last Conv2d layer
Conv2d(512, 21, kernel_size=(1, 1), stride=(1, 1))
and applied some processing to them. Now I want to replace the original tensor with this updated one and complete the regular YOLOv5 prediction, including bounding boxes and classes. Another option would be to somehow pass the manipulated features from the Conv2d layer to the next layer, and through the remaining layers up to the Detect layer, and then get predictions. I have also tried the code below, but I don't know whether it will work as needed or how to run the prediction afterwards.
model_C = torch.hub.load('yolov5', 'custom', source='local', path='best.pt', force_reload=True)

def replace_spp_output(module, input, output):
    global average_tensor
    return average_tensor

spp_layer_C = next(filter(lambda x: isinstance(x, nn.Conv2d), model_C.modules()))
hook_C = spp_layer_C.register_forward_hook(replace_spp_output)

img = Image.open('ra169_2.jpg')
input_tensor = transforms.ToTensor()(img).unsqueeze(0)
with torch.no_grad():
    output_C = model_C(input_tensor)

hook_C.remove()
@RANA-ATI hi there! It looks like you're trying to modify the feature maps from the last convolutional layer of YOLOv5 and replace it with updated tensors.
One way to pass these manipulated feature maps to the next layers until the Detect layer would be to modify the forward method of the detector itself. Once you have made your changes to the feature maps, you can use them as input and continue with the regular forward method until predictions are made.
Regarding the code that you have tried, it looks like you're loading a custom YOLOv5 model using torch.hub.load, and you're trying to modify the feature maps from the SPP layer. However, it's not clear what replace_spp_output does and how it's supposed to work.
It would be helpful if you could clarify what you're trying to achieve and provide more details on the changes you've made to the feature maps, as well as the code you're using to modify them.
Let me know if you have any further questions or concerns.
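For reference, here is a minimal sketch of the hook-based approach (a hypothetical standalone example: the weights, image path, and the identity placeholder inside the hook are assumptions, not your actual manipulation):

import torch
import torch.nn as nn
from PIL import Image
from torchvision import transforms

# load a stock YOLOv5s model from torch.hub (stand-in for a custom best.pt)
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# grab the last Conv2d module in the network
last_conv = [m for m in model.modules() if isinstance(m, nn.Conv2d)][-1]

def replace_output(module, inputs, output):
    # returning a tensor from a forward hook replaces the module's output;
    # put the real manipulation here (identity used as a placeholder)
    return output * 1.0

hook = last_conv.register_forward_hook(replace_output)

img = Image.open('image.jpg').convert('RGB').resize((640, 640))  # placeholder image, multiple of 32
x = transforms.ToTensor()(img).unsqueeze(0)
with torch.no_grad():
    results = model(x)  # regular forward pass, with the hooked layer's output replaced

hook.remove()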
Thank you for the reply. Basically, I want to do feature fusion at this layer, Conv2d(512, 21, kernel_size=(1, 1), stride=(1, 1)). My aim is this: I have two YOLOv5s models and two different images; when I pass each image to its own model, it gives me features from the last Conv2d layer thanks to your technique: last_conv_layer_2 = next(reversed(list(filter(lambda x: isinstance(x, nn.Conv2d), model_A2.modules()))))
Now I want to concatenate the last conv layers of both models, use that fused feature, and have the model run with it from there to the end to give me predictions.
Hope I made it clear this time :D
It would be great if you could help me with this, as I am stuck and I have a presentation at my university.
@RANA-ATI Thank you for the clarification. What you're trying to do is feature fusion by concatenating the feature maps from the last conv2d layers of two different YOLOv5 models.
You can achieve this by extracting the last conv2d layer feature maps from both models, concatenating them along the channel dimension (dimension 1), and then passing the concatenated tensor through the rest of the detector.
Here's an example code snippet:
import torch
import torch.nn.functional as F
from PIL import Image
# load the two models
model1 = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
model2 = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
# extract the last conv2d layer features
features1 = next(reversed(list(filter(lambda x: isinstance(x, torch.nn.Conv2d), model1.modules()))))
features2 = next(reversed(list(filter(lambda x: isinstance(x, torch.nn.Conv2d), model2.modules()))))
# load the image and pass it through both models
img = Image.open('path/to/image.jpg')
with torch.no_grad():
    output1 = model1(img)
    output2 = model2(img)
# concatenate the features and pass the tensor through the rest of the detector
concat_features = torch.cat((features1.weight, features2.weight), dim=1)
x = F.relu(concat_features)
for module in model1.model:
    x = module(x)
# get the predictions
pred = model1.model[-1](x)
Note that this is just an example and you may need to adapt it to your specific use case.
I hope this helps and let me know if you have any further questions or concerns.
Thanks again mate
This seems to make sense, but I got a little issue with this line, as it gave me errors. I tried using the updated repo but still got the error.
Error on this line: for module in model1.model:
TypeError                                 Traceback (most recent call last)
Cell In[16], line 22
     20 concat_features = torch.cat((features1.weight, features2.weight), dim=1)
     21 x = F.relu(concat_features)
---> 22 for module in model1.model:
     23     x = module(x)
     25 # get the predictions
TypeError: 'DetectMultiBackend' object is not iterable
Hi @RANA-ATI,
Sorry about that. It looks like you're trying to iterate over the model1.model object, which is of type DetectMultiBackend and not iterable.
Instead of iterating over model1.model, you can modify the forward method of the detector to concatenate the feature maps and pass the concatenated tensor through the rest of the detector. Here's an example:
import torch
import torch.nn.functional as F
from PIL import Image
# load the two models
model1 = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
model2 = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
# extract the last conv2d layer features
features1 = next(reversed(list(filter(lambda x: isinstance(x, torch.nn.Conv2d), model1.modules()))))
features2 = next(reversed(list(filter(lambda x: isinstance(x, torch.nn.Conv2d), model2.modules()))))
# concatenate the features and pass the tensor through the rest of the detector
class MyDetector(torch.nn.Module):
    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone
        concat_features = torch.cat((features1.weight, features2.weight), dim=1)
        self.conv = torch.nn.Conv2d(concat_features.shape[1], concat_features.shape[0], kernel_size=1)
        self.relu = torch.nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.backbone(x)
        x = self.conv(x[-1]) + self.relu(x[-2])
        x = F.interpolate(x, scale_factor=2)
        x = self.backbone.fpn(x)
        x = self.backbone.forward(x)
        x = self.backbone.tip(x)
        x = self.backbone.forward(x)
        x = self.backbone.head(x)
        return x
model_fused = MyDetector(model1.model)
# load the image and pass it through the fused model
img = Image.open('path/to/image.jpg')
with torch.no_grad():
    output = model_fused(img)
# get the predictions
pred = output[-1]
Note that you may need to modify the rest of the forward method to match your specific use case.
Let me know if you have any further questions or concerns.
Thanks again. I used the code to test it, and when putting the image through I got this error. It might be something related to the image I used; should it be resized first?
AttributeError                            Traceback (most recent call last)
Cell In[34], line 38
     36 img = Image.open('ra169_1.jpg')
     37 with torch.no_grad():
---> 38     output = model_fused(img)
     40 # get the predictions
     41 pred = output[-1]

File c:\Users\Administrator.conda\envs\yolov5\lib\site-packages\torch\nn\modules\module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

Cell In[34], line 23, in MyDetector.forward(self, x)
     22 def forward(self, x):
---> 23     x = self.backbone(x)
     24     x = self.conv(x[-1]) + self.relu(x[-2])
     25     x = F.interpolate(x, scale_factor=2)
...
    526 deprecate("Image categories", 10, "is_animated", plural=True)
    527 return self._category
--> 528 raise AttributeError(name)
AttributeError: shape
Hi @RANA-ATI,
It looks like the error is related to the input image not having the correct shape. YOLOv5 expects input images to have a height and width that is a multiple of 32. You can try resizing the image to a size that satisfies this requirement before passing it to the model.
Here's an updated code snippet that resizes the image before passing it to the fused model:
import torch
import torch.nn.functional as F
from PIL import Image
# load the two models
model1 = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
model2 = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
# extract the last conv2d layer features
features1 = next(reversed(list(filter(lambda x: isinstance(x, torch.nn.Conv2d), model1.modules()))))
features2 = next(reversed(list(filter(lambda x: isinstance(x, torch.nn.Conv2d), model2.modules()))))
# concatenate the features and pass the tensor through the rest of the detector
class MyDetector(torch.nn.Module):
    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone
        concat_features = torch.cat((features1.weight, features2.weight), dim=1)
        self.conv = torch.nn.Conv2d(concat_features.shape[1], concat_features.shape[0], kernel_size=1)
        self.relu = torch.nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.backbone(x)
        x = self.conv(x[-1]) + self.relu(x[-2])
        x = F.interpolate(x, scale_factor=2)
        x = self.backbone.fpn(x)
        x = self.backbone.forward(x)
        x = self.backbone.tip(x)
        x = self.backbone.forward(x)
        x = self.backbone.head(x)
        return x
model_fused = MyDetector(model1.model)
# load the image and resize it
img = Image.open('path/to/image.jpg')
w, h = img.size
new_w = ((w // 32) + 1) * 32
new_h = ((h // 32) + 1) * 32
img = img.resize((new_w, new_h))
# pass the resized image through
Thanks, I did this but got the same shape error. Do I have to try different sizes that are multiples of 32 by trial and error, or is there some exact shape I can follow?
img = Image.open('ra169_1.jpg')
w, h = img.size
new_w = ((w // 32) + 1) * 32
new_h = ((h // 32) + 1) * 32
img = img.resize((new_w, new_h))
with torch.no_grad():
    output = model_fused(img)
pred = output[-1]
Hi @RANA-ATI,
Hmm, it's strange that you're still getting the same error even after resizing the image to the correct shape. The exact size of the image doesn't matter as long as it's a multiple of 32, so you can try different sizes if you'd like to see if that fixes the issue.
One thing you could try is printing out the shapes of the intermediate tensors in the forward method of your MyDetector class using print(x.shape) statements. This can help you understand where the shape mismatch is occurring and what might be causing it.
Another thing to check is that all of the layers in your MyDetector class have the correct input and output shapes. You might need to adjust the sizes of some of the layers depending on the size of the input image.
Let me know if you have any other questions or concerns!
Thanks. Well, trial and error would be of no use, as I have little time; I might have my presentation tomorrow. Is there anything else that can help?
As far as x.shape is concerned, you meant like this?
class MyDetector(torch.nn.Module):
    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone
        concat_features = torch.cat((features1.weight, features2.weight), dim=1)
        self.conv = torch.nn.Conv2d(concat_features.shape[1], concat_features.shape[0], kernel_size=1)
        self.relu = torch.nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.backbone(x)
        print(x.shape)
        x = self.conv(x[-1]) + self.relu(x[-2])
        print(x.shape)
        x = F.interpolate(x, scale_factor=2)
        print(x.shape)
        x = self.backbone.fpn(x)
        print(x.shape)
        x = self.backbone.forward(x)
        print(x.shape)
        x = self.backbone.tip(x)
        print(x.shape)
        x = self.backbone.forward(x)
        print(x.shape)
        x = self.backbone.head(x)
        print(x.shape)
        return x
model_fused = MyDetector(model1.model)
img = Image.open('ra169_1.jpg')
w, h = img.size
new_w = ((w // 32) + 1) * 32
new_h = ((h // 32) + 1) * 32
img = img.resize((new_w, new_h))
with torch.no_grad():
    output = model_fused(img)
pred = output[-1]
Hi @RANA-ATI,
Yes, printing out the shape of each intermediate tensor using print(x.shape) statements, as you've shown in your code, is a good way to debug your model and find where the shape mismatch is occurring. This will help you identify if any of the layers in your MyDetector class have incorrect input or output shapes.
In addition to that, you can try checking the shape of the final output tensor as well by adding print(pred.shape) after pred = output[-1]. This can give you an idea of what the output tensor should look like and whether it matches your expectations.
If you're still having issues with the model, you might want to try running it on a different image to see if the error is specific to the image you're using. You could also try simplifying the model and gradually adding layers back in to see where the error occurs.
I hope this helps, and please let me know if you have any other questions or concerns!
Thanks, I am about to start. One more question: to get the features, don't we need hooks? You didn't include them in your previous response. Oh, I think I get it; please correct me if I am wrong: you are extracting beforehand what are in fact the learned weights of the pretrained YOLO models, and then concatenating them to use for our image at the end.
features1 = next(reversed(list(filter(lambda x: isinstance(x, torch.nn.Conv2d), model1.modules()))))
features2 = next(reversed(list(filter(lambda x: isinstance(x, torch.nn.Conv2d), model2.modules()))))
Hi @RANA-ATI,
That's correct! In my previous response, I used next(reversed(list(filter(lambda x: isinstance(x, torch.nn.Conv2d), model1.modules())))) and next(reversed(list(filter(lambda x: isinstance(x, torch.nn.Conv2d), model2.modules())))) to extract the weights of the last convolutional layer from each of the YOLOv5 models. These layers contain the final feature maps before the detection heads, which are the features we want to concatenate and use in our fused detector.
The reason we didn't need to use hooks to extract the features is because we only want to extract the weights of these layers, which are already initialized with learned weights from the YOLOv5 training process. We don't need to modify these layers or extract features during inference, so there's no need to use hooks.
I hope this clears up any confusion, and please let me know if you have any other questions!
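For completeness, if you do want the actual runtime activations (feature maps) rather than the layer weights, forward hooks are one way to capture them. A minimal sketch, assuming the stock torch.hub yolov5s models and a placeholder image path:

import torch
import torch.nn as nn
from PIL import Image
from torchvision import transforms

model1 = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
model2 = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

captured = {}

def make_hook(name):
    def hook(module, inputs, output):
        captured[name] = output.detach()  # store the activation produced at runtime, not the weights
    return hook

# hook the last Conv2d module of each model
last1 = [m for m in model1.modules() if isinstance(m, nn.Conv2d)][-1]
last2 = [m for m in model2.modules() if isinstance(m, nn.Conv2d)][-1]
h1 = last1.register_forward_hook(make_hook('m1'))
h2 = last2.register_forward_hook(make_hook('m2'))

img = Image.open('image.jpg').convert('RGB').resize((640, 640))  # placeholder path, multiple of 32
x = transforms.ToTensor()(img).unsqueeze(0)
with torch.no_grad():
    model1(x)
    model2(x)
h1.remove()
h2.remove()

# fuse the two activation maps along the channel dimension
fused = torch.cat((captured['m1'], captured['m2']), dim=1)
print(fused.shape)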
Thanks, got it! Well, I have somehow managed to pass the image, and it went through the backbone, but now it gives me an error at this conv layer. Isn't there any way to manage this issue? x = self.conv(x[-1]) + self.relu(x[-2])
features1.weight.shape
OUTPUT: torch.Size([255, 512, 1, 1])
features2.weight.shape
OUTPUT: torch.Size([255, 512, 1, 1])
print('Shape before backbone:', x.shape)
OUTPUT: Shape before backbone: torch.Size([1, 3, 608, 608])
x = self.backbone(x)
print('Shape after backbone:', x.shape)
OUTPUT: Shape after backbone: torch.Size([1, 22743, 85])
ERROR:
RuntimeError                              Traceback (most recent call last)
Cell In[40], line 2
      1 with torch.no_grad():
----> 2     output = model_fused(input_tensor)
      4 # get the predictions
      5 pred = output[-1]

File c:\Users\Administrator.conda\envs\yolov5\lib\site-packages\torch\nn\modules\module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

Cell In[36], line 13, in MyDetector.forward(self, x)
     11 x = self.backbone(x)
     12 print('Shape after backbone:', x.shape)
---> 13 x = self.conv(x[-1]) + self.relu(x[-2])
     14 print('Shape after conv and relu:', x.shape)
     15 x = F.interpolate(x, scale_factor=2)
...
    458     _pair(0), self.dilation, self.groups)
--> 459 return F.conv2d(input, weight, bias, self.stride,
    460     self.padding, self.dilation, self.groups)
RuntimeError: Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [22743, 85]
Hi @RANA-ATI,
I see that you're encountering an error at the convolutional layer in your fused detector model. The error message suggests that the input to the convolutional layer has the wrong number of dimensions, which is causing the RuntimeError.
Based on the output shapes you printed, it looks like the self.conv layer expects a 4D input tensor (i.e. with batch size) but is receiving a 2D tensor [22743, 85] instead. You might need to reshape or add a batch dimension to your output from the self.backbone layer before passing it through self.conv.
For example, you could try reshaping the self.backbone output to [batch_size, num_channel, height, width] using torch.reshape, or adding a batch dimension using torch.unsqueeze.
Let me know if this helps resolve the issue, or if you have any further questions or concerns!
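As a tiny self-contained illustration of the two options (the shapes below are made up for demonstration):

import torch

t = torch.randn(22743, 85)           # a 2D tensor like the one in the error
batched = t.unsqueeze(0)             # add a leading batch dimension -> torch.Size([1, 22743, 85])
print(batched.shape)

flat = torch.randn(3 * 32 * 32)      # a made-up flat tensor
reshaped = torch.reshape(flat, (1, 3, 32, 32))  # [batch_size, channels, height, width]
print(reshaped.shape)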
Thanks mate, I have little time now. So, this is what I did before passing it to model_fused. Shape before backbone: torch.Size([1, 3, 640, 640]); shape after backbone: torch.Size([1, 25200, 85]).
transform = T.ToTensor()
img = Image.open('ra169.jpg')
w, h = img.size
new_w = ((w // 32) + 1) * 32
new_h = ((h // 32) + 1) * 32
img = img.resize((new_w, new_h))
# Convert the PIL image to a tensor
input_tensor = transform(img)
# Add an extra dimension to simulate batch size of 1
input_tensor = input_tensor.unsqueeze(0)
input_tensor.shape
# [batch_size, channels, height, width]
with torch.no_grad():
    output = model_fused(input_tensor)
But I got this error:
RuntimeError                              Traceback (most recent call last)
Cell In[212], line 3
      1 # [batch_size, channels, height, width]
      2 with torch.no_grad():
----> 3     output = model_fused(input_tensor)

File c:\Users\Administrator.conda\envs\yolov5\lib\site-packages\torch\nn\modules\module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

Cell In[209], line 14, in MyDetector.forward(self, x)
     12 x = self.backbone(x)
     13 print('Shape after backbone:', x.shape)
---> 14 x = self.conv(x[-1]) + self.relu(x[-2])
     15 print('Shape after conv and relu:', x.shape)
     16 x = F.interpolate(x, scale_factor=2)

File c:\Users\Administrator.conda\envs\yolov5\lib\site-packages\torch\nn\modules\module.py:1501, in Module._call_impl(self, *args, **kwargs)
...
    458     _pair(0), self.dilation, self.groups)
--> 459 return F.conv2d(input, weight, bias, self.stride,
    460     self.padding, self.dilation, self.groups)
RuntimeError: Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [25200, 85]
Hi @RANA-ATI,
Thanks for sharing this information. Based on the error message, the input shape to the convolutional layer, [25200, 85], is still missing a batch dimension. To fix this error, you can try adding a batch dimension to the output from the self.backbone layer before passing it to self.conv using torch.unsqueeze.
Here's an example:
with torch.no_grad():
    backbone_output = model_fused.backbone(input_tensor)
    backbone_output = backbone_output.unsqueeze(0)  # Add a batch dimension
    output = model_fused.conv(backbone_output[-1]) + model_fused.relu(backbone_output[-2])
This should fix the dimensionality issue and allow your model to compute the forward pass correctly.
Please let me know if this helps resolve the issue, or if you have any further questions or concerns!
Thanks. I just tried the method you mentioned; this time I got a slightly different error but still wasn't able to proceed. Anxiously waiting for your reply.
RuntimeError                              Traceback (most recent call last)
Cell In[237], line 8
      6 backbone_output = model_fused.backbone(input_tensor)
      7 backbone_output = backbone_output.unsqueeze(0)  # Add a batch dimension
----> 8 output = model_fused.conv(backbone_output[-1]) + model_fused.relu(backbone_output[-2])

File c:\Users\Administrator.conda\envs\yolov5\lib\site-packages\torch\nn\modules\module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File c:\Users\Administrator.conda\envs\yolov5\lib\site-packages\torch\nn\modules\conv.py:463, in Conv2d.forward(self, input)
    462 def forward(self, input: Tensor) -> Tensor:
--> 463     return self._conv_forward(input, self.weight, self.bias)

File c:\Users\Administrator.conda\envs\yolov5\lib\site-packages\torch\nn\modules\conv.py:459, in Conv2d._conv_forward(self, input, weight, bias)
    455 if self.padding_mode != 'zeros':
    456     return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
    457         weight, bias, self.stride,
    458         _pair(0), self.dilation, self.groups)
--> 459 return F.conv2d(input, weight, bias, self.stride,
    460     self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [255, 1024, 1, 1], expected input[1, 1, 25200, 85] to have 1024 channels, but got 1 channels instead
Hi @RANA-ATI,
It looks like the error you're currently encountering is related to the number of input channels expected by the self.conv layer. Based on the error message, the self.conv layer expects an input tensor with 1024 channels, but it's receiving a tensor with only 1 channel instead. This suggests a possible issue with the output from the backbone network.
Here are a few things you can try:
- Check the output shape of backbone_output after adding the batch dimension using backbone_output.shape. It should be [1, C, H, W], where C is the number of output channels.
- Verify that the shape of the weight tensor in self.conv matches the expected value [255, 1024, 1, 1]. This is the shape of the learnable convolutional filters in the layer.
- Ensure that the number of output channels from the backbone network matches the number of input channels required by self.conv. If the mismatch is due to the difference between RGB and grayscale images, you may need to modify the backbone network accordingly.
Please try these suggestions and let me know if they help resolve the issue, or if you have any further questions.
Thanks for the reply. Well, I printed the shape of backbone_output, and it is torch.Size([1, 1, 25200, 85]), which obviously means it gave grayscale results since the channel count is 1. So what can I do now? Is there an issue with the backbone, then?
img = Image.open('images/ra169_1.jpg')
w, h = img.size
new_w = ((w // 32) + 1) * 32
new_h = ((h // 32) + 1) * 32
img = img.resize((new_w, new_h))
transform = T.ToTensor()
# Convert the PIL image to a tensor
input_tensor = transform(img)
# Add an extra dimension to simulate batch size of 1
input_tensor = input_tensor.unsqueeze(0)
input_tensor.shape
# # [batch_size, channels, height, width]
# with torch.no_grad():
# output = model_fused(input_tensor)
with torch.no_grad():
    backbone_output = model_fused.backbone(input_tensor)
    backbone_output = backbone_output.unsqueeze(0)  # Add a batch dimension
    print(backbone_output.shape)
    output = model_fused.conv(backbone_output[-1]) + model_fused.relu(backbone_output[-2])
@RANA-ATI hi,
Thanks for providing the additional information. It looks like the issue may be related to the backbone network not producing the expected output. The shape of backbone_output that you printed confirms that the network is producing grayscale images instead of RGB images with three channels.
To resolve this issue, you may need to modify the backbone network architecture to produce RGB images instead of grayscale images. Depending on the exact architecture of your backbone network, this could involve adjusting the number of input channels, changing the activation function, or modifying the weights of the convolutional filters.
I hope this helps, and please let me know if you have any further questions or concerns.
I am not sure how I can change it, or what I can do to modify the backbone. Can you please let me know, or give me a reference too?
@RANA-ATI,
I understand that modifying the backbone network can be challenging, especially if you're not familiar with its underlying architecture. Without further details about the network, it's difficult for me to provide specific advice on how to modify the backbone network to produce RGB images.
However, some possible approaches could be adjusting the number of input channels, changing the activation function or modifying the weights of the convolutional filters. It's also possible that using a different backbone network architecture could solve the issue entirely.
I would suggest checking the documentation and research resources available for your specific backbone network and exploring existing code repositories or forums for related issues. This might give you helpful guidance on how to modify the architecture for your input.
I hope this helps, and please let me know if you have any further questions or concerns.
Well, this seems like a time-consuming thing. Isn't there anything else, like another layer we could fuse instead of this one that won't give us the issue? Please, if you can, let me know; I have no problem changing the layer either, as long as it's not the input layer. I just want to complete this task. I know I have kept commenting, but I need this issue resolved asap.
@RANA-ATI,
I understand that modifying the backbone network can be a time-consuming process, and it's important to find a solution to your issue as soon as possible.
One possible approach to solve the issue with the learnable fused convolutions in the YOLOv5 model could be to try fusing a different layer. However, please note that changing the layer of the model may require additional modifications to the network architecture and other parts of the code.
You could explore other model architectures as well. The PyTorch Hub offers several object detection models that might suit your requirements.
If you have additional questions or concerns, please feel free to ask.
Well, I am not sure where to start, and I also want to use YOLOv5. So how can I change the structure? Is there any specific file? Also, if you want to know more about the problem, I can explain that as well.
Hi @RANA-ATI,
I understand that modifying the backbone network structure to resolve the issue with the learnable fused convolutions in YOLOv5 might seem like a daunting task.
To modify the backbone network architecture, you would need to access the source code of the model architecture and make changes to the architecture. However, making such changes requires a good understanding of the model's architecture and PyTorch framework.
If you would like to learn more about the necessary modifications or troubleshooting this issue, I suggest exploring research resources or looking for similar issues on related forums or code repositories. You could also try fusing a different layer in the YOLOv5 model to see if it resolves your issue.
In case you want to explore other model architectures, the PyTorch Hub offers alternatives to YOLOv5 that might suit your requirements. Feel free to let me know if you have additional questions or concerns.
Thanks.
How do I add an SE module in YOLOv5? I've been searching for a way to do this for 2 days. TTTT
Hi @choksiri,
You can add a Squeeze-and-Excitation (SE) module in YOLOv5 using the yolov5-p2.yaml file. Here's a reference to the YOLOv5 model configuration file for the P2 model that includes the SE module: yolov5-p2.yaml.
I hope this helps! Let me know if you have any further questions.
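For anyone looking for a concrete starting point, a generic Squeeze-and-Excitation block in PyTorch looks roughly like the sketch below. This is not an official YOLOv5 module: the class name and reduction ratio are illustrative, and registering it in models/common.py and referencing it from a model yaml are left to the reader.

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    # Generic Squeeze-and-Excitation block (illustrative, not part of the YOLOv5 codebase)
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global average over the spatial dims
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # excitation: per-channel weights in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # rescale each channel of the input

# quick shape check
se = SEBlock(128)
y = se(torch.randn(2, 128, 40, 40))
print(y.shape)  # torch.Size([2, 128, 40, 40])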
Search before asking
Question
I would like to increase the number of detection layers from 3 to 4 in order to enhance the detection of smaller objects.
[[23, 26, 29, 32], 1, Detect, [nc, anchors]], # Detect(P2, P3, P4, P5)
This is the last layer of my design. 👆
Do I need to change the setting of the loss function self.balance = {3: [4.0, 1.0, 0.4]}? Maybe {4: [1.0, 4.0, 1.0, 0.4]}?
And if I do not change it, what will happen?
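For what it's worth, a minimal sketch of the kind of change being asked about, with purely illustrative values (not taken from the repository); the point is only that the balance list needs one objectness weight per output layer:

# Illustrative only: a per-layer objectness balance for a 4-output head (P2, P3, P4, P5)
nl = 4  # number of detection layers in this hypothetical model
balance = {3: [4.0, 1.0, 0.4], 4: [4.0, 1.0, 0.4, 0.1]}[nl]
assert len(balance) == nl  # the loss expects one weight per detection output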
Additional
No response