ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

weight channels error while training #7436

Closed LeoNull101 closed 2 years ago

LeoNull101 commented 2 years ago

Search before asking

Question

Bug

Traceback (most recent call last):
  File "train.py", line 667, in <module>
    main(opt)
  File "train.py", line 562, in main
    train(opt.hyp, opt, device, callbacks)
  File "train.py", line 128, in train
    model = Model(cfg, ch=3, nc=nc, anchors=hyp.get('anchors')).to(device)  # create
  File "/content/yolov5/models/yolo.py", line 121, in __init__
    m.stride = torch.tensor([s / x.shape[-2] for x in self.forward(torch.zeros(1, ch, s, s))])  # forward
  File "/content/yolov5/models/yolo.py", line 135, in forward
    return self._forward_once(x, profile, visualize)  # single-scale inference, train
  File "/content/yolov5/models/yolo.py", line 158, in _forward_once
    x = m(x)  # run
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/yolov5/models/common.py", line 1005, in forward
    softmax_att = self.attention(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/yolov5/models/common.py", line 982, in forward
    att = self.net(att)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 446, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 443, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [4, 128, 1, 1], expected input[1, 16, 1, 1] to have 128 channels, but got 16 channels instead
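For reference, the mismatch at the bottom of the traceback can be reproduced in isolation with plain PyTorch. This is only a minimal sketch of the same failure mode (not code from the repo): a 1x1 convolution whose weight was created for 128 input channels receives a pooled feature map that only has 16 channels.

import torch
import torch.nn as nn

att = nn.Conv2d(128, 4, kernel_size=1)  # weight shape [4, 128, 1, 1], as in the error message
x = torch.zeros(1, 16, 1, 1)            # globally pooled feature with only 16 channels
try:
    att(x)
except RuntimeError as e:
    print(e)  # ... expected input[1, 16, 1, 1] to have 128 channels, but got 16 channels instead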

Minimal Reproducible Example

Clone YOLOv5 and install dependencies:

!git clone https://github.com/ultralytics/yolov5  # clone repo
%cd yolov5
%pip install -qr requirements.txt  # install dependencies
%pip install -q roboflow

import torch
import os
from IPython.display import Image, clear_output  # to display images

print(f"Setup complete. Using torch {torch.version} ({torch.cuda.get_device_properties(0).name if torch.cuda.is_available() else 'CPU'})")

!pip install roboflow

from roboflow import Roboflow
rf = Roboflow(api_key="sr9Ld9Zy7jkFgHotOzwM")
project = rf.workspace("nulllu").project("ship-phwt6")
dataset = project.version(5).download("yolov5")  # augmented dataset

# set up environment
os.environ["DATASET_DIRECTORY"] = "/content/datasets"


!python train.py --img 416 --batch 16 --epochs 60 --data {dataset.location}/data.yaml --weights '' --cfg /content/yolov5/models/yolov5s.yaml --cache  # new yolov5s.yaml

Additional: I made a new yaml for training, shown below. My dataset is about ship detection and has only 1 class.

# YOLOv5 🚀 by Ultralytics, GPL-3.0 license

# Parameters
nc: 80  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.25  # layer channel multiple
anchors:

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2]],  # 0-P1/2
   [-1, 1, CondConv, [128, 3, 2, 1]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, CondConv, [256, 3, 2, 1]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, CondConv, [512, 3, 2, 1]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, CondConv, [1024, 3, 2, 1]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 2, SPPF, [1024, 5]],  # 9
  ]

# YOLOv5 v6.0 head
head:
  [[-1, 1, CondConv, [512, 1, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, CondConv, [256, 1, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, CondConv, [256, 3, 2, 1]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, CondConv, [512, 3, 2, 1]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]

Also the modified common.py:

# YOLOv5 🚀 by Ultralytics, GPL-3.0 license
"""
Common modules
"""

import torch.nn.functional as F

import json
import math
import platform
import warnings
from collections import OrderedDict, namedtuple
from copy import copy
from pathlib import Path

from detectron2 import model_zoo

import cv2
import numpy as np
import pandas as pd
import requests
import torch
import torch.nn as nn
import yaml
from PIL import Image
from torch.cuda import amp
import torch.utils.model_zoo as model_zoo

from utils.datasets import exif_transpose, letterbox
from utils.general import (LOGGER, check_requirements, check_suffix, check_version, colorstr, increment_path,
                           make_divisible, non_max_suppression, scale_coords, xywh2xyxy, xyxy2xywh)
from utils.plots import Annotator, colors, save_one_box
from utils.torch_utils import copy_attr, time_sync

model_urls = {
    'scnet50': 'https://backseason.oss-cn-beijing.aliyuncs.com/scnet/scnet50-dc6a7e87.pth',
    'scnet50_v1d': 'https://backseason.oss-cn-beijing.aliyuncs.com/scnet/scnet50_v1d-4109d1e1.pth',
    'scnet101': 'https://backseason.oss-cn-beijing.aliyuncs.com/scnet/scnet101-44c5b751.pth',
    # 'scnet101_v1d': coming soon...
}

def autopad(k, p=None): # kernel, padding

Pad to 'same'

if p is None:
    p = k // 2 if isinstance(k, int) else (x // 2 for x in k)  # auto-pad
return p

class Conv(nn.Module):

Standard convolution

def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
    super().__init__()
    self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
    self.bn = nn.BatchNorm2d(c2)
    self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

def forward(self, x):
    return self.act(self.bn(self.conv(x)))

def forward_fuse(self, x):
    return self.act(self.conv(x))

class DWConv(Conv):

Depth-wise convolution class

def __init__(self, c1, c2, k=1, s=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
    super().__init__(c1, c2, k, s, g=math.gcd(c1, c2), act=act)

class TransformerLayer(nn.Module):

Transformer layer https://arxiv.org/abs/2010.11929 (LayerNorm layers removed for better performance)

def __init__(self, c, num_heads):
    super().__init__()
    self.q = nn.Linear(c, c, bias=False)
    self.k = nn.Linear(c, c, bias=False)
    self.v = nn.Linear(c, c, bias=False)
    self.ma = nn.MultiheadAttention(embed_dim=c, num_heads=num_heads)
    self.fc1 = nn.Linear(c, c, bias=False)
    self.fc2 = nn.Linear(c, c, bias=False)

def forward(self, x):
    x = self.ma(self.q(x), self.k(x), self.v(x))[0] + x
    x = self.fc2(self.fc1(x)) + x
    return x

class TransformerBlock(nn.Module):

Vision Transformer https://arxiv.org/abs/2010.11929

def __init__(self, c1, c2, num_heads, num_layers):
    super().__init__()
    self.conv = None
    if c1 != c2:
        self.conv = Conv(c1, c2)
    self.linear = nn.Linear(c2, c2)  # learnable position embedding
    self.tr = nn.Sequential(*(TransformerLayer(c2, num_heads) for _ in range(num_layers)))
    self.c2 = c2

def forward(self, x):
    if self.conv is not None:
        x = self.conv(x)
    b, _, w, h = x.shape
    p = x.flatten(2).permute(2, 0, 1)
    return self.tr(p + self.linear(p)).permute(1, 2, 0).reshape(b, self.c2, w, h)

class Bottleneck(nn.Module):

Standard bottleneck

def __init__(self, c1, c2, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, shortcut, groups, expansion
    super().__init__()
    c_ = int(c2 * e)  # hidden channels
    self.cv1 = CondConv(c1, c_, 1, 1, 1)
    self.cv2 = CondConv(c_, c2, 3, 1, 1, groups=g)
    self.add = shortcut and c1 == c2

def forward(self, x):
    return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))

class SCNet(nn.Module):
"""SCNet Variants Definitions
Parameters

block : Block
    Class for the residual block.
layers : list of int
    Numbers of layers in each block.
classes : int, default 1000
    Number of classification classes.
dilated : bool, default False
    Applying dilation strategy to pretrained SCNet yielding a stride-8 model.
deep_stem : bool, default False
    Replace 7x7 conv in input stem with 3 3x3 conv.
avg_down : bool, default False
    Use AvgPool instead of stride conv when
    downsampling in the bottleneck.
norm_layer : object
    Normalization layer used (default: :class:`torch.nn.BatchNorm2d`).
Reference:
    - He, Kaiming, et al. "Deep residual learning for image recognition."
    Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
    - Yu, Fisher, and Vladlen Koltun. "Multi-scale context aggregation by dilated convolutions."
"""
def __init__(self, block, layers, groups=1, bottleneck_width=32,
             num_classes=1000, dilated=False, dilation=1,
             deep_stem=False, stem_width=64, avg_down=False,
             avd=False, norm_layer=nn.BatchNorm2d):
    self.cardinality = groups
    self.bottleneck_width = bottleneck_width
    # ResNet-D params
    self.inplanes = stem_width*2 if deep_stem else 64
    self.avg_down = avg_down
    self.avd = avd

    super(SCNet, self).__init__()
    conv_layer = nn.Conv2d
    if deep_stem:
        self.conv1 = nn.Sequential(
            conv_layer(3, stem_width, kernel_size=3, stride=2, padding=1, bias=False),
            norm_layer(stem_width),
            nn.ReLU(inplace=True),
            conv_layer(stem_width, stem_width, kernel_size=3, stride=1, padding=1, bias=False),
            norm_layer(stem_width),
            nn.ReLU(inplace=True),
            conv_layer(stem_width, stem_width*2, kernel_size=3, stride=1, padding=1, bias=False),
        )
    else:
        self.conv1 = conv_layer(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
    self.bn1 = norm_layer(self.inplanes)
    self.relu = nn.ReLU(inplace=True)
    self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
    self.layer1 = self._make_layer(block, 64, layers[0], norm_layer=norm_layer, is_first=False)
    self.layer2 = self._make_layer(block, 128, layers[1], stride=2, norm_layer=norm_layer)
    if dilated or dilation == 4:
        self.layer3 = self._make_layer(block, 256, layers[2], stride=1,
                                       dilation=2, norm_layer=norm_layer)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=1,
                                       dilation=4, norm_layer=norm_layer)
    elif dilation==2:
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2,
                                       dilation=1, norm_layer=norm_layer)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=1,
                                       dilation=2, norm_layer=norm_layer)
    else:
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2,
                                       norm_layer=norm_layer)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2,
                                       norm_layer=norm_layer)
    self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
    self.fc = nn.Linear(512 * block.expansion, num_classes)

    for m in self.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        elif isinstance(m, norm_layer):
            nn.init.constant_(m.weight, 1)
            nn.init.constant_(m.bias, 0)

def _make_layer(self, block, planes, blocks, stride=1, dilation=1, norm_layer=None,
                is_first=True):
    downsample = None
    if stride != 1 or self.inplanes != planes * block.expansion:
        down_layers = []
        if self.avg_down:
            if dilation == 1:
                down_layers.append(nn.AvgPool2d(kernel_size=stride, stride=stride,
                                                ceil_mode=True, count_include_pad=False))
            else:
                down_layers.append(nn.AvgPool2d(kernel_size=1, stride=1,
                                                ceil_mode=True, count_include_pad=False))
            down_layers.append(nn.Conv2d(self.inplanes, planes * block.expansion,
                                         kernel_size=1, stride=1, bias=False))
        else:
            down_layers.append(nn.Conv2d(self.inplanes, planes * block.expansion,
                                         kernel_size=1, stride=stride, bias=False))
        down_layers.append(norm_layer(planes * block.expansion))
        downsample = nn.Sequential(*down_layers)

    layers = []
    if dilation == 1 or dilation == 2:
        layers.append(block(self.inplanes, planes, stride, downsample=downsample,
                            cardinality=self.cardinality,
                            bottleneck_width=self.bottleneck_width,
                            avd=self.avd, dilation=1, is_first=is_first, 
                            norm_layer=norm_layer))
    elif dilation == 4:
        layers.append(block(self.inplanes, planes, stride, downsample=downsample,
                            cardinality=self.cardinality,
                            bottleneck_width=self.bottleneck_width,
                            avd=self.avd, dilation=2, is_first=is_first, 
                            norm_layer=norm_layer))
    else:
        raise RuntimeError("=> unknown dilation size: {}".format(dilation))

    self.inplanes = planes * block.expansion
    for i in range(1, blocks):
        layers.append(block(self.inplanes, planes,
                            cardinality=self.cardinality,
                            bottleneck_width=self.bottleneck_width,
                            avd=self.avd, dilation=dilation, 
                            norm_layer=norm_layer))

    return nn.Sequential(*layers)

def forward(self, x):
    x = self.conv1(x)
    x = self.bn1(x)
    x = self.relu(x)
    x = self.maxpool(x)

    x = self.layer1(x)
    x = self.layer2(x)
    x = self.layer3(x)
    x = self.layer4(x)

    x = self.avgpool(x)
    x = x.view(x.size(0), -1)
    x = self.fc(x)

    return x

def scnet50(pretrained=False, **kwargs):
    """Constructs a SCNet-50 model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = SCNet(SCBottleneck, [3, 4, 6, 3], deep_stem=False, stem_width=32, avg_down=False, avd=False, **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['scnet50']))
    return model
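For context, a hypothetical smoke test of the scnet50() constructor above might look like the following (input size and pretrained=False are just assumptions for illustration):

model = scnet50(pretrained=False)
out = model(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 1000])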

class BottleneckCSP(nn.Module):

CSP Bottleneck https://github.com/WongKinYiu/CrossStagePartialNetworks

def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
    super().__init__()
    c_ = int(c2 * e)  # hidden channels
    self.cv1 = Conv(c1, c_, 1, 1)
    self.cv2 = nn.Conv2d(c1, c_, 1, 1, bias=False)
    self.cv3 = nn.Conv2d(c_, c_, 1, 1, bias=False)
    self.cv4 = Conv(2 * c_, c2, 1, 1)
    self.bn = nn.BatchNorm2d(2 * c_)  # applied to cat(cv2, cv3)
    self.act = nn.SiLU()
    self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)))

def forward(self, x):
    y1 = self.cv3(self.m(self.cv1(x)))
    y2 = self.cv2(x)
    return self.cv4(self.act(self.bn(torch.cat((y1, y2), 1))))

class SCBottleneck(nn.Module):

"""SCNet SCBottleneck"""

expansion = 4
pooling_r = 4  # down-sampling rate of the avg pooling layer in the K3 path of SC-Conv.

def __init__(self, inplanes, planes, stride=1, downsample=None,
             cardinality=1, bottleneck_width=32,
             avd=False, dilation=1, is_first=False,
             norm_layer=None):
    super(SCBottleneck, self).__init__()
    group_width = int(planes * (bottleneck_width / 64.)) * cardinality
    self.conv1_a = nn.Conv2d(inplanes, group_width, kernel_size=1, bias=False)
    self.bn1_a = norm_layer(group_width)
    self.conv1_b = nn.Conv2d(inplanes, group_width, kernel_size=1, bias=False)
    self.bn1_b = norm_layer(group_width)
    self.avd = avd and (stride > 1 or is_first)

    if self.avd:
        self.avd_layer = nn.AvgPool2d(3, stride, padding=1)
        stride = 1

    self.k1 = nn.Sequential(
                nn.Conv2d(
                    group_width, group_width, kernel_size=3, stride=stride,
                    padding=dilation, dilation=dilation,
                    groups=cardinality, bias=False),
                norm_layer(group_width),
                )

    self.scconv = SCConv(
        group_width, group_width, stride=stride,
        padding=dilation, dilation=dilation,
        groups=cardinality, pooling_r=self.pooling_r, norm_layer=norm_layer)

    self.conv3 = nn.Conv2d(
        group_width * 2, planes * 4, kernel_size=1, bias=False)
    self.bn3 = norm_layer(planes*4)

    self.relu = nn.ReLU(inplace=True)
    self.downsample = downsample
    self.dilation = dilation
    self.stride = stride

def forward(self, x):
    residual = x

    out_a= self.conv1_a(x)
    out_a = self.bn1_a(out_a)
    out_b = self.conv1_b(x)
    out_b = self.bn1_b(out_b)
    out_a = self.relu(out_a)
    out_b = self.relu(out_b)

    out_a = self.k1(out_a)
    out_b = self.scconv(out_b)
    out_a = self.relu(out_a)
    out_b = self.relu(out_b)

    if self.avd:
        out_a = self.avd_layer(out_a)
        out_b = self.avd_layer(out_b)

    out = self.conv3(torch.cat([out_a, out_b], dim=1))
    out = self.bn3(out)

    if self.downsample is not None:
        residual = self.downsample(x)

    out += residual
    out = self.relu(out)

    return out

class C3(nn.Module):

CSP Bottleneck with 3 convolutions

def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
    super().__init__()
    c_ = int(c2 * e)  # hidden channels
    self.cv1 = CondConv(c1, c_, 1, 1, 1)
    self.cv2 = CondConv(c1, c_, 1, 1, 1)
    self.cv3 = CondConv(2 * c_, c2, 1, 1)  # optional act=FReLU(c2)
    self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)))
    # self.m = nn.Sequential(*(CrossConv(c_, c_, 3, 1, g, 1.0, shortcut) for _ in range(n)))

def forward(self, x):
    return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))

class C3TR(C3):

C3 module with TransformerBlock()

def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):
    super().__init__(c1, c2, n, shortcut, g, e)
    c_ = int(c2 * e)
    self.m = TransformerBlock(c_, c_, 4, n)

class C3SPP(C3):

C3 module with SPP()

def __init__(self, c1, c2, k=(5, 9, 13), n=1, shortcut=True, g=1, e=0.5):
    super().__init__(c1, c2, n, shortcut, g, e)
    c_ = int(c2 * e)
    self.m = SPP(c_, c_, k)

class C3Ghost(C3):

C3 module with GhostBottleneck()

def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):
    super().__init__(c1, c2, n, shortcut, g, e)
    c_ = int(c2 * e)  # hidden channels
    self.m = nn.Sequential(*(GhostBottleneck(c_, c_) for _ in range(n)))

class SPP(nn.Module):

Spatial Pyramid Pooling (SPP) layer https://arxiv.org/abs/1406.4729

def __init__(self, c1, c2, k=(5, 9, 13)):
    super().__init__()
    c_ = c1 // 2  # hidden channels
    self.cv1 = Conv(c1, c_, 1, 1)
    self.cv2 = Conv(c_ * (len(k) + 1), c2, 1, 1)
    self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])

def forward(self, x):
    x = self.cv1(x)
    with warnings.catch_warnings():
        warnings.simplefilter('ignore')  # suppress torch 1.9.0 max_pool2d() warning
        return self.cv2(torch.cat([x] + [m(x) for m in self.m], 1))

class SPPF(nn.Module):

Spatial Pyramid Pooling - Fast (SPPF) layer for YOLOv5 by Glenn Jocher

def __init__(self, c1, c2, k=5):  # equivalent to SPP(k=(5, 9, 13))
    super().__init__()
    c_ = c1 // 2  # hidden channels
    self.cv1 = Conv(c1, c_, 1, 1)
    self.cv2 = Conv(c_ * 4, c2, 1, 1)
    self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

def forward(self, x):
    x = self.cv1(x)
    with warnings.catch_warnings():
        warnings.simplefilter('ignore')  # suppress torch 1.9.0 max_pool2d() warning
        y1 = self.m(x)
        y2 = self.m(y1)
        return self.cv2(torch.cat((x, y1, y2, self.m(y2)), 1))

class Focus(nn.Module):

Focus wh information into c-space

def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
    super().__init__()
    self.conv = Conv(c1 * 4, c2, k, s, p, g, act)
    # self.contract = Contract(gain=2)

def forward(self, x):  # x(b,c,w,h) -> y(b,4c,w/2,h/2)
    return self.conv(torch.cat((x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]), 1))
    # return self.conv(self.contract(x))

class GhostConv(nn.Module):

Ghost Convolution https://github.com/huawei-noah/ghostnet

def __init__(self, c1, c2, k=1, s=1, g=1, act=True):  # ch_in, ch_out, kernel, stride, groups
    super().__init__()
    c_ = c2 // 2  # hidden channels
    self.cv1 = Conv(c1, c_, k, s, None, g, act)
    self.cv2 = Conv(c_, c_, 5, 1, None, c_, act)

def forward(self, x):
    y = self.cv1(x)
    return torch.cat((y, self.cv2(y)), 1)

class GhostBottleneck(nn.Module):

Ghost Bottleneck https://github.com/huawei-noah/ghostnet

def __init__(self, c1, c2, k=3, s=1):  # ch_in, ch_out, kernel, stride
    super().__init__()
    c_ = c2 // 2
    self.conv = nn.Sequential(
        GhostConv(c1, c_, 1, 1),  # pw
        DWConv(c_, c_, k, s, act=False) if s == 2 else nn.Identity(),  # dw
        GhostConv(c_, c2, 1, 1, act=False))  # pw-linear
    self.shortcut = nn.Sequential(DWConv(c1, c1, k, s, act=False), Conv(c1, c2, 1, 1,
                                                                        act=False)) if s == 2 else nn.Identity()

def forward(self, x):
    return self.conv(x) + self.shortcut(x)

class Contract(nn.Module):

Contract width-height into channels, i.e. x(1,64,80,80) to x(1,256,40,40)

def __init__(self, gain=2):
    super().__init__()
    self.gain = gain

def forward(self, x):
    b, c, h, w = x.size()  # assert (h / s == 0) and (W / s == 0), 'Indivisible gain'
    s = self.gain
    x = x.view(b, c, h // s, s, w // s, s)  # x(1,64,40,2,40,2)
    x = x.permute(0, 3, 5, 1, 2, 4).contiguous()  # x(1,2,2,64,40,40)
    return x.view(b, c * s * s, h // s, w // s)  # x(1,256,40,40)

class Expand(nn.Module):

Expand channels into width-height, i.e. x(1,64,80,80) to x(1,16,160,160)

def __init__(self, gain=2):
    super().__init__()
    self.gain = gain

def forward(self, x):
    b, c, h, w = x.size()  # assert C / s ** 2 == 0, 'Indivisible gain'
    s = self.gain
    x = x.view(b, s, s, c // s ** 2, h, w)  # x(1,2,2,16,80,80)
    x = x.permute(0, 3, 4, 1, 5, 2).contiguous()  # x(1,16,80,2,80,2)
    return x.view(b, c // s ** 2, h * s, w * s)  # x(1,16,160,160)

class Concat(nn.Module):

Concatenate a list of tensors along dimension

def __init__(self, dimension=1):
    super().__init__()
    self.d = dimension

def forward(self, x):
    return torch.cat(x, self.d)

class DetectMultiBackend(nn.Module):

YOLOv5 MultiBackend class for python inference on various backends

def __init__(self, weights='yolov5s.pt', device=torch.device('cpu'), dnn=False, data=None, fp16=False):
    # Usage:
    #   PyTorch:              weights = *.pt
    #   TorchScript:                    *.torchscript
    #   ONNX Runtime:                   *.onnx
    #   ONNX OpenCV DNN:                *.onnx with --dnn
    #   OpenVINO:                       *.xml
    #   CoreML:                         *.mlmodel
    #   TensorRT:                       *.engine
    #   TensorFlow SavedModel:          *_saved_model
    #   TensorFlow GraphDef:            *.pb
    #   TensorFlow Lite:                *.tflite
    #   TensorFlow Edge TPU:            *_edgetpu.tflite
    from models.experimental import attempt_download, attempt_load  # scoped to avoid circular import

    super().__init__()
    w = str(weights[0] if isinstance(weights, list) else weights)
    pt, jit, onnx, xml, engine, coreml, saved_model, pb, tflite, edgetpu, tfjs = self.model_type(w)  # get backend
    stride, names = 32, [f'class{i}' for i in range(1000)]  # assign defaults
    w = attempt_download(w)  # download if not local
    fp16 &= (pt or jit or onnx or engine) and device.type != 'cpu'  # FP16
    if data:  # data.yaml path (optional)
        with open(data, errors='ignore') as f:
            names = yaml.safe_load(f)['names']  # class names

    if pt:  # PyTorch
        model = attempt_load(weights if isinstance(weights, list) else w, map_location=device)
        stride = max(int(model.stride.max()), 32)  # model stride
        names = model.module.names if hasattr(model, 'module') else model.names  # get class names
        model.half() if fp16 else model.float()
        self.model = model  # explicitly assign for to(), cpu(), cuda(), half()
    elif jit:  # TorchScript
        LOGGER.info(f'Loading {w} for TorchScript inference...')
        extra_files = {'config.txt': ''}  # model metadata
        model = torch.jit.load(w, _extra_files=extra_files)
        model.half() if fp16 else model.float()
        if extra_files['config.txt']:
            d = json.loads(extra_files['config.txt'])  # extra_files dict
            stride, names = int(d['stride']), d['names']
    elif dnn:  # ONNX OpenCV DNN
        LOGGER.info(f'Loading {w} for ONNX OpenCV DNN inference...')
        check_requirements(('opencv-python>=4.5.4',))
        net = cv2.dnn.readNetFromONNX(w)
    elif onnx:  # ONNX Runtime
        LOGGER.info(f'Loading {w} for ONNX Runtime inference...')
        cuda = torch.cuda.is_available()
        check_requirements(('onnx', 'onnxruntime-gpu' if cuda else 'onnxruntime'))
        import onnxruntime
        providers = ['CUDAExecutionProvider', 'CPUExecutionProvider'] if cuda else ['CPUExecutionProvider']
        session = onnxruntime.InferenceSession(w, providers=providers)
        meta = session.get_modelmeta().custom_metadata_map  # metadata
        if 'stride' in meta:
            stride, names = int(meta['stride']), eval(meta['names'])
    elif xml:  # OpenVINO
        LOGGER.info(f'Loading {w} for OpenVINO inference...')
        check_requirements(('openvino-dev',))  # requires openvino-dev: https://pypi.org/project/openvino-dev/
        import openvino.inference_engine as ie
        core = ie.IECore()
        if not Path(w).is_file():  # if not *.xml
            w = next(Path(w).glob('*.xml'))  # get *.xml file from *_openvino_model dir
        network = core.read_network(model=w, weights=Path(w).with_suffix('.bin'))  # *.xml, *.bin paths
        executable_network = core.load_network(network, device_name='CPU', num_requests=1)
    elif engine:  # TensorRT
        LOGGER.info(f'Loading {w} for TensorRT inference...')
        import tensorrt as trt  # https://developer.nvidia.com/nvidia-tensorrt-download
        check_version(trt.__version__, '7.0.0', hard=True)  # require tensorrt>=7.0.0
        Binding = namedtuple('Binding', ('name', 'dtype', 'shape', 'data', 'ptr'))
        logger = trt.Logger(trt.Logger.INFO)
        with open(w, 'rb') as f, trt.Runtime(logger) as runtime:
            model = runtime.deserialize_cuda_engine(f.read())
        bindings = OrderedDict()
        fp16 = False  # default updated below
        for index in range(model.num_bindings):
            name = model.get_binding_name(index)
            dtype = trt.nptype(model.get_binding_dtype(index))
            shape = tuple(model.get_binding_shape(index))
            data = torch.from_numpy(np.empty(shape, dtype=np.dtype(dtype))).to(device)
            bindings[name] = Binding(name, dtype, shape, data, int(data.data_ptr()))
            if model.binding_is_input(index) and dtype == np.float16:
                fp16 = True
        binding_addrs = OrderedDict((n, d.ptr) for n, d in bindings.items())
        context = model.create_execution_context()
        batch_size = bindings['images'].shape[0]
    elif coreml:  # CoreML
        LOGGER.info(f'Loading {w} for CoreML inference...')
        import coremltools as ct
        model = ct.models.MLModel(w)
    else:  # TensorFlow (SavedModel, GraphDef, Lite, Edge TPU)
        if saved_model:  # SavedModel
            LOGGER.info(f'Loading {w} for TensorFlow SavedModel inference...')
            import tensorflow as tf
            keras = False  # assume TF1 saved_model
            model = tf.keras.models.load_model(w) if keras else tf.saved_model.load(w)
        elif pb:  # GraphDef https://www.tensorflow.org/guide/migrate#a_graphpb_or_graphpbtxt
            LOGGER.info(f'Loading {w} for TensorFlow GraphDef inference...')
            import tensorflow as tf

            def wrap_frozen_graph(gd, inputs, outputs):
                x = tf.compat.v1.wrap_function(lambda: tf.compat.v1.import_graph_def(gd, name=""), [])  # wrapped
                ge = x.graph.as_graph_element
                return x.prune(tf.nest.map_structure(ge, inputs), tf.nest.map_structure(ge, outputs))

            gd = tf.Graph().as_graph_def()  # graph_def
            with open(w, 'rb') as f:
                gd.ParseFromString(f.read())
            frozen_func = wrap_frozen_graph(gd, inputs="x:0", outputs="Identity:0")
        elif tflite or edgetpu:  # https://www.tensorflow.org/lite/guide/python#install_tensorflow_lite_for_python
            try:  # https://coral.ai/docs/edgetpu/tflite-python/#update-existing-tf-lite-code-for-the-edge-tpu
                from tflite_runtime.interpreter import Interpreter, load_delegate
            except ImportError:
                import tensorflow as tf
                Interpreter, load_delegate = tf.lite.Interpreter, tf.lite.experimental.load_delegate,
            if edgetpu:  # Edge TPU https://coral.ai/software/#edgetpu-runtime
                LOGGER.info(f'Loading {w} for TensorFlow Lite Edge TPU inference...')
                delegate = {
                    'Linux': 'libedgetpu.so.1',
                    'Darwin': 'libedgetpu.1.dylib',
                    'Windows': 'edgetpu.dll'}[platform.system()]
                interpreter = Interpreter(model_path=w, experimental_delegates=[load_delegate(delegate)])
            else:  # Lite
                LOGGER.info(f'Loading {w} for TensorFlow Lite inference...')
                interpreter = Interpreter(model_path=w)  # load TFLite model
            interpreter.allocate_tensors()  # allocate
            input_details = interpreter.get_input_details()  # inputs
            output_details = interpreter.get_output_details()  # outputs
        elif tfjs:
            raise Exception('ERROR: YOLOv5 TF.js inference is not supported')
    self.__dict__.update(locals())  # assign all variables to self

def forward(self, im, augment=False, visualize=False, val=False):
    # YOLOv5 MultiBackend inference
    b, ch, h, w = im.shape  # batch, channel, height, width
    if self.pt:  # PyTorch
        y = self.model(im, augment=augment, visualize=visualize)[0]
    elif self.jit:  # TorchScript
        y = self.model(im)[0]
    elif self.dnn:  # ONNX OpenCV DNN
        im = im.cpu().numpy()  # torch to numpy
        self.net.setInput(im)
        y = self.net.forward()
    elif self.onnx:  # ONNX Runtime
        im = im.cpu().numpy()  # torch to numpy
        y = self.session.run([self.session.get_outputs()[0].name], {self.session.get_inputs()[0].name: im})[0]
    elif self.xml:  # OpenVINO
        im = im.cpu().numpy()  # FP32
        desc = self.ie.TensorDesc(precision='FP32', dims=im.shape, layout='NCHW')  # Tensor Description
        request = self.executable_network.requests[0]  # inference request
        request.set_blob(blob_name='images', blob=self.ie.Blob(desc, im))  # name=next(iter(request.input_blobs))
        request.infer()
        y = request.output_blobs['output'].buffer  # name=next(iter(request.output_blobs))
    elif self.engine:  # TensorRT
        assert im.shape == self.bindings['images'].shape, (im.shape, self.bindings['images'].shape)
        self.binding_addrs['images'] = int(im.data_ptr())
        self.context.execute_v2(list(self.binding_addrs.values()))
        y = self.bindings['output'].data
    elif self.coreml:  # CoreML
        im = im.permute(0, 2, 3, 1).cpu().numpy()  # torch BCHW to numpy BHWC shape(1,320,192,3)
        im = Image.fromarray((im[0] * 255).astype('uint8'))
        # im = im.resize((192, 320), Image.ANTIALIAS)
        y = self.model.predict({'image': im})  # coordinates are xywh normalized
        if 'confidence' in y:
            box = xywh2xyxy(y['coordinates'] * [[w, h, w, h]])  # xyxy pixels
            conf, cls = y['confidence'].max(1), y['confidence'].argmax(1).astype(np.float)
            y = np.concatenate((box, conf.reshape(-1, 1), cls.reshape(-1, 1)), 1)
        else:
            k = 'var_' + str(sorted(int(k.replace('var_', '')) for k in y)[-1])  # output key
            y = y[k]  # output
    else:  # TensorFlow (SavedModel, GraphDef, Lite, Edge TPU)
        im = im.permute(0, 2, 3, 1).cpu().numpy()  # torch BCHW to numpy BHWC shape(1,320,192,3)
        if self.saved_model:  # SavedModel
            y = (self.model(im, training=False) if self.keras else self.model(im)).numpy()
        elif self.pb:  # GraphDef
            y = self.frozen_func(x=self.tf.constant(im)).numpy()
        else:  # Lite or Edge TPU
            input, output = self.input_details[0], self.output_details[0]
            int8 = input['dtype'] == np.uint8  # is TFLite quantized uint8 model
            if int8:
                scale, zero_point = input['quantization']
                im = (im / scale + zero_point).astype(np.uint8)  # de-scale
            self.interpreter.set_tensor(input['index'], im)
            self.interpreter.invoke()
            y = self.interpreter.get_tensor(output['index'])
            if int8:
                scale, zero_point = output['quantization']
                y = (y.astype(np.float32) - zero_point) * scale  # re-scale
        y[..., :4] *= [w, h, w, h]  # xywh normalized to pixels

    if isinstance(y, np.ndarray):
        y = torch.tensor(y, device=self.device)
    return (y, []) if val else y

def warmup(self, imgsz=(1, 3, 640, 640)):
    # Warmup model by running inference once
    if any((self.pt, self.jit, self.onnx, self.engine, self.saved_model, self.pb)):  # warmup types
        if self.device.type != 'cpu':  # only warmup GPU models
            im = torch.zeros(*imgsz, dtype=torch.half if self.fp16 else torch.float, device=self.device)  # input
            for _ in range(2 if self.jit else 1):  #
                self.forward(im)  # warmup

@staticmethod
def model_type(p='path/to/model.pt'):
    # Return model type from model path, i.e. path='path/to/model.onnx' -> type=onnx
    from export import export_formats
    suffixes = list(export_formats().Suffix) + ['.xml']  # export suffixes
    check_suffix(p, suffixes)  # checks
    p = Path(p).name  # eliminate trailing separators
    pt, jit, onnx, xml, engine, coreml, saved_model, pb, tflite, edgetpu, tfjs, xml2 = (s in p for s in suffixes)
    xml |= xml2  # *_openvino_model or *.xml
    tflite &= not edgetpu  # *.tflite
    return pt, jit, onnx, xml, engine, coreml, saved_model, pb, tflite, edgetpu, tfjs

class AutoShape(nn.Module):

YOLOv5 input-robust model wrapper for passing cv2/np/PIL/torch inputs. Includes preprocessing, inference and NMS

conf = 0.25  # NMS confidence threshold
iou = 0.45  # NMS IoU threshold
agnostic = False  # NMS class-agnostic
multi_label = False  # NMS multiple labels per box
classes = None  # (optional list) filter by class, i.e. = [0, 15, 16] for COCO persons, cats and dogs
max_det = 1000  # maximum number of detections per image
amp = False  # Automatic Mixed Precision (AMP) inference

def __init__(self, model):
    super().__init__()
    LOGGER.info('Adding AutoShape... ')
    copy_attr(self, model, include=('yaml', 'nc', 'hyp', 'names', 'stride', 'abc'), exclude=())  # copy attributes
    self.dmb = isinstance(model, DetectMultiBackend)  # DetectMultiBackend() instance
    self.pt = not self.dmb or model.pt  # PyTorch model
    self.model = model.eval()

def _apply(self, fn):
    # Apply to(), cpu(), cuda(), half() to model tensors that are not parameters or registered buffers
    self = super()._apply(fn)
    if self.pt:
        m = self.model.model.model[-1] if self.dmb else self.model.model[-1]  # Detect()
        m.stride = fn(m.stride)
        m.grid = list(map(fn, m.grid))
        if isinstance(m.anchor_grid, list):
            m.anchor_grid = list(map(fn, m.anchor_grid))
    return self

@torch.no_grad()
def forward(self, imgs, size=640, augment=False, profile=False):
    # Inference from various sources. For height=640, width=1280, RGB images example inputs are:
    #   file:       imgs = 'data/images/zidane.jpg'  # str or PosixPath
    #   URI:             = 'https://ultralytics.com/images/zidane.jpg'
    #   OpenCV:          = cv2.imread('image.jpg')[:,:,::-1]  # HWC BGR to RGB x(640,1280,3)
    #   PIL:             = Image.open('image.jpg') or ImageGrab.grab()  # HWC x(640,1280,3)
    #   numpy:           = np.zeros((640,1280,3))  # HWC
    #   torch:           = torch.zeros(16,3,320,640)  # BCHW (scaled to size=640, 0-1 values)
    #   multiple:        = [Image.open('image1.jpg'), Image.open('image2.jpg'), ...]  # list of images

    t = [time_sync()]
    p = next(self.model.parameters()) if self.pt else torch.zeros(1)  # for device and type
    autocast = self.amp and (p.device.type != 'cpu')  # Automatic Mixed Precision (AMP) inference
    if isinstance(imgs, torch.Tensor):  # torch
        with amp.autocast(autocast):
            return self.model(imgs.to(p.device).type_as(p), augment, profile)  # inference

    # Pre-process
    n, imgs = (len(imgs), list(imgs)) if isinstance(imgs, (list, tuple)) else (1, [imgs])  # number, list of images
    shape0, shape1, files = [], [], []  # image and inference shapes, filenames
    for i, im in enumerate(imgs):
        f = f'image{i}'  # filename
        if isinstance(im, (str, Path)):  # filename or uri
            im, f = Image.open(requests.get(im, stream=True).raw if str(im).startswith('http') else im), im
            im = np.asarray(exif_transpose(im))
        elif isinstance(im, Image.Image):  # PIL Image
            im, f = np.asarray(exif_transpose(im)), getattr(im, 'filename', f) or f
        files.append(Path(f).with_suffix('.jpg').name)
        if im.shape[0] < 5:  # image in CHW
            im = im.transpose((1, 2, 0))  # reverse dataloader .transpose(2, 0, 1)
        im = im[..., :3] if im.ndim == 3 else np.tile(im[..., None], 3)  # enforce 3ch input
        s = im.shape[:2]  # HWC
        shape0.append(s)  # image shape
        g = (size / max(s))  # gain
        shape1.append([y * g for y in s])
        imgs[i] = im if im.data.contiguous else np.ascontiguousarray(im)  # update
    shape1 = [make_divisible(x, self.stride) if self.pt else size for x in np.array(shape1).max(0)]  # inf shape
    x = [letterbox(im, shape1, auto=False)[0] for im in imgs]  # pad
    x = np.ascontiguousarray(np.array(x).transpose((0, 3, 1, 2)))  # stack and BHWC to BCHW
    x = torch.from_numpy(x).to(p.device).type_as(p) / 255  # uint8 to fp16/32
    t.append(time_sync())

    with amp.autocast(autocast):
        # Inference
        y = self.model(x, augment, profile)  # forward
        t.append(time_sync())

        # Post-process
        y = non_max_suppression(y if self.dmb else y[0],
                                self.conf,
                                self.iou,
                                self.classes,
                                self.agnostic,
                                self.multi_label,
                                max_det=self.max_det)  # NMS
        for i in range(n):
            scale_coords(shape1, y[i][:, :4], shape0[i])

        t.append(time_sync())
        return Detections(imgs, y, files, t, self.names, x.shape)

class Detections:

YOLOv5 detections class for inference results

def __init__(self, imgs, pred, files, times=(0, 0, 0, 0), names=None, shape=None):
    super().__init__()
    d = pred[0].device  # device
    gn = [torch.tensor([*(im.shape[i] for i in [1, 0, 1, 0]), 1, 1], device=d) for im in imgs]  # normalizations
    self.imgs = imgs  # list of images as numpy arrays
    self.pred = pred  # list of tensors pred[0] = (xyxy, conf, cls)
    self.names = names  # class names
    self.files = files  # image filenames
    self.times = times  # profiling times
    self.xyxy = pred  # xyxy pixels
    self.xywh = [xyxy2xywh(x) for x in pred]  # xywh pixels
    self.xyxyn = [x / g for x, g in zip(self.xyxy, gn)]  # xyxy normalized
    self.xywhn = [x / g for x, g in zip(self.xywh, gn)]  # xywh normalized
    self.n = len(self.pred)  # number of images (batch size)
    self.t = tuple((times[i + 1] - times[i]) * 1000 / self.n for i in range(3))  # timestamps (ms)
    self.s = shape  # inference BCHW shape

def display(self, pprint=False, show=False, save=False, crop=False, render=False, labels=True, save_dir=Path('')):
    crops = []
    for i, (im, pred) in enumerate(zip(self.imgs, self.pred)):
        s = f'image {i + 1}/{len(self.pred)}: {im.shape[0]}x{im.shape[1]} '  # string
        if pred.shape[0]:
            for c in pred[:, -1].unique():
                n = (pred[:, -1] == c).sum()  # detections per class
                s += f"{n} {self.names[int(c)]}{'s' * (n > 1)}, "  # add to string
            if show or save or render or crop:
                annotator = Annotator(im, example=str(self.names))
                for *box, conf, cls in reversed(pred):  # xyxy, confidence, class
                    label = f'{self.names[int(cls)]} {conf:.2f}'
                    if crop:
                        file = save_dir / 'crops' / self.names[int(cls)] / self.files[i] if save else None
                        crops.append({
                            'box': box,
                            'conf': conf,
                            'cls': cls,
                            'label': label,
                            'im': save_one_box(box, im, file=file, save=save)})
                    else:  # all others
                        annotator.box_label(box, label if labels else '', color=colors(cls))
                im = annotator.im
        else:
            s += '(no detections)'

        im = Image.fromarray(im.astype(np.uint8)) if isinstance(im, np.ndarray) else im  # from np
        if pprint:
            LOGGER.info(s.rstrip(', '))
        if show:
            im.show(self.files[i])  # show
        if save:
            f = self.files[i]
            im.save(save_dir / f)  # save
            if i == self.n - 1:
                LOGGER.info(f"Saved {self.n} image{'s' * (self.n > 1)} to {colorstr('bold', save_dir)}")
        if render:
            self.imgs[i] = np.asarray(im)
    if crop:
        if save:
            LOGGER.info(f'Saved results to {save_dir}\n')
        return crops

def print(self):
    self.display(pprint=True)  # print results
    LOGGER.info(f'Speed: %.1fms pre-process, %.1fms inference, %.1fms NMS per image at shape {tuple(self.s)}' %
                self.t)

def show(self, labels=True):
    self.display(show=True, labels=labels)  # show results

def save(self, labels=True, save_dir='runs/detect/exp'):
    save_dir = increment_path(save_dir, exist_ok=save_dir != 'runs/detect/exp', mkdir=True)  # increment save_dir
    self.display(save=True, labels=labels, save_dir=save_dir)  # save results

def crop(self, save=True, save_dir='runs/detect/exp'):
    save_dir = increment_path(save_dir, exist_ok=save_dir != 'runs/detect/exp', mkdir=True) if save else None
    return self.display(crop=True, save=save, save_dir=save_dir)  # crop results

def render(self, labels=True):
    self.display(render=True, labels=labels)  # render results
    return self.imgs

def pandas(self):
    # return detections as pandas DataFrames, i.e. print(results.pandas().xyxy[0])
    new = copy(self)  # return copy
    ca = 'xmin', 'ymin', 'xmax', 'ymax', 'confidence', 'class', 'name'  # xyxy columns
    cb = 'xcenter', 'ycenter', 'width', 'height', 'confidence', 'class', 'name'  # xywh columns
    for k, c in zip(['xyxy', 'xyxyn', 'xywh', 'xywhn'], [ca, ca, cb, cb]):
        a = [[x[:5] + [int(x[5]), self.names[int(x[5])]] for x in x.tolist()] for x in getattr(self, k)]  # update
        setattr(new, k, [pd.DataFrame(x, columns=c) for x in a])
    return new

def tolist(self):
    # return a list of Detections objects, i.e. 'for result in results.tolist():'
    r = range(self.n)  # iterable
    x = [Detections([self.imgs[i]], [self.pred[i]], [self.files[i]], self.times, self.names, self.s) for i in r]
    # for d in x:
    #    for k in ['imgs', 'pred', 'xyxy', 'xyxyn', 'xywh', 'xywhn']:
    #        setattr(d, k, getattr(d, k)[0])  # pop out of list
    return x

def __len__(self):
    return self.n

class Classify(nn.Module):

Classification head, i.e. x(b,c1,20,20) to x(b,c2)

def __init__(self, c1, c2, k=1, s=1, p=None, g=1):  # ch_in, ch_out, kernel, stride, padding, groups
    super().__init__()
    self.aap = nn.AdaptiveAvgPool2d(1)  # to x(b,c1,1,1)
    self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g)  # to x(b,c2,1,1)
    self.flat = nn.Flatten()

def forward(self, x):
    z = torch.cat([self.aap(y) for y in (x if isinstance(x, list) else [x])], 1)  # cat if list
    return self.flat(self.conv(z))  # flatten to x(b,c2)

class SCConv(nn.Module):

def __init__(self, inplanes, planes, stride, padding, dilation, groups, pooling_r, norm_layer):
    super(SCConv, self).__init__()
    self.k2 = nn.Sequential(
        nn.AvgPool2d(kernel_size=pooling_r, stride=pooling_r),
        nn.Conv2d(inplanes, planes, kernel_size=3, stride=1,
                  padding=padding, dilation=dilation,
                  groups=groups, bias=False),
        norm_layer(planes),
    )
    self.k3 = nn.Sequential(
        nn.Conv2d(inplanes, planes, kernel_size=3, stride=1,
                  padding=padding, dilation=dilation,
                  groups=groups, bias=False),
        norm_layer(planes),
    )
    self.k4 = nn.Sequential(
        nn.Conv2d(inplanes, planes, kernel_size=3, stride=stride,
                  padding=padding, dilation=dilation,
                  groups=groups, bias=False),
        norm_layer(planes),
    )

def forward(self, x):
    identity = x

    out = torch.sigmoid(torch.add(identity, F.interpolate(self.k2(x), identity.size()[2:]))) # sigmoid(identity + k2)
    out = torch.mul(self.k3(x), out) # k3 * sigmoid(identity + k2)
    out = self.k4(out) # k4

    return out   

The input is [N, C, H, W]; two parameters are required: in_planes is the number of input feature channels, and K is the number of experts.

class Attention(nn.Module):

def __init__(self, in_planes, K):
    super().__init__()
    self.avgpool = nn.AdaptiveAvgPool2d(1)
    self.net = nn.Conv2d(in_planes, K, kernel_size=1)
    self.sigmoid = nn.Sigmoid()

def forward(self, x):
    # Global-average-pool the input features to [N, C, 1, 1]
    att = self.avgpool(x)
    # Apply a 1x1 convolution, giving [N, K, 1, 1]
    att = self.net(att)
    # Flatten to 2-D [N, K]
    att = att.view(x.shape[0], -1)
    # Use sigmoid to normalize the outputs to the [0, 1] range
    return self.sigmoid(att)

class CondConv(nn.Module):

def __init__(self, in_planes, out_planes, kernel_size, stride, padding=0, groups=1, K=4):
    super().__init__()
    self.in_planes = in_planes
    self.out_planes = out_planes
    self.K = K
    self.groups = groups
    self.kernel_size = kernel_size
    self.stride = stride
    self.padding = padding
    self.attention = Attention(in_planes=in_planes, K=K)
    self.weight = nn.Parameter(torch.randn(K, out_planes, in_planes // groups, kernel_size, kernel_size),
                               requires_grad=True)

def forward(self, x):
    # Call the attention module to get the normalized per-expert weights [N, K]
    N, in_planes, H, W = x.shape
    softmax_att = self.attention(x)
    # Reshape the input from [N, C_in, H, W] to [1, N*C_in, H, W]
    x = x.view(1, -1, H, W)

    # Expert kernels of shape [K, C_out, C_in/groups, 3, 3] (3x3 kernels in general)
    # Note requires_grad=True above, so these parameters are trainable
    weight = self.weight
    # Reshape the weights to [K, C_out*(C_in/groups)*3*3]
    weight = weight.view(self.K, -1)

    # Matrix multiply: [N, K] x [K, C_out*(C_in/groups)*3*3] = [N, C_out*(C_in/groups)*3*3]
    aggregate_weight = torch.mm(softmax_att, weight)
    # Reshape to [N*C_out, C_in/groups, 3, 3], i.e. the newly generated convolution kernels
    aggregate_weight = aggregate_weight.view(
        N * self.out_planes, self.in_planes // self.groups,
        self.kernel_size, self.kernel_size)
    # Convolve with the new kernels; the output is [1, N*C_out, H, W]
    output = F.conv2d(x, weight=aggregate_weight,
                      stride=self.stride, padding=self.padding,
                      groups=self.groups * N)
    # Restore the shape to [N, C_out, H, W]
    output = output.view(N, self.out_planes, H, W)
    return output
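As a quick sanity check, a hypothetical standalone use of the CondConv module above could look like the following (the channel counts are arbitrary; stride=1 is used so that the final view back to [N, C_out, H, W] matches the unchanged spatial size):

m = CondConv(in_planes=16, out_planes=32, kernel_size=3, stride=1, padding=1, K=4)
y = m(torch.randn(2, 16, 64, 64))
print(y.shape)  # torch.Size([2, 32, 64, 64])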

Additional

My dataset is about ship detection and has only one class. I want to replace the standard Conv with CondConv, and that is when the error occurs.

glenn-jocher commented 2 years ago

@LeoNull101 👋 Hello! Thanks for asking about YOLOv5 🚀 dataset formatting. To train correctly your data must be in YOLOv5 format. Please see our Train Custom Data tutorial for full documentation on dataset setup and all steps required to start training your first model. A few excerpts from the tutorial:

1.1 Create dataset.yaml

COCO128 is an example small tutorial dataset composed of the first 128 images in COCO train2017. These same 128 images are used for both training and validation to verify our training pipeline is capable of overfitting. data/coco128.yaml, shown below, is the dataset config file that defines 1) the dataset root directory path and relative paths to train / val / test image directories (or *.txt files with image paths), 2) the number of classes nc and 3) a list of class names:

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/coco128  # dataset root dir
train: images/train2017  # train images (relative to 'path') 128 images
val: images/train2017  # val images (relative to 'path') 128 images
test:  # test images (optional)

# Classes
nc: 80  # number of classes
names: [ 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
         'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
         'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
         'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
         'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
         'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
         'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
         'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
         'hair drier', 'toothbrush' ]  # class names

1.2 Create Labels

After using a tool like Roboflow Annotate to label your images, export your labels to YOLO format, with one *.txt file per image (if no objects are in an image, no *.txt file is required). The *.txt file specifications are:

- One row per object
- Each row is class x_center y_center width height format
- Box coordinates must be in normalized xywh format (from 0 to 1). If your boxes are in pixels, divide x_center and width by image width, and y_center and height by image height
- Class numbers are zero-indexed (start from 0)

Image Labels

The label file corresponding to the above image contains 2 persons (class 0) and a tie (class 27):
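For illustration only (the coordinate values below are made up, but the format matches the spec above: class index followed by normalized x_center, y_center, width, height):

0 0.48 0.63 0.69 0.71
0 0.74 0.52 0.31 0.93
27 0.36 0.79 0.07 0.40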

1.3 Organize Directories

Organize your train and val images and labels according to the example below. YOLOv5 assumes /coco128 is inside a /datasets directory next to the /yolov5 directory. YOLOv5 locates labels automatically for each image by replacing the last instance of /images/ in each image path with /labels/. For example:

../datasets/coco128/images/im0.jpg  # image
../datasets/coco128/labels/im0.txt  # label
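The substitution itself is plain string handling; a rough sketch (not the repo's exact helper) of mapping an image path to its label path could look like:

def img2label_path(img_path):
    # replace the last '/images/' with '/labels/' and swap the file extension for .txt
    head, sep, tail = img_path.rpartition('/images/')
    return head + '/labels/' + tail.rsplit('.', 1)[0] + '.txt'

print(img2label_path('../datasets/coco128/images/im0.jpg'))  # ../datasets/coco128/labels/im0.txt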

Good luck 🍀 and let us know if you have any other questions!

github-actions[bot] commented 2 years ago

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.


Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!