opencv / opencv

Open Source Computer Vision Library
https://opencv.org
Apache License 2.0

[GSoC] Add siamrpnpp.py #17647

Closed · jinyup100 closed this 3 years ago

jinyup100 commented 3 years ago

GSoC '20 : Real-time Single Object Tracking using Deep Learning (SiamRPN++)

Overview

Proposal : https://summerofcode.withgoogle.com/projects/#4979746967912448
Mentors : Liubov Batanina @l-bat, Stefano Fabri @bhack, Ilya Elizarov @ieliz
Student : Jin Yeob Chung @jinyup100

Details of the Pull Request

  • Export of the PyTorch implementation of the SiamRPN++ visual tracker to ONNX
  • Addition of siamrpnpp.py to the opencv/samples/dnn directory
    • The SiamRPN++ visual tracker can be run on a sample video input
    • Command-line options include:
      • --input_video path to sample video input
      • --target_net path to target branch of the visual tracker
      • --search_net path to search branch of the visual tracker
      • --rpn_head path to head of the visual tracker
      • --backend selection of the computation backend
      • --target selection of the computation target device (see the sketch after this list)
  • Additional examples of the visual tracker running on videos are available at:
Examples
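
Regarding the --backend and --target options above: for illustration, here is a minimal sketch of how such choices typically map onto OpenCV DNN flags. The option strings and the mapping below are assumptions for illustration; the actual siamrpnpp.py may expose different names.

```python
import cv2

# Hypothetical mapping from user-facing option strings to OpenCV DNN enums;
# the real sample may use different keys or additional backends/targets.
backends = {
    "default": cv2.dnn.DNN_BACKEND_DEFAULT,
    "opencv": cv2.dnn.DNN_BACKEND_OPENCV,
}
targets = {
    "cpu": cv2.dnn.DNN_TARGET_CPU,
    "opencl": cv2.dnn.DNN_TARGET_OPENCL,
}

# Load one branch of the tracker (path as passed via --target_net) and
# configure where inference should run.
net = cv2.dnn.readNetFromONNX("target_net.onnx")
net.setPreferableBackend(backends["opencv"])
net.setPreferableTarget(targets["cpu"])
```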

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • [X] I agree to contribute to the project under OpenCV (BSD) License.
  • [X] To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
  • [X] The PR is proposed to proper branch
  • [X] There is reference to original bug report and related work
  • [X] There is accuracy test, performance test and test data in opencv_extra repository, if applicable. Patch to opencv_extra has the same branch name.
  • [X] The feature is well documented and sample code can be built with the project CMake
Code to generate ONNX Models

The code shown below to generate the ONNX models of SiamRPN++ is also available from: https://gist.github.com/jinyup100/7aa748686c5e234ed6780154141b4685

![ball_track](https://user-images.githubusercontent.com/41290732/91156436-1dd88700-e6ff-11ea-85a3-db0f668e5eee.gif)

The final version of the pre-trained weights, and the ONNX models successfully converted using the code, are available at:

**Pre-Trained Weights in pth Format**
https://drive.google.com/file/d/11bwgPFVkps9AH2NOD1zBDdpF_tQghAB-/view?usp=sharing

**Target Net** : Import :heavy_check_mark: Export :heavy_check_mark:
https://drive.google.com/file/d/1dw_Ne3UMcCnFsaD6xkZepwE4GEpqq7U_/view?usp=sharing

**Search Net** : Import :heavy_check_mark: Export :heavy_check_mark:
https://drive.google.com/file/d/1Lt4oE43ZSucJvze3Y-Z87CVDreO-Afwl/view?usp=sharing

**RPN_head** : Import :heavy_check_mark: Export :heavy_check_mark:
https://drive.google.com/file/d/1zT1yu12mtj3JQEkkfKFJWiZ71fJ-dQTi/view?usp=sharing

```python
import os

import numpy as np
import onnx
import torch
import torch.nn as nn

# Class for the Building Blocks required for ResNet
class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None, dilation=1):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        padding = 2 - stride
        if downsample is not None and dilation > 1:
            dilation = dilation // 2
            padding = dilation

        assert stride == 1 or dilation == 1, \
            "stride and dilation must have one equals to zero at least"

        if dilation > 1:
            padding = dilation
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=padding, bias=False, dilation=dilation)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out
# End of Building Blocks

# Class for ResNet - the Backbone neural network
class ResNet(nn.Module):
    "ResNet"
    def __init__(self, block, layers, used_layers):
        self.inplanes = 64
        super(ResNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=0,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)

        self.feature_size = 128 * block.expansion
        self.used_layers = used_layers
        layer3 = True if 3 in used_layers else False
        layer4 = True if 4 in used_layers else False

        if layer3:
            self.layer3 = self._make_layer(block, 256, layers[2],
                                           stride=1, dilation=2)  # 15x15, 7x7
            self.feature_size = (256 + 128) * block.expansion
        else:
            self.layer3 = lambda x: x  # identity

        if layer4:
            self.layer4 = self._make_layer(block, 512, layers[3],
                                           stride=1, dilation=4)  # 7x7, 3x3
            self.feature_size = 512 * block.expansion
        else:
            self.layer4 = lambda x: x  # identity

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, np.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1, dilation=1):
        downsample = None
        dd = dilation
        if stride != 1 or self.inplanes != planes * block.expansion:
            if stride == 1 and dilation == 1:
                downsample = nn.Sequential(
                    nn.Conv2d(self.inplanes, planes * block.expansion,
                              kernel_size=1, stride=stride, bias=False),
                    nn.BatchNorm2d(planes * block.expansion),
                )
            else:
                if dilation > 1:
                    dd = dilation // 2
                    padding = dd
                else:
                    dd = 1
                    padding = 0
                downsample = nn.Sequential(
                    nn.Conv2d(self.inplanes, planes * block.expansion,
                              kernel_size=3, stride=stride, bias=False,
                              padding=padding, dilation=dd),
                    nn.BatchNorm2d(planes * block.expansion),
                )

        layers = []
        layers.append(block(self.inplanes, planes, stride,
                            downsample, dilation=dilation))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes, dilation=dilation))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x_ = self.relu(x)
        x = self.maxpool(x_)

        p1 = self.layer1(x)
        p2 = self.layer2(p1)
        p3 = self.layer3(p2)
        p4 = self.layer4(p3)
        out = [x_, p1, p2, p3, p4]
        out = [out[i] for i in self.used_layers]
        if len(out) == 1:
            return out[0]
        else:
            return out
# End of ResNet

# Classes for adjusting the layers of the neural net
class AdjustLayer_1(nn.Module):
    def __init__(self, in_channels, out_channels, center_size=7):
        super(AdjustLayer_1, self).__init__()
        self.downsample = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            )
        self.center_size = center_size

    def forward(self, x):
        x = self.downsample(x)
        l = 4
        r = 11
        x = x[:, :, l:r, l:r]
        return x

class AdjustAllLayer_1(nn.Module):
    def __init__(self, in_channels, out_channels, center_size=7):
        super(AdjustAllLayer_1, self).__init__()
        self.num = len(out_channels)
        if self.num == 1:
            self.downsample = AdjustLayer_1(in_channels[0], out_channels[0],
                                            center_size)
        else:
            for i in range(self.num):
                self.add_module('downsample'+str(i+2),
                                AdjustLayer_1(in_channels[i], out_channels[i],
                                              center_size))

    def forward(self, features):
        if self.num == 1:
            return self.downsample(features)
        else:
            out = []
            for i in range(self.num):
                adj_layer = getattr(self, 'downsample'+str(i+2))
                out.append(adj_layer(features[i]))
            return out

class AdjustLayer_2(nn.Module):
    def __init__(self, in_channels, out_channels, center_size=7):
        super(AdjustLayer_2, self).__init__()
        self.downsample = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            )
        self.center_size = center_size

    def forward(self, x):
        x = self.downsample(x)
        return x

class AdjustAllLayer_2(nn.Module):
    def __init__(self, in_channels, out_channels, center_size=7):
        super(AdjustAllLayer_2, self).__init__()
        self.num = len(out_channels)
        if self.num == 1:
            self.downsample = AdjustLayer_2(in_channels[0], out_channels[0],
                                            center_size)
        else:
            for i in range(self.num):
                self.add_module('downsample'+str(i+2),
                                AdjustLayer_2(in_channels[i], out_channels[i],
                                              center_size))

    def forward(self, features):
        if self.num == 1:
            return self.downsample(features)
        else:
            out = []
            for i in range(self.num):
                adj_layer = getattr(self, 'downsample'+str(i+2))
                out.append(adj_layer(features[i]))
            return out
# End of classes for adjusting the layers of the neural net

# Class for Region Proposal Neural Network
class RPN(nn.Module):
    "Region Proposal Network"
    def __init__(self):
        super(RPN, self).__init__()

    def forward(self, z_f, x_f):
        raise NotImplementedError

class DepthwiseXCorr(nn.Module):
    "Depthwise Correlation Layer"
    def __init__(self, in_channels, hidden, out_channels, kernel_size=3, hidden_kernel_size=5):
        super(DepthwiseXCorr, self).__init__()
        self.conv_kernel = nn.Sequential(
                nn.Conv2d(in_channels, hidden, kernel_size=kernel_size, bias=False),
                nn.BatchNorm2d(hidden),
                nn.ReLU(inplace=True),
                )
        self.conv_search = nn.Sequential(
                nn.Conv2d(in_channels, hidden, kernel_size=kernel_size, bias=False),
                nn.BatchNorm2d(hidden),
                nn.ReLU(inplace=True),
                )
        self.head = nn.Sequential(
                nn.Conv2d(hidden, hidden, kernel_size=1, bias=False),
                nn.BatchNorm2d(hidden),
                nn.ReLU(inplace=True),
                nn.Conv2d(hidden, out_channels, kernel_size=1)
                )

    def forward(self, kernel, search):
        kernel = self.conv_kernel(kernel)
        search = self.conv_search(search)
        feature = xcorr_depthwise(search, kernel)
        out = self.head(feature)
        return out

class DepthwiseRPN(RPN):
    def __init__(self, anchor_num=5, in_channels=256, out_channels=256):
        super(DepthwiseRPN, self).__init__()
        self.cls = DepthwiseXCorr(in_channels, out_channels, 2 * anchor_num)
        self.loc = DepthwiseXCorr(in_channels, out_channels, 4 * anchor_num)

    def forward(self, z_f, x_f):
        cls = self.cls(z_f, x_f)
        loc = self.loc(z_f, x_f)
        return cls, loc

class MultiRPN(RPN):
    def __init__(self, anchor_num, in_channels):
        super(MultiRPN, self).__init__()
        for i in range(len(in_channels)):
            self.add_module('rpn'+str(i+2),
                            DepthwiseRPN(anchor_num, in_channels[i], in_channels[i]))
        self.weight_cls = nn.Parameter(torch.Tensor([0.38156851768108546, 0.4364767608115956, 0.18195472150731892]))
        self.weight_loc = nn.Parameter(torch.Tensor([0.17644893463361863, 0.16564198028417967, 0.6579090850822015]))

    def forward(self, z_fs, x_fs):
        cls = []
        loc = []

        rpn2 = self.rpn2
        z_f2 = z_fs[0]
        x_f2 = x_fs[0]
        c2, l2 = rpn2(z_f2, x_f2)
        cls.append(c2)
        loc.append(l2)

        rpn3 = self.rpn3
        z_f3 = z_fs[1]
        x_f3 = x_fs[1]
        c3, l3 = rpn3(z_f3, x_f3)
        cls.append(c3)
        loc.append(l3)

        rpn4 = self.rpn4
        z_f4 = z_fs[2]
        x_f4 = x_fs[2]
        c4, l4 = rpn4(z_f4, x_f4)
        cls.append(c4)
        loc.append(l4)

        def avg(lst):
            return sum(lst) / len(lst)

        def weighted_avg(lst, weight):
            s = 0
            fixed_len = 3
            for i in range(3):
                s += lst[i] * weight[i]
            return s

        return weighted_avg(cls, self.weight_cls), weighted_avg(loc, self.weight_loc)
# End of class for RPN

def conv3x3(in_planes, out_planes, stride=1, dilation=1):
    "3x3 convolution with padding"
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=dilation, bias=False, dilation=dilation)

def xcorr_depthwise(x, kernel):
    """Depthwise convolution for input and weights with different shapes"""
    batch = kernel.size(0)
    channel = kernel.size(1)
    x = x.view(1, batch*channel, x.size(2), x.size(3))
    kernel = kernel.view(batch*channel, 1, kernel.size(2), kernel.size(3))
    conv = nn.Conv2d(batch*channel, batch*channel,
                     kernel_size=(kernel.size(2), kernel.size(3)),
                     bias=False, groups=batch*channel)
    conv.weight = nn.Parameter(kernel)
    out = conv(x)
    out = out.view(batch, channel, out.size(2), out.size(3))
    out = out.detach()
    return out

class TargetNetBuilder(nn.Module):
    def __init__(self):
        super(TargetNetBuilder, self).__init__()
        # Build Backbone Model
        self.backbone = ResNet(Bottleneck, [3,4,6,3], [2,3,4])
        # Build Neck Model
        self.neck = AdjustAllLayer_1([512,1024,2048], [256,256,256])

    def forward(self, frame):
        features = self.backbone(frame)
        output = self.neck(features)
        return output

class SearchNetBuilder(nn.Module):
    def __init__(self):
        super(SearchNetBuilder, self).__init__()
        # Build Backbone Model
        self.backbone = ResNet(Bottleneck, [3,4,6,3], [2,3,4])
        # Build Neck Model
        self.neck = AdjustAllLayer_2([512,1024,2048], [256,256,256])

    def forward(self, frame):
        features = self.backbone(frame)
        output = self.neck(features)
        return output

class RPNBuilder(nn.Module):
    def __init__(self):
        super(RPNBuilder, self).__init__()
        # Build Adjusted Layer Builder
        self.rpn_head = MultiRPN(anchor_num=5, in_channels=[256, 256, 256])

    def forward(self, zf, xf):
        # Get Feature
        cls, loc = self.rpn_head(zf, xf)
        return cls, loc

# Load path should be the directory of the pre-trained siamrpn_r50_l234_dwxcorr.pth.
# The download link to siamrpn_r50_l234_dwxcorr.pth is shown in the description.
current_path = os.getcwd()
load_path = os.path.join(current_path, "siamrpn_r50_l234_dwxcorr.pth")
pretrained_dict = torch.load(load_path, map_location=torch.device('cpu'))
pretrained_dict_backbone = pretrained_dict
pretrained_dict_neck_1 = pretrained_dict
pretrained_dict_neck_2 = pretrained_dict
pretrained_dict_head = pretrained_dict
pretrained_dict_target = pretrained_dict
pretrained_dict_search = pretrained_dict

# The shape of the inputs to the Target Network and the Search Network
target = torch.Tensor(np.random.rand(1,3,127,127))
search = torch.Tensor(np.random.rand(1,3,125,125))

# Build the torch target net model
target_net = TargetNetBuilder()
target_net.eval()
target_net.state_dict().keys()
target_net_dict = target_net.state_dict()

# Load the pre-trained weights into the torch target net model
pretrained_dict_target = {k: v for k, v in pretrained_dict_target.items() if k in target_net_dict}
target_net_dict.update(pretrained_dict_target)
target_net.load_state_dict(target_net_dict)

# Export the torch target net model to an ONNX model
torch.onnx.export(target_net, torch.Tensor(target), "target_net.onnx",
                  export_params=True, opset_version=11, do_constant_folding=True,
                  input_names=['input'],
                  output_names=['output_1', 'output_2', 'output_3'])

# Load the saved target net model using ONNX
onnx_target = onnx.load("target_net.onnx")

# Check whether the ONNX target net model has been successfully imported
onnx.checker.check_model(onnx_target)
print(onnx.checker.check_model(onnx_target))
onnx.helper.printable_graph(onnx_target.graph)
print(onnx.helper.printable_graph(onnx_target.graph))

# Build the torch search net model
search_net = SearchNetBuilder()
search_net.eval()
search_net.state_dict().keys()
search_net_dict = search_net.state_dict()

# Load the pre-trained weights into the torch search net model
pretrained_dict_search = {k: v for k, v in pretrained_dict_search.items() if k in search_net_dict}
search_net_dict.update(pretrained_dict_search)
search_net.load_state_dict(search_net_dict)

# Export the torch search net model to an ONNX model
torch.onnx.export(search_net, torch.Tensor(search), "search_net.onnx",
                  export_params=True, opset_version=11, do_constant_folding=True,
                  input_names=['input'],
                  output_names=['output_1', 'output_2', 'output_3'])

# Load the saved search net model using ONNX
onnx_search = onnx.load("search_net.onnx")

# Check whether the ONNX search net model has been successfully imported
onnx.checker.check_model(onnx_search)
print(onnx.checker.check_model(onnx_search))
onnx.helper.printable_graph(onnx_search.graph)
print(onnx.helper.printable_graph(onnx_search.graph))

# Outputs from the Target Net and Search Net
zfs_1, zfs_2, zfs_3 = target_net(torch.Tensor(target))
xfs_1, xfs_2, xfs_3 = search_net(torch.Tensor(search))

# Adjust the outputs from each of the neck models to match the input shape of the torch rpn_head model
zfs = np.stack([zfs_1.detach().numpy(), zfs_2.detach().numpy(), zfs_3.detach().numpy()])
xfs = np.stack([xfs_1.detach().numpy(), xfs_2.detach().numpy(), xfs_3.detach().numpy()])

# Build the torch rpn_head model
rpn_head = RPNBuilder()
rpn_head.eval()
rpn_head.state_dict().keys()
rpn_head_dict = rpn_head.state_dict()

# Load the pre-trained weights into the rpn_head model
pretrained_dict_head = {k: v for k, v in pretrained_dict_head.items() if k in rpn_head_dict}
pretrained_dict_head.keys()
rpn_head_dict.update(pretrained_dict_head)
rpn_head.load_state_dict(rpn_head_dict)
rpn_head.eval()

# Export the torch rpn_head model to an ONNX model
torch.onnx.export(rpn_head,
                  (torch.Tensor(np.random.rand(*zfs.shape)), torch.Tensor(np.random.rand(*xfs.shape))),
                  "rpn_head.onnx", export_params=True, opset_version=11,
                  do_constant_folding=True,
                  input_names=['input_1', 'input_2'],
                  output_names=['output_1', 'output_2'])

# Load the saved rpn_head model using ONNX
onnx_rpn_head_model = onnx.load("rpn_head.onnx")

# Check whether the rpn_head model has been successfully imported
onnx.checker.check_model(onnx_rpn_head_model)
print(onnx.checker.check_model(onnx_rpn_head_model))
onnx.helper.printable_graph(onnx_rpn_head_model.graph)
print(onnx.helper.printable_graph(onnx_rpn_head_model.graph))
```
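As a quick sanity check, a sketch that continues from the script above (and assumes target_net.onnx sits in the working directory): run the same dummy input through the PyTorch module and through OpenCV's DNN module, then compare outputs.

```python
import cv2
import numpy as np
import torch

# Compare the PyTorch target branch against the exported ONNX graph.
inp = np.random.rand(1, 3, 127, 127).astype(np.float32)
torch_outs = target_net(torch.Tensor(inp))  # three cropped feature maps

net = cv2.dnn.readNetFromONNX("target_net.onnx")
net.setInput(inp)
cv_outs = net.forward(net.getUnconnectedOutLayersNames())

# NOTE: the two APIs may order the outputs differently; match by shape
# before comparing if the assertion below fails.
for t, c in zip(torch_outs, cv_outs):
    np.testing.assert_allclose(t.detach().numpy(), c, rtol=1e-3, atol=1e-4)
```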
jinyup100 commented 3 years ago

Currently at a stage where the PyTorch model of SiamRPN++ has been built and the conversion from PyTorch to ONNX is being made.

```python
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

import math
import os
import onnx
import torch
import torch.nn as nn
import torch.nn.functional as F

from torch.autograd import Variable

# Class for the Tracker - SiamRPNPP

class Tracker(nn.Module):
    def __init__(self):
        super(Tracker, self).__init__()

        # build backbone
        self.backbone = resnet50([2,3,4])

        # build adjust layer
        self.neck = AdjustAllLayer([512, 1024, 2048], [256, 256, 256])

        # build rpn head
        self.rpn_head = MultiRPN(anchor_num=5,in_channels=[256, 256, 256],weighted=True)

    def template(self, z):
        zf = self.backbone(z)
        zf = self.neck(zf)
        self.zf = zf
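        # zf is cached so that track() can reuse the exemplar (template)
        # features for every subsequent search frame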

    def track(self, x):
        xf = self.backbone(x)
        xf = self.neck(xf)
        cls, loc = self.rpn_head(self.zf, xf)
        return {'cls': cls,'loc': loc,}

    def log_softmax(self, cls):
        b, a2, h, w = cls.size()
        cls = cls.view(b, 2, a2//2, h, w)
        cls = cls.permute(0, 2, 3, 4, 1).contiguous()
        cls = F.log_softmax(cls, dim=4)
        return cls

    def forward(self, data):
        """ only used in training
        """
        template = data['template']
        search = data['search']
        #label_cls = data['label_cls']
        #label_loc = data['label_loc']
        #label_loc_weight = data['label_loc_weight']

        # get feature
        zf = self.backbone(template)
        xf = self.backbone(search)

        zf = self.neck(zf)
        xf = self.neck(xf)
        cls, loc = self.rpn_head(zf, xf)

        # get loss
        #cls = self.log_softmax(cls)
        #cls_loss = select_cross_entropy_loss(cls, label_cls)
        #loc_loss = weight_l1_loss(loc, label_loc, label_loc_weight)

        #outputs = {}
        #outputs['total_loss'] = cfg.TRAIN.CLS_WEIGHT * cls_loss + \
            #cfg.TRAIN.LOC_WEIGHT * loc_loss
        #outputs['cls_loss'] = cls_loss
        #outputs['loc_loss'] = loc_loss

        return cls, loc

# End of Tracker - SiamRPNPP

# Class for the Building Blocks required for ResNet

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1,downsample=None, dilation=1):
        super(BasicBlock, self).__init__()
        padding = 2 - stride

        if dilation > 1:
            padding = dilation

        dd = dilation
        pad = padding
        if downsample is not None and dilation > 1:
            dd = dilation // 2
            pad = dd

        self.conv1 = nn.Conv2d(inplanes, planes,
                               stride=stride, dilation=dd, bias=False,
                               kernel_size=3, padding=pad)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes, dilation=dilation)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out

class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1,
                 downsample=None, dilation=1):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        padding = 2 - stride
        if downsample is not None and dilation > 1:
            dilation = dilation // 2
            padding = dilation

        assert stride == 1 or dilation == 1, \
            "stride and dilation must have one equals to zero at least"

        if dilation > 1:
            padding = dilation
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=padding, bias=False, dilation=dilation)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual

        out = self.relu(out)

        return out

# End of Building Blocks

# Class for ResNet - the Backbone neural network

class ResNet(nn.Module):
    def __init__(self, block, layers, used_layers):
        self.inplanes = 64
        super(ResNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=0,  # 3
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)

        self.feature_size = 128 * block.expansion
        self.used_layers = used_layers
        layer3 = True if 3 in used_layers else False
        layer4 = True if 4 in used_layers else False

        if layer3:
            self.layer3 = self._make_layer(block, 256, layers[2],
                                           stride=1, dilation=2)  # 15x15, 7x7
            self.feature_size = (256 + 128) * block.expansion
        else:
            self.layer3 = lambda x: x  # identity

        if layer4:
            self.layer4 = self._make_layer(block, 512, layers[3],
                                           stride=1, dilation=4)  # 7x7, 3x3
            self.feature_size = 512 * block.expansion
        else:
            self.layer4 = lambda x: x  # identity

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1, dilation=1):
        downsample = None
        dd = dilation
        if stride != 1 or self.inplanes != planes * block.expansion:
            if stride == 1 and dilation == 1:
                downsample = nn.Sequential(
                    nn.Conv2d(self.inplanes, planes * block.expansion,
                              kernel_size=1, stride=stride, bias=False),
                    nn.BatchNorm2d(planes * block.expansion),
                )
            else:
                if dilation > 1:
                    dd = dilation // 2
                    padding = dd
                else:
                    dd = 1
                    padding = 0
                downsample = nn.Sequential(
                    nn.Conv2d(self.inplanes, planes * block.expansion,
                              kernel_size=3, stride=stride, bias=False,
                              padding=padding, dilation=dd),
                    nn.BatchNorm2d(planes * block.expansion),
                )

        layers = []
        layers.append(block(self.inplanes, planes, stride,
                            downsample, dilation=dilation))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes, dilation=dilation))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x_ = self.relu(x)
        x = self.maxpool(x_)

        p1 = self.layer1(x)
        p2 = self.layer2(p1)
        p3 = self.layer3(p2)
        p4 = self.layer4(p3)
        out = [x_, p1, p2, p3, p4]
        out = [out[i] for i in self.used_layers]
        if len(out) == 1:
            return out[0]
        else:
            return out

# End of ResNet

# Class for Adjusting the layers of the neural net

class AdjustLayer(nn.Module):
    def __init__(self, in_channels, out_channels, center_size=7):
        super(AdjustLayer, self).__init__()
        self.downsample = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            )
        self.center_size = center_size

    def forward(self, x):
        x = self.downsample(x)
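        # Template-branch feature maps are small (< 20 px); keep only the
        # central center_size x center_size window so template and search align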
        if x.size(3) < 20:
            l = (x.size(3) - self.center_size) // 2
            r = l + self.center_size
            x = x[:, :, l:r, l:r]
        return x

class AdjustAllLayer(nn.Module):
    def __init__(self, in_channels, out_channels, center_size=7):
        super(AdjustAllLayer, self).__init__()
        self.num = len(out_channels)
        if self.num == 1:
            self.downsample = AdjustLayer(in_channels[0],
                                          out_channels[0],
                                          center_size)
        else:
            for i in range(self.num):
                self.add_module('downsample'+str(i+2),
                                AdjustLayer(in_channels[i],
                                            out_channels[i],
                                            center_size))

    def forward(self, features):
        if self.num == 1:
            return self.downsample(features)
        else:
            out = []
            for i in range(self.num):
                adj_layer = getattr(self, 'downsample'+str(i+2))
                out.append(adj_layer(features[i]))
            return out        

# End of Class for Adjusting the layers of the neural net

# Class for Region Proposal Neural Network

class RPN(nn.Module):
    def __init__(self):
        super(RPN, self).__init__()

    def forward(self, z_f, x_f):
        raise NotImplementedError

class DepthwiseXCorr(nn.Module):
    def __init__(self, in_channels, hidden, out_channels, kernel_size=3, hidden_kernel_size=5):
        super(DepthwiseXCorr, self).__init__()
        self.conv_kernel = nn.Sequential(
                nn.Conv2d(in_channels, hidden, kernel_size=kernel_size, bias=False),
                nn.BatchNorm2d(hidden),
                nn.ReLU(inplace=True),
                )
        self.conv_search = nn.Sequential(
                nn.Conv2d(in_channels, hidden, kernel_size=kernel_size, bias=False),
                nn.BatchNorm2d(hidden),
                nn.ReLU(inplace=True),
                )
        self.head = nn.Sequential(
                nn.Conv2d(hidden, hidden, kernel_size=1, bias=False),
                nn.BatchNorm2d(hidden),
                nn.ReLU(inplace=True),
                nn.Conv2d(hidden, out_channels, kernel_size=1)
                )

    def forward(self, kernel, search):
        kernel = self.conv_kernel(kernel)
        search = self.conv_search(search)
        feature = xcorr_depthwise(search, kernel)
        out = self.head(feature)
        return out

class DepthwiseRPN(RPN):
    def __init__(self, anchor_num=5, in_channels=256, out_channels=256):
        super(DepthwiseRPN, self).__init__()
        self.cls = DepthwiseXCorr(in_channels, out_channels, 2 * anchor_num)
        self.loc = DepthwiseXCorr(in_channels, out_channels, 4 * anchor_num)

    def forward(self, z_f, x_f):
        cls = self.cls(z_f, x_f)
        loc = self.loc(z_f, x_f)
        return cls, loc

class MultiRPN(RPN):
    def __init__(self, anchor_num, in_channels, weighted=False):
        super(MultiRPN, self).__init__()
        self.weighted = weighted
        for i in range(len(in_channels)):
            self.add_module('rpn'+str(i+2),
                    DepthwiseRPN(anchor_num, in_channels[i], in_channels[i]))
        if self.weighted:
            self.cls_weight = nn.Parameter(torch.ones(len(in_channels)))
            self.loc_weight = nn.Parameter(torch.ones(len(in_channels)))

    def forward(self, z_fs, x_fs):
        cls = []
        loc = []
        for idx, (z_f, x_f) in enumerate(zip(z_fs, x_fs), start=2):
            rpn = getattr(self, 'rpn'+str(idx))
            c, l = rpn(z_f, x_f)
            cls.append(c)
            loc.append(l)

        if self.weighted:
            cls_weight = F.softmax(self.cls_weight, 0)
            loc_weight = F.softmax(self.loc_weight, 0)
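            # softmax over the learned per-level weights yields a convex
            # combination of the three RPN heads' outputs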

        def avg(lst):
            return sum(lst) / len(lst)

        def weighted_avg(lst, weight):
            s = 0
            for i in range(len(weight)):
                s += lst[i] * weight[i]
            return s

        if self.weighted:
            return weighted_avg(cls, cls_weight), weighted_avg(loc, loc_weight)
        else:
            return avg(cls), avg(loc)

# End of class for RPN

def conv3x3(in_planes, out_planes, stride=1, dilation=1):
    "3x3 convolution with padding"
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=dilation, bias=False, dilation=dilation)

def xcorr_depthwise(x, kernel):
    """depthwise cross correlation
    """
    batch = kernel.size(0)
    channel = kernel.size(1)
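    # Fold the batch dimension into channels so a single grouped conv2d
    # computes an independent correlation for every (batch, channel) pair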
    x = x.view(1, batch*channel, x.size(2), x.size(3))
    kernel = kernel.view(batch*channel, 1, kernel.size(2), kernel.size(3))
    out = F.conv2d(x, kernel, groups=batch*channel)
    out = out.view(batch, channel, out.size(2), out.size(3))
    return out

def resnet18(used_layers):
    """Constructs a ResNet-18 model.
    """
    model = ResNet(BasicBlock, [2, 2, 2, 2], used_layers)
    return model

def resnet50(used_layers):
    """Constructs a ResNet-50 model.
    """
    model = ResNet(Bottleneck, [3, 4, 6, 3], used_layers)
    return model

# Build Tracker
tracker_model = Tracker()
tracker_model.eval()
tracker_model.state_dict().keys()
model_dict = tracker_model.state_dict()

# Load pre-trained weights

# Pre-trained weights for siamrpn_r50_l234_dwxcorr is available from:
# https://drive.google.com/drive/folders/1Q4-1563iPwV6wSf_lBHDj5CPFiGSlEPG

# Other pre-trained weights is available from:
# https://github.com/STVIR/pysot/blob/master/MODEL_ZOO.md

current_path = os.getcwd()
load_path = os.path.join(current_path, "siamrpn_r50_l234_dwxcorr.pth")
pretrained_dict = torch.load(load_path,map_location=torch.device('cpu') )
#pretrained_dict = torch.load(load_path)
pretrained_dict.keys()

pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict}
model_dict.update(pretrained_dict)
tracker_model.load_state_dict(model_dict)

# Dummy Input Variables
y = {}
y['template'] = Variable(torch.randn(1, 3, 126, 126), requires_grad=True)
y['search'] = Variable(torch.randn(1, 3, 224, 224), requires_grad=True)

torch_out = tracker_model(y)

# Export the torch tracker model to ONNX model
batch_size = 1
torch.onnx.export(tracker_model, y, "siamrpnpp.onnx", export_params=True, opset_version=10,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'],
                  dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}})

# Load the saved tracker model using ONNX
onnx_model = onnx.load("siamrpnpp.onnx")

# Check whether the tracker model has been successfully imported
onnx.checker.check_model(onnx_model)
onnx.helper.printable_graph(onnx_model.graph)
```
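For a numerical check of this full-tracker export, here is a sketch assuming the onnxruntime package is installed and that the export above actually succeeded (the input layout of the resulting graph is an assumption, since the model was exported with a dict-like input):

```python
import numpy as np
import onnxruntime as ort

# Hypothetical check: run the exported graph and compare with torch_out above.
sess = ort.InferenceSession("siamrpnpp.onnx")
input_names = [i.name for i in sess.get_inputs()]
feeds = {name: t.detach().numpy()
         for name, t in zip(input_names, (y['template'], y['search']))}
ort_outs = sess.run(None, feeds)
np.testing.assert_allclose(torch_out[0].detach().numpy(), ort_outs[0],
                           rtol=1e-3, atol=1e-4)
```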
l-bat commented 3 years ago

I think this PR should go into 3.4

l-bat commented 3 years ago

Let's convert this PR to a draft until the ONNX models are provided.

jinyup100 commented 3 years ago

My latest attempt to convert the PyTorch model of SiamRPN++ to ONNX format is shown below. I have successfully converted the backbone model (ResNet50) and the neck model (adjusted layers), but I am still working on converting the head model (RPN) from PyTorch to ONNX.

Below are the successfully converted ONNX models:

ResNet50_target : https://drive.google.com/file/d/1syHYIVLh6fTnVAUfzS-hrrgq3GfFzAnI/view?usp=sharing
ResNet50_search : https://drive.google.com/file/d/16YJt2chxzDjju8zCcWqkd-yVDHggEcuj/view?usp=sharing
Adjusted_Layer_1_Output_1 : https://drive.google.com/file/d/1REnYzOTjUcFE04j-wuWjW3mzYuKXudxM/view?usp=sharing
Adjusted_Layer_1_Output_2 : https://drive.google.com/file/d/1WN-_OxeG2xNY9kIX5T687HqnenxeU2Xc/view?usp=sharing
Adjusted_Layer_2_Output_1 : https://drive.google.com/file/d/1P5YUz3jzPeB1_Tbct5dITT7I9_OHzTXs/view?usp=sharing
Adjusted_Layer_2_Output_2 : https://drive.google.com/file/d/1MPKy_nWoUOwPinOF_CauBHndTUozqxoR/view?usp=sharing
Adjusted_Layer_3_Output_1 : https://drive.google.com/file/d/1GvSsiX9wxqFm_6FA675DNQilTLDrFYGR/view?usp=sharing
Adjusted_Layer_3_Output_2 : https://drive.google.com/file/d/1TlECJWHfDaX1-cL04pedPCUzEaFWWCVo/view?usp=sharing


```python
import cv2
import math
import numpy as np
import os
import onnx
import torch
import torch.nn as nn
import torch.nn.functional as F

from torch.autograd import Variable

#@torch.jit.script
#def slice_helper(x,l_offset : int, u_offset : int):
#    return x[:,:,l_offset:u_offset,l_offset:u_offset]

@torch.jit.script
def torch_conv2d(x, kernel, batch : int, channel : int):
    groups = batch*channel
    out = F.conv2d(input=x, weight=kernel, groups=batch*channel)
    return out

# Class for the Tracker - SiamRPNPP

class Tracker(nn.Module):
    def __init__(self):
        super(Tracker, self).__init__()

        # build backbone
        self.backbone = resnet50([2,3,4])

        # build adjust layer
        self.neck = AdjustAllLayer([512, 1024, 2048], [256, 256, 256])

        # build rpn head
        self.rpn_head = MultiRPN(anchor_num=5,in_channels=[256, 256, 256],weighted=True)

    def template(self, z):
        zf = self.backbone(z)
        zf = self.neck(zf)
        self.zf = zf

    def track(self, x):
        xf = self.backbone(x)
        xf = self.neck(xf)
        cls, loc = self.rpn_head(self.zf, xf)
        return {'cls': cls,'loc': loc,}

    def log_softmax(self, cls):
        b, a2, h, w = cls.size()
        cls = cls.view(b, 2, a2//2, h, w)
        cls = cls.permute(0, 2, 3, 4, 1).contiguous()
        cls = F.log_softmax(cls, dim=4)
        return cls

    def forward(self, data):
        """ only used in training
        """
        template = data[0]
        search = data[1]
        #label_cls = data['label_cls']
        #label_loc = data['label_loc']
        #label_loc_weight = data['label_loc_weight']

        # get feature
        zf = self.backbone(template)
        xf = self.backbone(search)

        zf = self.neck(zf)
        xf = self.neck(xf)
        cls, loc = self.rpn_head(zf, xf)

        # get loss
        #cls = self.log_softmax(cls)
        #cls_loss = select_cross_entropy_loss(cls, label_cls)
        #loc_loss = weight_l1_loss(loc, label_loc, label_loc_weight)

        #outputs = {}
        #outputs['total_loss'] = cfg.TRAIN.CLS_WEIGHT * cls_loss + \
            #cfg.TRAIN.LOC_WEIGHT * loc_loss
        #outputs['cls_loss'] = cls_loss
        #outputs['loc_loss'] = loc_loss

        return cls, loc

# End of Tracker - SiamRPNPP

# Class for the Building Blocks required for ResNet

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1,downsample=None, dilation=1):
        super(BasicBlock, self).__init__()
        padding = 2 - stride

        if dilation > 1:
            padding = dilation

        dd = dilation
        pad = padding
        if downsample is not None and dilation > 1:
            dd = dilation // 2
            pad = dd

        self.conv1 = nn.Conv2d(inplanes, planes,
                               stride=stride, dilation=dd, bias=False,
                               kernel_size=3, padding=pad)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes, dilation=dilation)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out

class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1,
                 downsample=None, dilation=1):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        padding = 2 - stride
        if downsample is not None and dilation > 1:
            dilation = dilation // 2
            padding = dilation

        assert stride == 1 or dilation == 1, \
            "stride and dilation must have one equals to zero at least"

        if dilation > 1:
            padding = dilation
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=padding, bias=False, dilation=dilation)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual

        out = self.relu(out)

        return out

# End of Building Blocks

# Class for ResNet - the Backbone neural network

class ResNet(nn.Module):
    def __init__(self, block, layers, used_layers):
        self.inplanes = 64
        super(ResNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=0,  # 3
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)

        self.feature_size = 128 * block.expansion
        self.used_layers = used_layers
        layer3 = True if 3 in used_layers else False
        layer4 = True if 4 in used_layers else False

        if layer3:
            self.layer3 = self._make_layer(block, 256, layers[2],
                                           stride=1, dilation=2)  # 15x15, 7x7
            self.feature_size = (256 + 128) * block.expansion
        else:
            self.layer3 = lambda x: x  # identity

        if layer4:
            self.layer4 = self._make_layer(block, 512, layers[3],
                                           stride=1, dilation=4)  # 7x7, 3x3
            self.feature_size = 512 * block.expansion
        else:
            self.layer4 = lambda x: x  # identity

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1, dilation=1):
        downsample = None
        dd = dilation
        if stride != 1 or self.inplanes != planes * block.expansion:
            if stride == 1 and dilation == 1:
                downsample = nn.Sequential(
                    nn.Conv2d(self.inplanes, planes * block.expansion,
                              kernel_size=1, stride=stride, bias=False),
                    nn.BatchNorm2d(planes * block.expansion),
                )
            else:
                if dilation > 1:
                    dd = dilation // 2
                    padding = dd
                else:
                    dd = 1
                    padding = 0
                downsample = nn.Sequential(
                    nn.Conv2d(self.inplanes, planes * block.expansion,
                              kernel_size=3, stride=stride, bias=False,
                              padding=padding, dilation=dd),
                    nn.BatchNorm2d(planes * block.expansion),
                )

        layers = []
        layers.append(block(self.inplanes, planes, stride,
                            downsample, dilation=dilation))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes, dilation=dilation))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x_ = self.relu(x)
        x = self.maxpool(x_)

        p1 = self.layer1(x)
        p2 = self.layer2(p1)
        p3 = self.layer3(p2)
        p4 = self.layer4(p3)
        out = [x_, p1, p2, p3, p4]
        out = [out[i] for i in self.used_layers]
        if len(out) == 1:
            return out[0]
        else:
            return out

# End of ResNet

# Class for Adjusting the layers of the neural net

class AdjustLayer(nn.Module):
    def __init__(self, in_channels, out_channels, center_size=7):
        super(AdjustLayer, self).__init__()
        self.downsample = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            )
        self.center_size = center_size

    #def forward(self, x):
    #    x = self.downsample(x)
    #    if x.size(3) < 20:
    #        l = (x.size(3) - self.center_size) // 2
    #        print(l)
    #        r = l + self.center_size
    #        print(r)
    #        x = x[:, :, l:r, l:r]
    #    return x
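    # The size-dependent crop above is replaced below by fixed slice bounds,
    # so that tracing for ONNX export sees constant indices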

    def forward(self, x):
        x = self.downsample(x)
        l = 3
        r = 10
        x = x[:, :, l:r, l:r]
        return x

class AdjustAllLayer(nn.Module):
    def __init__(self, in_channels, out_channels, center_size=7):
        super(AdjustAllLayer, self).__init__()
        self.num = len(out_channels)
        if self.num == 1:
            self.downsample = AdjustLayer(in_channels[0],
                                          out_channels[0],
                                          center_size)
        else:
            for i in range(self.num):
                self.add_module('downsample'+str(i+2),
                                AdjustLayer(in_channels[i],
                                            out_channels[i],
                                            center_size))

    def forward(self, features):
        if self.num == 1:
            return self.downsample(features)
        else:
            out = []
            for i in range(self.num):
                adj_layer = getattr(self, 'downsample'+str(i+2))
                out.append(adj_layer(features[i]))
            return out        

# End of Class for Adjusting the layers of the neural net

# Class for Region Proposal Neural Network

class RPN(nn.Module):
    def __init__(self):
        super(RPN, self).__init__()

    def forward(self, z_f, x_f):
        raise NotImplementedError

class DepthwiseXCorr(nn.Module):
    def __init__(self, in_channels, hidden, out_channels, kernel_size=3, hidden_kernel_size=5):
        super(DepthwiseXCorr, self).__init__()
        self.conv_kernel = nn.Sequential(
                nn.Conv2d(in_channels, hidden, kernel_size=kernel_size, bias=False),
                nn.BatchNorm2d(hidden),
                nn.ReLU(inplace=True),
                )
        self.conv_search = nn.Sequential(
                nn.Conv2d(in_channels, hidden, kernel_size=kernel_size, bias=False),
                nn.BatchNorm2d(hidden),
                nn.ReLU(inplace=True),
                )
        self.head = nn.Sequential(
                nn.Conv2d(hidden, hidden, kernel_size=1, bias=False),
                nn.BatchNorm2d(hidden),
                nn.ReLU(inplace=True),
                nn.Conv2d(hidden, out_channels, kernel_size=1)
                )

    def forward(self, kernel, search):
        kernel = self.conv_kernel(kernel)
        search = self.conv_search(search)
        feature = xcorr_depthwise(search, kernel)
        out = self.head(feature)
        return out

class DepthwiseRPN(RPN):
    def __init__(self, anchor_num=5, in_channels=256, out_channels=256):
        super(DepthwiseRPN, self).__init__()
        self.cls = DepthwiseXCorr(in_channels, out_channels, 2 * anchor_num)
        self.loc = DepthwiseXCorr(in_channels, out_channels, 4 * anchor_num)

    def forward(self, z_f, x_f):
        cls = self.cls(z_f, x_f)
        loc = self.loc(z_f, x_f)
        return cls, loc

class MultiRPN(RPN):
    def __init__(self, anchor_num, in_channels, weighted=False):
        super(MultiRPN, self).__init__()
        self.weighted = weighted
        for i in range(len(in_channels)):
            self.add_module('rpn'+str(i+2),
                    DepthwiseRPN(anchor_num, in_channels[i], in_channels[i]))
        if self.weighted:
            self.cls_weight = nn.Parameter(torch.ones(len(in_channels)))
            self.loc_weight = nn.Parameter(torch.ones(len(in_channels)))

    def forward(self, z_fs, x_fs):
        cls = []
        loc = []
        for idx, (z_f, x_f) in enumerate(zip(z_fs, x_fs), start=2):
            rpn = getattr(self, 'rpn'+str(idx))
            c, l = rpn(z_f, x_f)
            cls.append(c)
            loc.append(l)

        if self.weighted:
            cls_weight = F.softmax(self.cls_weight, 0)
            loc_weight = F.softmax(self.loc_weight, 0)

        def avg(lst):
            return sum(lst) / len(lst)

        def weighted_avg(lst, weight):
            s = 0
            for i in range(len(weight)):
                s += lst[i] * weight[i]
            return s

        #def weighted_avg(lst, weight):
        #    s = weighted_avg_loop(lst, weight)
        #    return s

        if self.weighted:
            return weighted_avg(cls, cls_weight), weighted_avg(loc, loc_weight)
        else:
            return avg(cls), avg(loc)

# End of class for RPN

def conv3x3(in_planes, out_planes, stride=1, dilation=1):
    "3x3 convolution with padding"
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=dilation, bias=False, dilation=dilation)

def xcorr_depthwise(x, kernel):
    """depthwise cross correlation
    """
    batch = kernel.size(0)
    channel = kernel.size(1)
    x = x.view(1, batch*channel, x.size(2), x.size(3))
    kernel = kernel.view(batch*channel, 1, kernel.size(2), kernel.size(3))
    out = F.conv2d(x, kernel, groups=batch*channel)
    #out = torch_conv2d(x, kernel, batch, channel)
    #out = conv2(x, kernel, groups=batch*channel)
    out = out.view(batch, channel, out.size(2), out.size(3))
    return out

def resnet18(used_layers):
    """Constructs a ResNet-18 model.
    """
    model = ResNet(BasicBlock, [2, 2, 2, 2], used_layers)
    return model

def resnet50(used_layers):
    """Constructs a ResNet-50 model.
    """
    model = ResNet(Bottleneck, [3, 4, 6, 3], used_layers)
    return model

####################################################################################

# Build the torch backbone model
backbone = resnet50([2,3,4])
backbone.eval()
backbone.state_dict().keys()
backbone_dict = backbone.state_dict()

# Pre-trained Weights to the backbone model
current_path = os.getcwd()
load_path = os.path.join(current_path, "siamrpn_r50_l234_dwxcorr.pth")
pretrained_dict = torch.load(load_path,map_location=torch.device('cpu') )
#pretrained_dict = torch.load(load_path)
pretrained_dict.keys()

pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in backbone_dict}
backbone_dict.update(pretrained_dict)
backbone.load_state_dict(backbone_dict)

# Dummy Inputs for the torch backbone model
target = Variable(torch.ones(1, 3, 126, 126), requires_grad=False)
search = Variable(torch.ones(1, 3, 224, 224), requires_grad=False)

# Export the torch backbone model to ONNX model (one for target, one for search)
batch_size = 1
torch.onnx.export(backbone, target, "resnet_target.onnx", export_params=True, opset_version=10,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'])
                  #dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}

batch_size = 1
torch.onnx.export(backbone, search, "resnet_search.onnx", export_params=True, opset_version=10,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'])
                  #dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}

# Load the saved backbone model (target) using ONNX
onnx_resnet_target = onnx.load("resnet_target.onnx")

# Check whether the backbone model has been successfully imported
onnx.checker.check_model(onnx_resnet_target)
print(onnx.checker.check_model(onnx_resnet_target))
onnx.helper.printable_graph(onnx_resnet_target.graph)
print(onnx.helper.printable_graph(onnx_resnet_target.graph))

# Load the saved tracker model (search) using ONNX
onnx_resnet_search = onnx.load("resnet_search.onnx")

# Check whether the tracker model has been successfully imported
onnx.checker.check_model(onnx_resnet_search)
print(onnx.checker.check_model(onnx_resnet_search))
onnx.helper.printable_graph(onnx_resnet_search.graph)
print(onnx.helper.printable_graph(onnx_resnet_search.graph))

# Check the outputs of each of the backbone model
torch_resnet_target_output = backbone(target)
print(torch_resnet_target_output[0].shape) # ===  torch.Size([1, 512, 14, 14]) 
print(torch_resnet_target_output[1].shape) # ===  torch.Size([1, 1024, 14, 14])
print(torch_resnet_target_output[2].shape) # ===  torch.Size([1, 2048, 14, 14])

torch_resnet_target_output_1 = torch_resnet_target_output[0].detach().numpy()
torch_resnet_target_output_2 = torch_resnet_target_output[1].detach().numpy()
torch_resnet_target_output_3 = torch_resnet_target_output[2].detach().numpy()

torch_resnet_search_output = backbone(search) 
print(torch_resnet_search_output[0].shape) # ===  torch.Size([1, 512, 27, 27])
print(torch_resnet_search_output[1].shape) # ===  torch.Size([1, 1024, 27, 27])
print(torch_resnet_search_output[2].shape) # ===  torch.Size([1, 2048, 27, 27])  

torch_resnet_search_output_1 = torch_resnet_search_output[0].detach().numpy()
torch_resnet_search_output_2 = torch_resnet_search_output[1].detach().numpy()
torch_resnet_search_output_3 = torch_resnet_search_output[2].detach().numpy()

# Check whether the models are successfully imported using OpenCV Library (readNetFromONNX)
#inp = np.random.standard_normal([1, 3, 126, 126]).astype(np.float)
inp = np.ones([1, 3, 126, 126]).astype(np.float)
cv_resnet_target = cv2.dnn.readNetFromONNX('resnet_target.onnx')
cv_resnet_target.setInput(inp)

cv_resnet_target_output_1 = cv_resnet_target.forward('458')
cv_resnet_target_output_2 = cv_resnet_target.forward('490')
cv_resnet_target_output_3 = cv_resnet_target.forward('output')

print(cv_resnet_target_output_1.shape) # === ([1, 1024, 14, 14])
print(cv_resnet_target_output_2.shape) # === ([1, 2048, 14, 14])
print(cv_resnet_target_output_3.shape) # === ([1, 512, 14, 14])

#inp = np.random.standard_normal([1, 3, 224, 224]).astype(np.float)
inp = np.ones([1, 3, 224, 224]).astype(np.float)
cv_resnet_search = cv2.dnn.readNetFromONNX('resnet_search.onnx')
cv_resnet_search.setInput(inp)

cv_resnet_search_output_1 = cv_resnet_search.forward('458')
cv_resnet_search_output_2 = cv_resnet_search.forward('490')
cv_resnet_search_output_3 = cv_resnet_search.forward('output')

print(cv_resnet_search_output_1.shape) # === ([1, 1024, 27, 27])
print(cv_resnet_search_output_2.shape) # === ([1, 2048, 27, 27])
print(cv_resnet_search_output_3.shape) # === ([1, 512, 27, 27])
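
# Note: OpenCV returns these outputs in a different order than PyTorch.
# '458' and '490' are (presumably auto-generated) ONNX node names for the two
# intermediate feature maps, while 'output' is the declared graph output.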

#############################################################################################

# Build neck_1_model
# neck = AdjustAllLayer([512, 1024, 2048], [256, 256, 256])
neck_1 = AdjustAllLayer([512], [256])

neck_1.eval()
neck_1.state_dict().keys()
neck_1_dict = neck_1.state_dict()

current_path = os.getcwd()
load_path = os.path.join(current_path, "siamrpn_r50_l234_dwxcorr.pth")
pretrained_dict = torch.load(load_path,map_location=torch.device('cpu') )
#pretrained_dict = torch.load(load_path)
pretrained_dict.keys()

pretrained_dict_1 = pretrained_dict

pretrained_dict_1 = {k: v for k, v in pretrained_dict_1.items() if k in neck_1_dict}
neck_1_dict.update(pretrained_dict_1)
neck_1.load_state_dict(neck_1_dict)

# Export the torch neck_1 model to ONNX model
batch_size = 1
torch.onnx.export(neck_1, torch_resnet_target_output[0], "neck_1_out_1.onnx", export_params=True, opset_version=10,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'])
                  #dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}

batch_size = 1
torch.onnx.export(neck_1, torch_resnet_search_output[0], "neck_1_out_2.onnx", export_params=True, opset_version=10,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'])
                  #dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}

# Load the saved neck_1 model using ONNX
onnx_neck_1_out_1_model = onnx.load("neck_1_out_1.onnx")

# Check whether the neck_1 model has been successfully imported
onnx.checker.check_model(onnx_neck_1_out_1_model)
print(onnx.checker.check_model(onnx_neck_1_out_1_model))
onnx.helper.printable_graph(onnx_neck_1_out_1_model.graph)
print(onnx.helper.printable_graph(onnx_neck_1_out_1_model.graph))

# Load the saved neck_1 model using ONNX
onnx_neck_1_out_2_model = onnx.load("neck_1_out_2.onnx")

# Check whether the tracker model has been successfully imported
onnx.checker.check_model(onnx_neck_1_out_2_model)
print(onnx.checker.check_model(onnx_neck_1_out_2_model))
onnx.helper.printable_graph(onnx_neck_1_out_2_model.graph)
print(onnx.helper.printable_graph(onnx_neck_1_out_2_model.graph))

# Build neck_2_model
neck_2 = AdjustAllLayer([1024], [256])

neck_2.eval()
neck_2.state_dict().keys()
neck_2_dict = neck_2.state_dict()

pretrained_dict_2 = pretrained_dict
pretrained_dict_2 = {k: v for k, v in pretrained_dict_2.items() if k in neck_2_dict}
neck_2_dict.update(pretrained_dict_2)
neck_2.load_state_dict(neck_2_dict)

# Export the torch neck_2 model to ONNX model
batch_size = 1
torch.onnx.export(neck_2, torch_resnet_target_output[1], "neck_2_out_1.onnx", export_params=True, opset_version=10,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'])
                  #dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}

batch_size = 1
torch.onnx.export(neck_2, torch_resnet_search_output[1], "neck_2_out_2.onnx", export_params=True, opset_version=10,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'])
                  #dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}

# Load the saved neck_2 model using ONNX
onnx_neck_2_out_1_model = onnx.load("neck_2_out_1.onnx")

# Check whether the neck_2 model has been successfully imported
onnx.checker.check_model(onnx_neck_2_out_1_model)
print(onnx.checker.check_model(onnx_neck_2_out_1_model))
onnx.helper.printable_graph(onnx_neck_2_out_1_model.graph)
print(onnx.helper.printable_graph(onnx_neck_2_out_1_model.graph))

# Load the saved neck_2 model using ONNX
onnx_neck_2_out_2_model = onnx.load("neck_2_out_2.onnx")

# Check whether the neck_2 model has been successfully imported
onnx.checker.check_model(onnx_neck_2_out_2_model)
print(onnx.checker.check_model(onnx_neck_2_out_2_model))
onnx.helper.printable_graph(onnx_neck_2_out_2_model.graph)
print(onnx.helper.printable_graph(onnx_neck_2_out_2_model.graph))

# Build neck_3_model
neck_3 = AdjustAllLayer([2048], [256])

neck_3.eval()
neck_3.state_dict().keys()
neck_3_dict = neck_3.state_dict()

pretrained_dict_3 = pretrained_dict

pretrained_dict_3 = {k: v for k, v in pretrained_dict_3.items() if k in neck_3_dict}
neck_3_dict.update(pretrained_dict_3)
neck_3.load_state_dict(neck_3_dict)

# Export the torch neck_3 model to ONNX model
batch_size = 1
torch.onnx.export(neck_3, torch_resnet_target_output[2], "neck_3_out_1.onnx", export_params=True, opset_version=11,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'])
                  #dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}

batch_size = 1
torch.onnx.export(neck_3, torch_resnet_search_output[2], "neck_3_out_2.onnx", export_params=True, opset_version=11,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'])
                  #dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}

# Load the saved neck_3 model using ONNX
onnx_neck_3_out_1_model = onnx.load("neck_3_out_1.onnx")

# Check whether the neck_3 model has been successfully imported
onnx.checker.check_model(onnx_neck_3_out_1_model)
print(onnx.checker.check_model(onnx_neck_3_out_1_model))
onnx.helper.printable_graph(onnx_neck_3_out_1_model.graph)
print(onnx.helper.printable_graph(onnx_neck_3_out_1_model.graph))

# Load the saved neck_3 model using ONNX
onnx_neck_3_out_2_model = onnx.load("neck_3_out_2.onnx")

# Check whether the neck_3 model has been successfully imported
onnx.checker.check_model(onnx_neck_3_out_2_model)
print(onnx.checker.check_model(onnx_neck_3_out_2_model))
onnx.helper.printable_graph(onnx_neck_3_out_2_model.graph)
print(onnx.helper.printable_graph(onnx_neck_3_out_2_model.graph))
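
# Sanity check (sketch): the exported neck models can likewise be read back
# with OpenCV's DNN module and compared against the torch neck outputs,
# e.g. for neck_1 on the target branch:
cv_neck_1 = cv2.dnn.readNetFromONNX('neck_1_out_1.onnx')
cv_neck_1.setInput(torch_resnet_target_output[0].detach().numpy())
cv_neck_1_out = cv_neck_1.forward()
ref_neck_1 = neck_1(torch_resnet_target_output[0]).detach().numpy()
print(np.max(np.abs(cv_neck_1_out - ref_neck_1)))  # should be ~0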

#############################################################################################
bhack commented 3 years ago

@jinyup100 What is the current issue with RPN? Is it something specific to the Siam model, or a more generic issue like the one already handled in https://github.com/pytorch/vision/pull/1329?

jinyup100 commented 3 years ago

> @jinyup100 What is the current issue with RPN? Is it something specific to the Siam model, or a more generic issue like the one already handled in pytorch/vision#1329?

@bhack The issue with the RPN was an error regarding PyTorch's functional conv2d (F.conv2d) when exporting the PyTorch model to ONNX. I believe this occurs because the converter does not support the functional layer, so at the moment I am trying to replace F.conv2d with nn.Conv2d.
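
For reference, a minimal sketch of the replacement idea (the actual version used appears in xcorr_depthwise in the code below): instead of calling F.conv2d directly, an nn.Conv2d module is instantiated per call and its weights are set from the template features.

```python
import torch
import torch.nn as nn

def depthwise_conv(x, kernel):
    # x: (1, B*C, H, W); kernel: (B*C, 1, kH, kW) from the template branch
    groups = kernel.size(0)
    conv = nn.Conv2d(groups, groups,
                     kernel_size=(kernel.size(2), kernel.size(3)),
                     bias=False, groups=groups)
    conv.weight = nn.Parameter(kernel)  # template features as per-group kernels
    return conv(x)
```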

bhack commented 3 years ago

OK, double-check the ticket I mentioned, because if the Siam RPN is like the standard one you may need the tricks from that PR.

jinyup100 commented 3 years ago

As an update to last week's comment, I have replaced PyTorch's functional conv2d (F.conv2d) with the module version (nn.Conv2d) in the code below. There is, however, a new error stating "onnx_importer.cpp:262: error: (-204:Requested object was not found) Blob 915 not found in const blobs in function 'cv::dnn::dnn4_v20200609::ONNXImporter::getBlob'". I would greatly appreciate any advice on the stated error. Thank you.

Below are the successfully converted ONNX models:

- ResNet50_target : https://drive.google.com/file/d/1syHYIVLh6fTnVAUfzS-hrrgq3GfFzAnI/view?usp=sharing
- ResNet50_search : https://drive.google.com/file/d/16YJt2chxzDjju8zCcWqkd-yVDHggEcuj/view?usp=sharing
- Adjusted_Layer_1_Output_1 : https://drive.google.com/file/d/1REnYzOTjUcFE04j-wuWjW3mzYuKXudxM/view?usp=sharing
- Adjusted_Layer_1_Output_2 : https://drive.google.com/file/d/1WN-_OxeG2xNY9kIX5T687HqnenxeU2Xc/view?usp=sharing
- Adjusted_Layer_2_Output_1 : https://drive.google.com/file/d/1P5YUz3jzPeB1_Tbct5dITT7I9_OHzTXs/view?usp=sharing
- Adjusted_Layer_2_Output_2 : https://drive.google.com/file/d/1MPKy_nWoUOwPinOF_CauBHndTUozqxoR/view?usp=sharing
- Adjusted_Layer_3_Output_1 : https://drive.google.com/file/d/1GvSsiX9wxqFm_6FA675DNQilTLDrFYGR/view?usp=sharing
- Adjusted_Layer_3_Output_2 : https://drive.google.com/file/d/1TlECJWHfDaX1-cL04pedPCUzEaFWWCVo/view?usp=sharing


import cv2
import math
import numpy as np
import os
import onnx
import torch
import torch.nn as nn
import torch.nn.functional as F

from torch.autograd import Variable

#@torch.jit.script
#def slice_helper(x,l_offset : int, u_offset : int):
#    return x[:,:,l_offset:u_offset,l_offset:u_offset]

@torch.jit.script
def torch_conv2d(x, kernel, batch : int, channel : int):
    groups = batch*channel
    out = F.conv2d(input=x, weight=kernel, groups=batch*channel)
    return out

# Class for the Tracker - SiamRPNPP

class Tracker(nn.Module):
    def __init__(self):
        super(Tracker, self).__init__()

        # build backbone
        self.backbone = resnet50([2,3,4])

        # build adjust layer
        self.neck = AdjustAllLayer([512, 1024, 2048], [256, 256, 256])

        # build rpn head
        self.rpn_head = MultiRPN(anchor_num=5,in_channels=[256, 256, 256],weighted=True)

    def template(self, z):
        zf = self.backbone(z)
        zf = self.neck(zf)
        self.zf = zf

    def track(self, x):
        xf = self.backbone(x)
        xf = self.neck(xf)
        cls, loc = self.rpn_head(self.zf, xf)
        return {'cls': cls,'loc': loc,}

    def log_softmax(self, cls):
        b, a2, h, w = cls.size()
        cls = cls.view(b, 2, a2//2, h, w)
        cls = cls.permute(0, 2, 3, 4, 1).contiguous()
        cls = F.log_softmax(cls, dim=4)
        return cls

    def forward(self, data):
        """ only used in training
        """
        template = data[0]
        search = data[1]
        #label_cls = data['label_cls']
        #label_loc = data['label_loc']
        #label_loc_weight = data['label_loc_weight']

        # get feature
        zf = self.backbone(template)
        xf = self.backbone(search)

        zf = self.neck(zf)
        xf = self.neck(xf)
        cls, loc = self.rpn_head(zf, xf)

        # get loss
        #cls = self.log_softmax(cls)
        #cls_loss = select_cross_entropy_loss(cls, label_cls)
        #loc_loss = weight_l1_loss(loc, label_loc, label_loc_weight)

        #outputs = {}
        #outputs['total_loss'] = cfg.TRAIN.CLS_WEIGHT * cls_loss + \
            #cfg.TRAIN.LOC_WEIGHT * loc_loss
        #outputs['cls_loss'] = cls_loss
        #outputs['loc_loss'] = loc_loss

        return cls, loc

# End of Tracker - SiamRPNPP

# Class for the Building Blocks required for ResNet

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1,downsample=None, dilation=1):
        super(BasicBlock, self).__init__()
        padding = 2 - stride

        if dilation > 1:
            padding = dilation

        dd = dilation
        pad = padding
        if downsample is not None and dilation > 1:
            dd = dilation // 2
            pad = dd

        self.conv1 = nn.Conv2d(inplanes, planes,
                               stride=stride, dilation=dd, bias=False,
                               kernel_size=3, padding=pad)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes, dilation=dilation)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out

class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1,
                 downsample=None, dilation=1):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        padding = 2 - stride
        if downsample is not None and dilation > 1:
            dilation = dilation // 2
            padding = dilation

        assert stride == 1 or dilation == 1, \
            "stride and dilation must have one equals to zero at least"

        if dilation > 1:
            padding = dilation
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=padding, bias=False, dilation=dilation)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual

        out = self.relu(out)

        return out

# End of Building Blocks

# Class for ResNet - the Backbone neural network

class ResNet(nn.Module):
    def __init__(self, block, layers, used_layers):
        self.inplanes = 64
        super(ResNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=0,  # 3
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)

        self.feature_size = 128 * block.expansion
        self.used_layers = used_layers
        layer3 = True if 3 in used_layers else False
        layer4 = True if 4 in used_layers else False

        if layer3:
            self.layer3 = self._make_layer(block, 256, layers[2],
                                           stride=1, dilation=2)  # 15x15, 7x7
            self.feature_size = (256 + 128) * block.expansion
        else:
            self.layer3 = lambda x: x  # identity

        if layer4:
            self.layer4 = self._make_layer(block, 512, layers[3],
                                           stride=1, dilation=4)  # 7x7, 3x3
            self.feature_size = 512 * block.expansion
        else:
            self.layer4 = lambda x: x  # identity

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1, dilation=1):
        downsample = None
        dd = dilation
        if stride != 1 or self.inplanes != planes * block.expansion:
            if stride == 1 and dilation == 1:
                downsample = nn.Sequential(
                    nn.Conv2d(self.inplanes, planes * block.expansion,
                              kernel_size=1, stride=stride, bias=False),
                    nn.BatchNorm2d(planes * block.expansion),
                )
            else:
                if dilation > 1:
                    dd = dilation // 2
                    padding = dd
                else:
                    dd = 1
                    padding = 0
                downsample = nn.Sequential(
                    nn.Conv2d(self.inplanes, planes * block.expansion,
                              kernel_size=3, stride=stride, bias=False,
                              padding=padding, dilation=dd),
                    nn.BatchNorm2d(planes * block.expansion),
                )

        layers = []
        layers.append(block(self.inplanes, planes, stride,
                            downsample, dilation=dilation))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes, dilation=dilation))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x_ = self.relu(x)
        x = self.maxpool(x_)

        p1 = self.layer1(x)
        p2 = self.layer2(p1)
        p3 = self.layer3(p2)
        p4 = self.layer4(p3)
        out = [x_, p1, p2, p3, p4]
        out = [out[i] for i in self.used_layers]
        if len(out) == 1:
            return out[0]
        else:
            return out

# End of ResNet

# Class for Adjusting the layers of the neural net

class AdjustLayer(nn.Module):
    def __init__(self, in_channels, out_channels, center_size=7):
        super(AdjustLayer, self).__init__()
        self.downsample = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            )
        self.center_size = center_size

    #def forward(self, x):
    #    x = self.downsample(x)
    #    if x.size(3) < 20:
    #        l = (x.size(3) - self.center_size) // 2
    #        print(l)
    #        r = l + self.center_size
    #        print(r)
    #        x = x[:, :, l:r, l:r]
    #    return x

    def forward(self, x):
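        # Hard-coded crop: with a 14x14 template feature map and
        # center_size=7, l = (14 - 7) // 2 = 3 and r = l + 7 = 10. Fixing
        # the offsets (instead of computing them from x.size(3), as in the
        # commented-out version above) keeps the traced ONNX graph free of
        # shape-dependent slicing, a source of Gather nodes.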
        x = self.downsample(x)
        l = 3
        r = 10
        x = x[:, :, l:r, l:r]
        return x

class AdjustAllLayer(nn.Module):
    def __init__(self, in_channels, out_channels, center_size=7):
        super(AdjustAllLayer, self).__init__()
        self.num = len(out_channels)
        if self.num == 1:
            self.downsample = AdjustLayer(in_channels[0],
                                          out_channels[0],
                                          center_size)
        else:
            for i in range(self.num):
                self.add_module('downsample'+str(i+2),
                                AdjustLayer(in_channels[i],
                                            out_channels[i],
                                            center_size))

    def forward(self, features):
        if self.num == 1:
            return self.downsample(features)
        else:
            out = []
            for i in range(self.num):
                adj_layer = getattr(self, 'downsample'+str(i+2))
                out.append(adj_layer(features[i]))
            return out        

# End of Class for Adjusting the layers of the neural net

# Class for Region Proposal Neural Network

class RPN(nn.Module):
    def __init__(self):
        super(RPN, self).__init__()

    def forward(self, z_f, x_f):
        raise NotImplementedError

class DepthwiseXCorr(nn.Module):
    def __init__(self, in_channels, hidden, out_channels, kernel_size=3, hidden_kernel_size=5):
        super(DepthwiseXCorr, self).__init__()
        self.conv_kernel = nn.Sequential(
                nn.Conv2d(in_channels, hidden, kernel_size=kernel_size, bias=False),
                nn.BatchNorm2d(hidden),
                nn.ReLU(inplace=True),
                )
        self.conv_search = nn.Sequential(
                nn.Conv2d(in_channels, hidden, kernel_size=kernel_size, bias=False),
                nn.BatchNorm2d(hidden),
                nn.ReLU(inplace=True),
                )
        self.head = nn.Sequential(
                nn.Conv2d(hidden, hidden, kernel_size=1, bias=False),
                nn.BatchNorm2d(hidden),
                nn.ReLU(inplace=True),
                nn.Conv2d(hidden, out_channels, kernel_size=1)
                )

    def forward(self, kernel, search):
        kernel = self.conv_kernel(kernel)
        search = self.conv_search(search)
        feature = xcorr_depthwise(search, kernel)
        out = self.head(feature)
        return out

class DepthwiseRPN(RPN):
    def __init__(self, anchor_num=5, in_channels=256, out_channels=256):
        super(DepthwiseRPN, self).__init__()
        self.cls = DepthwiseXCorr(in_channels, out_channels, 2 * anchor_num)
        self.loc = DepthwiseXCorr(in_channels, out_channels, 4 * anchor_num)

    def forward(self, z_f, x_f):
        cls = self.cls(z_f, x_f)
        loc = self.loc(z_f, x_f)
        return cls, loc

class MultiRPN(RPN):
    def __init__(self, anchor_num, in_channels, weighted=False):
        super(MultiRPN, self).__init__()
        self.weighted = weighted
        for i in range(len(in_channels)):
            self.add_module('rpn'+str(i+2),
                    DepthwiseRPN(anchor_num, in_channels[i], in_channels[i]))
        if self.weighted:
            self.cls_weight = nn.Parameter(torch.ones(len(in_channels)))
            self.loc_weight = nn.Parameter(torch.ones(len(in_channels)))

    def forward(self, z_fs, x_fs):
        cls = []
        loc = []
        for idx, (z_f, x_f) in enumerate(zip(z_fs, x_fs), start=2):
            rpn = getattr(self, 'rpn'+str(idx))
            c, l = rpn(z_f, x_f)
            cls.append(c)
            loc.append(l)

        if self.weighted:
            cls_weight = F.softmax(self.cls_weight, 0)
            loc_weight = F.softmax(self.loc_weight, 0)

        def avg(lst):
            return sum(lst) / len(lst)

        def weighted_avg(lst, weight):
            s = 0
            fixed_weight_len = 3
            for i in range(fixed_weight_len):
                s += lst[i] * weight[i]
            return s

        #def weighted_avg(lst, weight):
        #    s = weighted_avg_loop(lst, weight)
        #    return s

        if self.weighted:
            return weighted_avg(cls, cls_weight), weighted_avg(loc, loc_weight)
        else:
            return avg(cls), avg(loc)

# End of class for RPN

def conv3x3(in_planes, out_planes, stride=1, dilation=1):
    "3x3 convolution with padding"
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=dilation, bias=False, dilation=dilation)

#def xcorr_depthwise(x, kernel):
#    """depthwise cross correlation
#    """
#    batch = kernel.size(0)
#    channel = kernel.size(1)
#    x = x.view(1, batch*channel, x.size(2), x.size(3))
#    kernel = kernel.view(batch*channel, 1, kernel.size(2), kernel.size(3))
#    out = F.conv2d(x, kernel, groups=batch*channel)
#    #out = torch_conv2d(x, kernel, batch, channel)
#    #out = conv2(x, kernel, groups=batch*channel)
#    out = out.view(batch, channel, out.size(2), out.size(3))
#    return out

def xcorr_depthwise(x, kernel):
    """ depthwise cross correlation
    """
    batch = kernel.size(0)
    channel = kernel.size(1)
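    # Reshape so each (batch, channel) pair becomes its own convolution
    # group: the template features then act as depthwise cross-correlation
    # kernels over the search features.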
    x = x.view(1, batch*channel, x.size(2), x.size(3))
    kernel = kernel.view(batch*channel, 1, kernel.size(2), kernel.size(3))
    conv = nn.Conv2d(batch*channel, batch*channel, kernel_size=(kernel.size(2), kernel.size(3)), bias=False, groups=batch*channel)
    conv.weight = nn.Parameter(kernel)
    out = conv(x) 
    out = out.view(batch, channel, out.size(2), out.size(3))
    out = out.detach()
    return out

def resnet18(used_layers):
    """Constructs a ResNet-18 model.
    """
    model = ResNet(BasicBlock, [2, 2, 2, 2], used_layers)
    return model

def resnet50(used_layers):
    """Constructs a ResNet-50 model.
    """
    model = ResNet(Bottleneck, [3, 4, 6, 3], used_layers)
    return model

# Build the torch backbone model
backbone = resnet50([2,3,4])
backbone.eval()
backbone.state_dict().keys()
backbone_dict = backbone.state_dict()

# Pre-trained weights for the backbone model
current_path = os.getcwd()
load_path = os.path.join(current_path, "siamrpn_r50_l234_dwxcorr.pth")
pretrained_dict = torch.load(load_path,map_location=torch.device('cpu') )
#pretrained_dict = torch.load(load_path)
pretrained_dict.keys()

pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in backbone_dict}
backbone_dict.update(pretrained_dict)
backbone.load_state_dict(backbone_dict)

# Dummy inputs for the torch backbone model
target = Variable(torch.ones(1, 3, 126, 126), requires_grad=False)
search = Variable(torch.ones(1, 3, 224, 224), requires_grad=False)

# Export the torch backbone model to ONNX model (one for target, one for search)
batch_size = 1
torch.onnx.export(backbone, target, "resnet_target.onnx", export_params=True, opset_version=10,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'])
                  #dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}

batch_size = 1
torch.onnx.export(backbone, search, "resnet_search.onnx", export_params=True, opset_version=10,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'])
                  #dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}
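
# Two separate files are exported because the trace freezes the input
# resolution: 126x126 for the template (target) branch, 224x224 for the
# search branch (dynamic_axes is left commented out above).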

# Load the saved backbone(target) model using ONNX
onnx_resnet_target = onnx.load("resnet_target.onnx")

# Check whether the backbone(target) model has been successfully imported
onnx.checker.check_model(onnx_resnet_target)
print(onnx.checker.check_model(onnx_resnet_target))
onnx.helper.printable_graph(onnx_resnet_target.graph)
print(onnx.helper.printable_graph(onnx_resnet_target.graph))

# Load the saved backbone(search) model using ONNX
onnx_resnet_search = onnx.load("resnet_search.onnx")

# Check whether the backbone(search) model has been successfully imported
onnx.checker.check_model(onnx_resnet_search)
print(onnx.checker.check_model(onnx_resnet_search))
onnx.helper.printable_graph(onnx_resnet_search.graph)
print(onnx.helper.printable_graph(onnx_resnet_search.graph))

# Check the outputs of each of the backbone model
torch_resnet_target_output = backbone(target)
torch_resnet_target_output_1 = torch_resnet_target_output[0].detach().numpy()
torch_resnet_target_output_2 = torch_resnet_target_output[1].detach().numpy()
torch_resnet_target_output_3 = torch_resnet_target_output[2].detach().numpy()

torch_resnet_search_output = backbone(search)
torch_resnet_search_output_1 = torch_resnet_search_output[0].detach().numpy()
torch_resnet_search_output_2 = torch_resnet_search_output[1].detach().numpy()
torch_resnet_search_output_3 = torch_resnet_search_output[2].detach().numpy()

# Build the torch neck1 model
# neck = AdjustAllLayer([512, 1024, 2048], [256, 256, 256])
neck_1 = AdjustAllLayer([512], [256])

neck_1.eval()
neck_1.state_dict().keys()
neck_1_dict = neck_1.state_dict()

current_path = os.getcwd()
load_path = os.path.join(current_path, "siamrpn_r50_l234_dwxcorr.pth")
pretrained_dict = torch.load(load_path,map_location=torch.device('cpu') )
#pretrained_dict = torch.load(load_path)
pretrained_dict.keys()

pretrained_dict_1 = pretrained_dict

pretrained_dict_1 = {k: v for k, v in pretrained_dict_1.items() if k in neck_1_dict}
neck_1_dict.update(pretrained_dict_1)
neck_1.load_state_dict(neck_1_dict)

# Export the torch neck_1 model to ONNX model
batch_size = 1
torch.onnx.export(neck_1, torch_resnet_target_output[0], "neck_1_out_1.onnx", export_params=True, opset_version=10,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'])
                  #dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}

batch_size = 1
torch.onnx.export(neck_1, torch_resnet_search_output[0], "neck_1_out_2.onnx", export_params=True, opset_version=10,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'])
                  #dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}

# Load the saved neck model using ONNX
onnx_neck_1_out_1_model = onnx.load("neck_1_out_1.onnx")

# Check whether the neck model has been successfully imported
onnx.checker.check_model(onnx_neck_1_out_1_model)
print(onnx.checker.check_model(onnx_neck_1_out_1_model))
onnx.helper.printable_graph(onnx_neck_1_out_1_model.graph)
print(onnx.helper.printable_graph(onnx_neck_1_out_1_model.graph))

# Load the saved neck model using ONNX
onnx_neck_1_out_2_model = onnx.load("neck_1_out_2.onnx")

# Check whether the neck model has been successfully imported
onnx.checker.check_model(onnx_neck_1_out_2_model)
print(onnx.checker.check_model(onnx_neck_1_out_2_model))
onnx.helper.printable_graph(onnx_neck_1_out_2_model.graph)
print(onnx.helper.printable_graph(onnx_neck_1_out_2_model.graph))

# Build the torch neck2 model
neck_2 = AdjustAllLayer([1024], [256])

neck_2.eval()
neck_2.state_dict().keys()
neck_2_dict = neck_2.state_dict()

pretrained_dict_2 = pretrained_dict
pretrained_dict_2 = {k: v for k, v in pretrained_dict_2.items() if k in neck_2_dict}
neck_2_dict.update(pretrained_dict_2)
neck_2.load_state_dict(neck_2_dict)

# Export the torch neck2 model to ONNX model
batch_size = 1
torch.onnx.export(neck_2, torch_resnet_target_output[1], "neck_2_out_1.onnx", export_params=True, opset_version=10,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'])
                  #dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}

batch_size = 1
torch.onnx.export(neck_2, torch_resnet_search_output[1], "neck_2_out_2.onnx", export_params=True, opset_version=10,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'])
                  #dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}

# Load the saved neck2 model using ONNX
onnx_neck_2_out_1_model = onnx.load("neck_2_out_1.onnx")

# Check whether the neck2 model has been successfully imported
onnx.checker.check_model(onnx_neck_2_out_1_model)
print(onnx.checker.check_model(onnx_neck_2_out_1_model))
onnx.helper.printable_graph(onnx_neck_2_out_1_model.graph)
print(onnx.helper.printable_graph(onnx_neck_2_out_1_model.graph))

# Load the saved neck2 model using ONNX
onnx_neck_2_out_2_model = onnx.load("neck_2_out_2.onnx")

# Check whether the neck_2 model has been successfully imported
onnx.checker.check_model(onnx_neck_2_out_2_model)
print(onnx.checker.check_model(onnx_neck_2_out_2_model))
onnx.helper.printable_graph(onnx_neck_2_out_2_model.graph)
print(onnx.helper.printable_graph(onnx_neck_2_out_2_model.graph))

# Build the neck 3 model
neck_3 = AdjustAllLayer([2048], [256])

neck_3.eval()
neck_3.state_dict().keys()
neck_3_dict = neck_3.state_dict()

pretrained_dict_3 = pretrained_dict

pretrained_dict_3 = {k: v for k, v in pretrained_dict_3.items() if k in neck_3_dict}
neck_3_dict.update(pretrained_dict_3)
neck_3.load_state_dict(neck_3_dict)

# Export the torch neck_3 model to ONNX model
batch_size = 1
torch.onnx.export(neck_3, torch_resnet_target_output[2], "neck_3_out_1.onnx", export_params=True, opset_version=11,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'])
                  #dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}

batch_size = 1
torch.onnx.export(neck_3, torch_resnet_search_output[2], "neck_3_out_2.onnx", export_params=True, opset_version=11,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'])
                  #dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}

# Load the saved neck_3 model using ONNX
onnx_neck_3_out_1_model = onnx.load("neck_3_out_1.onnx")

# Check whether the neck_3 model has been successfully imported
onnx.checker.check_model(onnx_neck_3_out_1_model)
print(onnx.checker.check_model(onnx_neck_3_out_1_model))
onnx.helper.printable_graph(onnx_neck_3_out_1_model.graph)
print(onnx.helper.printable_graph(onnx_neck_3_out_1_model.graph))

# Load the saved neck_3 model using ONNX
onnx_neck_3_out_2_model = onnx.load("neck_3_out_2.onnx")

# Check whether the neck_3 model has been successfully imported
onnx.checker.check_model(onnx_neck_3_out_2_model)
print(onnx.checker.check_model(onnx_neck_3_out_2_model))
onnx.helper.printable_graph(onnx_neck_3_out_2_model.graph)
print(onnx.helper.printable_graph(onnx_neck_3_out_2_model.graph))

# Check the outputs of the neck models
torch_neck_1_out_1 = neck_1(torch_resnet_target_output[0])
torch_neck_2_out_1 = neck_2(torch_resnet_target_output[1])
torch_neck_3_out_1 = neck_3(torch_resnet_target_output[2])

torch_neck_1_out_2 = neck_1(torch_resnet_search_output[0])
torch_neck_2_out_2 = neck_2(torch_resnet_search_output[1])
torch_neck_3_out_2 = neck_3(torch_resnet_search_output[2])

torch_neck_out_1 = [torch_neck_1_out_1, torch_neck_2_out_1, torch_neck_3_out_1]
torch_neck_out_1 = torch.stack(torch_neck_out_1)

torch_neck_out_2 = [torch_neck_1_out_2, torch_neck_2_out_2, torch_neck_3_out_2]
torch_neck_out_2 = torch.stack(torch_neck_out_2)

# Build the torch head model
rpn_head = MultiRPN(anchor_num=5,in_channels=[256, 256, 256],weighted=True)
rpn_head.eval()
rpn_head.state_dict().keys()
rpn_head_dict = rpn_head.state_dict()

# Pre-trained weights for the rpn_head model
current_path = os.getcwd()
load_path = os.path.join(current_path, "siamrpn_r50_l234_dwxcorr.pth")
pretrained_dict = torch.load(load_path,map_location=torch.device('cpu') )
#pretrained_dict = torch.load(load_path)
pretrained_dict.keys()

pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in rpn_head_dict}
rpn_head_dict.update(pretrained_dict)
rpn_head.load_state_dict(rpn_head_dict)

# Export the torch head model to ONNX model
batch_size = 1
torch.onnx.export(rpn_head, (torch_neck_out_1, torch_neck_out_2), "rpn_head.onnx", export_params=True, opset_version=10,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'])
                  #dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}

# Load the saved rpn_head model using ONNX
onnx_head_model = onnx.load("rpn_head.onnx")

# Check whether the rpn_head model has been successfully imported
onnx.checker.check_model(onnx_head_model)
print(onnx.checker.check_model(onnx_head_model))    
onnx.helper.printable_graph(onnx_head_model.graph)
print(onnx.helper.printable_graph(onnx_head_model.graph))

# Check the torch rpn_head_model
torch_head_out_1 = rpn_head(torch_neck_out_1, torch_neck_out_2)
print(torch_head_out_1[0].shape)
print(torch_head_out_1[1].shape)

# Where the error is currently occurring
cv_head_model = cv2.dnn.readNetFromONNX('rpn_head.onnx')

bhack commented 3 years ago

@jinyup100 Where are (torch_neck_out_1, torch_neck_out_2) defined?

jinyup100 commented 3 years ago

> @jinyup100 Where are (torch_neck_out_1, torch_neck_out_2) defined?

Apologies - I have updated the comment above so that it includes the definitions of torch_neck_out_1 and torch_neck_out_2.

My latest attempt has shown that there is an error in my implementation of the RPN: "RuntimeWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results)." I believe this is the issue also raised at https://github.com/pytorch/pytorch/issues/19349

bhack commented 3 years ago

@jinyup100 Can you check in the code whether you are running in Colab? That way you could conditionally use !gdown --id <drive_file_id> to download the Google Drive assets and automate the .load.
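
A minimal sketch of that suggestion (the Colab check is one common idiom and DRIVE_FILE_ID is a placeholder, neither taken from the original code):

```python
import os

try:
    import google.colab  # importable only inside Colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

WEIGHTS = "siamrpn_r50_l234_dwxcorr.pth"
DRIVE_FILE_ID = "<drive_file_id>"  # placeholder for the Drive file id

if IN_COLAB and not os.path.exists(WEIGHTS):
    # gdown ships with Colab; --id fetches a Google Drive file by its id
    os.system("gdown --id {} -O {}".format(DRIVE_FILE_ID, WEIGHTS))
```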

bhack commented 3 years ago

Any update here?

jinyup100 commented 3 years ago

(07/14) I have updated the implementation of MultiRPN so that it no longer raises the error stating "RuntimeWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed..." I now believe that all the models have been successfully exported to ONNX format.

There is an error, however, when trying to import MultiRPN using onnx_importer (cv2.dnn.readNetFromONNX). The error states: (-204:Requested object was not found) Blob z_fs not found in const blobs in function 'cv::dnn::dnn4_v20200609::ONNXImporter::getBlob', which seems to indicate that the ONNXImporter is not recognising multiple inputs (x_fs, z_fs). I would greatly appreciate any advice on the stated error. Thank you.
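
As a cross-check that the exported file itself declares and accepts both named inputs, it can be run through onnxruntime, which supports multiple inputs (a sketch; the (3, 1, 256, 7, 7) shapes are an assumption based on the stacked neck outputs in the code below):

```python
import numpy as np
import onnxruntime as ort

# Dummy stand-ins for the stacked template/search neck features
z_fs = np.random.randn(3, 1, 256, 7, 7).astype(np.float32)
x_fs = np.random.randn(3, 1, 256, 7, 7).astype(np.float32)

sess = ort.InferenceSession("rpn_head.onnx")
outs = sess.run(None, {"z_fs": z_fs, "x_fs": x_fs})
print([o.shape for o in outs])  # weighted cls and loc maps
```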

Below are the successfully converted ONNX models:

- ResNet50_target : Import :heavy_check_mark: Export :heavy_check_mark: https://drive.google.com/file/d/1syHYIVLh6fTnVAUfzS-hrrgq3GfFzAnI/view?usp=sharing
- ResNet50_search : Import :heavy_check_mark: Export :heavy_check_mark: https://drive.google.com/file/d/16YJt2chxzDjju8zCcWqkd-yVDHggEcuj/view?usp=sharing
- Adjusted_Layer_1_Output_1 : Import :heavy_check_mark: Export :heavy_check_mark: https://drive.google.com/file/d/1REnYzOTjUcFE04j-wuWjW3mzYuKXudxM/view?usp=sharing
- Adjusted_Layer_1_Output_2 : Import :heavy_check_mark: Export :heavy_check_mark: https://drive.google.com/file/d/1WN-_OxeG2xNY9kIX5T687HqnenxeU2Xc/view?usp=sharing
- Adjusted_Layer_2_Output_1 : Import :heavy_check_mark: Export :heavy_check_mark: https://drive.google.com/file/d/1P5YUz3jzPeB1_Tbct5dITT7I9_OHzTXs/view?usp=sharing
- Adjusted_Layer_2_Output_2 : Import :heavy_check_mark: Export :heavy_check_mark: https://drive.google.com/file/d/1MPKy_nWoUOwPinOF_CauBHndTUozqxoR/view?usp=sharing
- Adjusted_Layer_3_Output_1 : Import :heavy_check_mark: Export :heavy_check_mark: https://drive.google.com/file/d/1GvSsiX9wxqFm_6FA675DNQilTLDrFYGR/view?usp=sharing
- Adjusted_Layer_3_Output_2 : Import :heavy_check_mark: Export :heavy_check_mark: https://drive.google.com/file/d/1TlECJWHfDaX1-cL04pedPCUzEaFWWCVo/view?usp=sharing
- RPN_head : Import :heavy_check_mark: Export ❌ https://drive.google.com/file/d/1PcChtc8aKTyN_QhYi6Bq4i07PQKVF8_w/view?usp=sharing


import cv2
import math
import numpy as np
import os
import onnx
import torch
import torch.nn as nn
import torch.nn.functional as F

from torch.autograd import Variable

# Class for the Tracker - SiamRPNPP

class Tracker(nn.Module):
    def __init__(self):
        super(Tracker, self).__init__()

        # build backbone
        self.backbone = resnet50([2,3,4])

        # build adjust layer
        self.neck = AdjustAllLayer([512, 1024, 2048], [256, 256, 256])

        # build rpn head
        self.rpn_head = MultiRPN(anchor_num=5,in_channels=[256, 256, 256],weighted=True)

    def template(self, z):
        zf = self.backbone(z)
        zf = self.neck(zf)
        self.zf = zf

    def track(self, x):
        xf = self.backbone(x)
        xf = self.neck(xf)
        cls, loc = self.rpn_head(self.zf, xf)
        return {'cls': cls,'loc': loc,}

    def log_softmax(self, cls):
        b, a2, h, w = cls.size()
        cls = cls.view(b, 2, a2//2, h, w)
        cls = cls.permute(0, 2, 3, 4, 1).contiguous()
        cls = F.log_softmax(cls, dim=4)
        return cls

    def forward(self, data):
        """ only used in training
        """
        template = data[0]
        search = data[1]

        # get feature
        zf = self.backbone(template)
        xf = self.backbone(search)

        zf = self.neck(zf)
        xf = self.neck(xf)
        cls, loc = self.rpn_head(zf, xf)

        return cls, loc

# End of Tracker - SiamRPNPP

# Class for the Building Blocks required for ResNet

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1,downsample=None, dilation=1):
        super(BasicBlock, self).__init__()
        padding = 2 - stride

        if dilation > 1:
            padding = dilation

        dd = dilation
        pad = padding
        if downsample is not None and dilation > 1:
            dd = dilation // 2
            pad = dd

        self.conv1 = nn.Conv2d(inplanes, planes,
                               stride=stride, dilation=dd, bias=False,
                               kernel_size=3, padding=pad)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes, dilation=dilation)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out

class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1,
                 downsample=None, dilation=1):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        padding = 2 - stride
        if downsample is not None and dilation > 1:
            dilation = dilation // 2
            padding = dilation

        assert stride == 1 or dilation == 1, \
            "stride and dilation must have one equals to zero at least"

        if dilation > 1:
            padding = dilation
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=padding, bias=False, dilation=dilation)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual

        out = self.relu(out)

        return out

# End of Building Blocks

# Class for ResNet - the Backbone neural network

class ResNet(nn.Module):
    def __init__(self, block, layers, used_layers):
        self.inplanes = 64
        super(ResNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=0,  # 3
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)

        self.feature_size = 128 * block.expansion
        self.used_layers = used_layers
        layer3 = True if 3 in used_layers else False
        layer4 = True if 4 in used_layers else False

        if layer3:
            self.layer3 = self._make_layer(block, 256, layers[2],
                                           stride=1, dilation=2)  # 15x15, 7x7
            self.feature_size = (256 + 128) * block.expansion
        else:
            self.layer3 = lambda x: x  # identity

        if layer4:
            self.layer4 = self._make_layer(block, 512, layers[3],
                                           stride=1, dilation=4)  # 7x7, 3x3
            self.feature_size = 512 * block.expansion
        else:
            self.layer4 = lambda x: x  # identity

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1, dilation=1):
        downsample = None
        dd = dilation
        if stride != 1 or self.inplanes != planes * block.expansion:
            if stride == 1 and dilation == 1:
                downsample = nn.Sequential(
                    nn.Conv2d(self.inplanes, planes * block.expansion,
                              kernel_size=1, stride=stride, bias=False),
                    nn.BatchNorm2d(planes * block.expansion),
                )
            else:
                if dilation > 1:
                    dd = dilation // 2
                    padding = dd
                else:
                    dd = 1
                    padding = 0
                downsample = nn.Sequential(
                    nn.Conv2d(self.inplanes, planes * block.expansion,
                              kernel_size=3, stride=stride, bias=False,
                              padding=padding, dilation=dd),
                    nn.BatchNorm2d(planes * block.expansion),
                )

        layers = []
        layers.append(block(self.inplanes, planes, stride,
                            downsample, dilation=dilation))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes, dilation=dilation))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x_ = self.relu(x)
        x = self.maxpool(x_)

        p1 = self.layer1(x)
        p2 = self.layer2(p1)
        p3 = self.layer3(p2)
        p4 = self.layer4(p3)
        out = [x_, p1, p2, p3, p4]
        out = [out[i] for i in self.used_layers]
        if len(out) == 1:
            return out[0]
        else:
            return out

# End of ResNet

# Class for Adjusting the layers of the neural net

class AdjustLayer(nn.Module):
    def __init__(self, in_channels, out_channels, center_size=7):
        super(AdjustLayer, self).__init__()
        self.downsample = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            )
        self.center_size = center_size

    def forward(self, x):
        x = self.downsample(x)
        l = 3
        r = 10
        x = x[:, :, l:r, l:r]
        return x

class AdjustAllLayer(nn.Module):
    def __init__(self, in_channels, out_channels, center_size=7):
        super(AdjustAllLayer, self).__init__()
        self.num = len(out_channels)
        if self.num == 1:
            self.downsample = AdjustLayer(in_channels[0],
                                          out_channels[0],
                                          center_size)
        else:
            for i in range(self.num):
                self.add_module('downsample'+str(i+2),
                                AdjustLayer(in_channels[i],
                                            out_channels[i],
                                            center_size))

    def forward(self, features):
        if self.num == 1:
            return self.downsample(features)
        else:
            out = []
            for i in range(self.num):
                adj_layer = getattr(self, 'downsample'+str(i+2))
                out.append(adj_layer(features[i]))
            return out        

# End of Class for Adjusting the layers of the neural net

# Class for Region Proposal Neural Network

class RPN(nn.Module):
    def __init__(self):
        super(RPN, self).__init__()

    def forward(self, z_f, x_f):
        raise NotImplementedError

class DepthwiseXCorr(nn.Module):
    def __init__(self, in_channels, hidden, out_channels, kernel_size=3, hidden_kernel_size=5):
        super(DepthwiseXCorr, self).__init__()
        self.conv_kernel = nn.Sequential(
                nn.Conv2d(in_channels, hidden, kernel_size=kernel_size, bias=False),
                nn.BatchNorm2d(hidden),
                nn.ReLU(inplace=True),
                )
        self.conv_search = nn.Sequential(
                nn.Conv2d(in_channels, hidden, kernel_size=kernel_size, bias=False),
                nn.BatchNorm2d(hidden),
                nn.ReLU(inplace=True),
                )
        self.head = nn.Sequential(
                nn.Conv2d(hidden, hidden, kernel_size=1, bias=False),
                nn.BatchNorm2d(hidden),
                nn.ReLU(inplace=True),
                nn.Conv2d(hidden, out_channels, kernel_size=1)
                )

    def forward(self, kernel, search):
        kernel = self.conv_kernel(kernel)
        search = self.conv_search(search)
        feature = xcorr_depthwise(search, kernel)
        out = self.head(feature)
        return out

class DepthwiseRPN(RPN):
    def __init__(self, anchor_num=5, in_channels=256, out_channels=256):
        super(DepthwiseRPN, self).__init__()
        self.cls = DepthwiseXCorr(in_channels, out_channels, 2 * anchor_num)
        self.loc = DepthwiseXCorr(in_channels, out_channels, 4 * anchor_num)

    def forward(self, z_f, x_f):
        cls = self.cls(z_f, x_f)
        loc = self.loc(z_f, x_f)
        return cls, loc

class MultiRPN(RPN):
    def __init__(self, anchor_num, in_channels, weighted=False):
        super(MultiRPN, self).__init__()
        self.weighted = weighted
        for i in range(len(in_channels)):
            self.add_module('rpn'+str(i+2),
                    DepthwiseRPN(anchor_num, in_channels[i], in_channels[i]))
        if self.weighted:
            self.cls_weight = nn.Parameter(torch.ones(len(in_channels)))
            self.loc_weight = nn.Parameter(torch.ones(len(in_channels)))

    def forward(self, z_fs, x_fs):
        cls = []
        loc = []
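
        # The earlier zip/enumerate loop over (z_fs, x_fs) is unrolled here:
        # iterating over a tensor while tracing triggered the RuntimeWarning
        # quoted above and froze the iteration count into the traced graph.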

        rpn2 = self.rpn2
        z_f2 = z_fs[0]
        x_f2 = x_fs[0]
        c2,l2 = rpn2(z_f2, x_f2)
        cls.append(c2)
        loc.append(l2)

        rpn3 = self.rpn3
        z_f3 = z_fs[1]
        x_f3 = x_fs[1]
        c3,l3 = rpn3(z_f3, x_f3)
        cls.append(c3)
        loc.append(l3)

        rpn4 = self.rpn4
        z_f4 = z_fs[2]
        x_f4 = x_fs[2]
        c4,l4 = rpn4(z_f4, x_f4)
        cls.append(c4)
        loc.append(l4)

        if self.weighted:
            cls_weight = F.softmax(self.cls_weight, 0)
            loc_weight = F.softmax(self.loc_weight, 0)

        def avg(lst):
            return sum(lst) / len(lst)

        def weighted_avg(lst, weight):
            s = 0
            fixed_len = 3
            for i in range(fixed_len):
                s += lst[i] * weight[i]
            return s

        if self.weighted:
            return weighted_avg(cls, cls_weight), weighted_avg(loc, loc_weight)
        else:
            return avg(cls), avg(loc)

# End of class for RPN

def conv3x3(in_planes, out_planes, stride=1, dilation=1):
    "3x3 convolution with padding"
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=dilation, bias=False, dilation=dilation)

def xcorr_depthwise(x, kernel):
    """ depthwise cross correlation
    """
    batch = kernel.size(0)
    channel = kernel.size(1)
    x = x.view(1, batch*channel, x.size(2), x.size(3))
    kernel = kernel.view(batch*channel, 1, kernel.size(2), kernel.size(3))
    conv = nn.Conv2d(batch*channel, batch*channel, kernel_size=(kernel.size(2), kernel.size(3)), bias=False, groups=batch*channel)
    conv.weight = nn.Parameter(kernel)
    out = conv(x) 
    out = out.view(batch, channel, out.size(2), out.size(3))
    out = out.detach()
    return out

def resnet18(used_layers):
    """Constructs a ResNet-18 model.
    """
    model = ResNet(BasicBlock, [2, 2, 2, 2], used_layers)
    return model

def resnet50(used_layers):
    """Constructs a ResNet-50 model.
    """
    model = ResNet(Bottleneck, [3, 4, 6, 3], used_layers)
    return model

# Build the torch backbone model
backbone = resnet50([2,3,4])
backbone.eval()
backbone.state_dict().keys()
backbone_dict = backbone.state_dict()

# Pre-trained weights for the backbone model
current_path = os.getcwd()
load_path = os.path.join(current_path, "siamrpn_r50_l234_dwxcorr.pth")
pretrained_dict = torch.load(load_path,map_location=torch.device('cpu') )
#pretrained_dict = torch.load(load_path)
pretrained_dict.keys()

pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in backbone_dict}
backbone_dict.update(pretrained_dict)
backbone.load_state_dict(backbone_dict)

# Dummy inputs for the torch backbone model
target = Variable(torch.ones(1, 3, 126, 126), requires_grad=False)
search = Variable(torch.ones(1, 3, 224, 224), requires_grad=False)

# Export the torch backbone model to ONNX model (one for target, one for search)
batch_size = 1
torch.onnx.export(backbone, target, "resnet_target.onnx", export_params=True, opset_version=10,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'])
                  #dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}

batch_size = 1
torch.onnx.export(backbone, search, "resnet_search.onnx", export_params=True, opset_version=10,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'])
                  #dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}

# Load the saved backbone model using ONNX
onnx_resnet_target = onnx.load("resnet_target.onnx")

# Check whether the backbone model has been successfully imported
onnx.checker.check_model(onnx_resnet_target)
print(onnx.checker.check_model(onnx_resnet_target))
onnx.helper.printable_graph(onnx_resnet_target.graph)
print(onnx.helper.printable_graph(onnx_resnet_target.graph))

# Load the saved backbone(search) model using ONNX
onnx_resnet_search = onnx.load("resnet_search.onnx")

# Check whether the backbone(search) model has been successfully imported
onnx.checker.check_model(onnx_resnet_search)
print(onnx.checker.check_model(onnx_resnet_search))
onnx.helper.printable_graph(onnx_resnet_search.graph)
print(onnx.helper.printable_graph(onnx_resnet_search.graph))

# Check the outputs of each of the backbone model
torch_resnet_target_output = backbone(target)
print(torch_resnet_target_output[0].shape) # ===  torch.Size([1, 512, 14, 14]) 
print(torch_resnet_target_output[1].shape) # ===  torch.Size([1, 1024, 14, 14])
print(torch_resnet_target_output[2].shape) # ===  torch.Size([1, 2048, 14, 14])

torch_resnet_target_output_1 = torch_resnet_target_output[0].detach().numpy()
torch_resnet_target_output_2 = torch_resnet_target_output[1].detach().numpy()
torch_resnet_target_output_3 = torch_resnet_target_output[2].detach().numpy()

torch_resnet_search_output = backbone(search) 
print(torch_resnet_search_output[0].shape) # ===  torch.Size([1, 512, 27, 27])
print(torch_resnet_search_output[1].shape) # ===  torch.Size([1, 1024, 27, 27])
print(torch_resnet_search_output[2].shape) # ===  torch.Size([1, 2048, 27, 27])  

torch_resnet_search_output_1 = torch_resnet_search_output[0].detach().numpy()
torch_resnet_search_output_2 = torch_resnet_search_output[1].detach().numpy()
torch_resnet_search_output_3 = torch_resnet_search_output[2].detach().numpy()

# neck = AdjustAllLayer([512, 1024, 2048], [256, 256, 256])
neck_1 = AdjustAllLayer([512], [256])

neck_1.eval()
neck_1.state_dict().keys()
neck_1_dict = neck_1.state_dict()

current_path = os.getcwd()
load_path = os.path.join(current_path, "siamrpn_r50_l234_dwxcorr.pth")
pretrained_dict = torch.load(load_path,map_location=torch.device('cpu') )
#pretrained_dict = torch.load(load_path)
pretrained_dict.keys()

pretrained_dict_1 = pretrained_dict

pretrained_dict_1 = {k: v for k, v in pretrained_dict_1.items() if k in neck_1_dict}
neck_1_dict.update(pretrained_dict_1)
neck_1.load_state_dict(neck_1_dict)

# Export the torch neck_1 model to ONNX model
batch_size = 1
torch.onnx.export(neck_1, torch_resnet_target_output[0], "neck_1_out_1.onnx", export_params=True, opset_version=10,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'])
                  #dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}

batch_size = 1
torch.onnx.export(neck_1, torch_resnet_search_output[0], "neck_1_out_2.onnx", export_params=True, opset_version=10,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'])
                  #dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}

# Load the saved neck_1 model using ONNX
onnx_neck_1_out_1_model = onnx.load("neck_1_out_1.onnx")

# Check whether the neck_1 model has been successfully imported
onnx.checker.check_model(onnx_neck_1_out_1_model)
print(onnx.checker.check_model(onnx_neck_1_out_1_model))
onnx.helper.printable_graph(onnx_neck_1_out_1_model.graph)
print(onnx.helper.printable_graph(onnx_neck_1_out_1_model.graph))

# Load the saved neck_1 model using ONNX
onnx_neck_1_out_2_model = onnx.load("neck_1_out_2.onnx")

# Check whether the neck_1 model has been successfully imported
onnx.checker.check_model(onnx_neck_1_out_2_model)
print(onnx.checker.check_model(onnx_neck_1_out_2_model))
onnx.helper.printable_graph(onnx_neck_1_out_2_model.graph)
print(onnx.helper.printable_graph(onnx_neck_1_out_2_model.graph))

neck_2 = AdjustAllLayer([1024], [256])

neck_2.eval()
neck_2.state_dict().keys()
neck_2_dict = neck_2.state_dict()

pretrained_dict_2 = pretrained_dict
pretrained_dict_2 = {k: v for k, v in pretrained_dict_2.items() if k in neck_2_dict}
neck_2_dict.update(pretrained_dict_2)
neck_2.load_state_dict(neck_2_dict)

# Export the torch neck_2 model to ONNX model
batch_size = 1
torch.onnx.export(neck_2, torch_resnet_target_output[1], "neck_2_out_1.onnx", export_params=True, opset_version=10,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'])
                  #dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}

batch_size = 1
torch.onnx.export(neck_2, torch_resnet_search_output[1], "neck_2_out_2.onnx", export_params=True, opset_version=10,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'])
                  #dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}

# Load the saved neck_2 model using ONNX
onnx_neck_2_out_1_model = onnx.load("neck_2_out_1.onnx")

# Check whether the neck_2 model has been successfully imported
onnx.checker.check_model(onnx_neck_2_out_1_model)
print(onnx.checker.check_model(onnx_neck_2_out_1_model))
onnx.helper.printable_graph(onnx_neck_2_out_1_model.graph)
print(onnx.helper.printable_graph(onnx_neck_2_out_1_model.graph))

# Load the saved neck_2 model using ONNX
onnx_neck_2_out_2_model = onnx.load("neck_2_out_2.onnx")

# Check whether the neck_2 model has been successfully imported
onnx.checker.check_model(onnx_neck_2_out_2_model)
print(onnx.checker.check_model(onnx_neck_2_out_2_model))
onnx.helper.printable_graph(onnx_neck_2_out_2_model.graph)
print(onnx.helper.printable_graph(onnx_neck_2_out_2_model.graph))

neck_3 = AdjustAllLayer([2048], [256])

neck_3.eval()
neck_3.state_dict().keys()
neck_3_dict = neck_3.state_dict()

pretrained_dict_3 = pretrained_dict

pretrained_dict_3 = {k: v for k, v in pretrained_dict_3.items() if k in neck_3_dict}
neck_3_dict.update(pretrained_dict_3)
neck_3.load_state_dict(neck_3_dict)

# Export the torch neck_3 model to ONNX model
batch_size = 1
torch.onnx.export(neck_3, torch_resnet_target_output[2], "neck_3_out_1.onnx", export_params=True, opset_version=11,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'])
                  #dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}

batch_size = 1
torch.onnx.export(neck_3, torch_resnet_search_output[2], "neck_3_out_2.onnx", export_params=True, opset_version=11,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'])
                  #dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}

# Load the saved neck_3 model using ONNX
onnx_neck_3_out_1_model = onnx.load("neck_3_out_1.onnx")

# Check whether the neck_3 model has been successfully imported
onnx.checker.check_model(onnx_neck_3_out_1_model)
print(onnx.checker.check_model(onnx_neck_3_out_1_model))
onnx.helper.printable_graph(onnx_neck_3_out_1_model.graph)
print(onnx.helper.printable_graph(onnx_neck_3_out_1_model.graph))

# Load the saved tracker model using ONNX
onnx_neck_3_out_2_model = onnx.load("neck_3_out_2.onnx")

# Check whether the tracker model has been successfully imported
# (check_model raises on failure; its None return value is not worth printing)
onnx.checker.check_model(onnx_neck_3_out_2_model)
print(onnx.helper.printable_graph(onnx_neck_3_out_2_model.graph))

torch_neck_1_out_1 = neck_1(torch_resnet_target_output[0])
torch_neck_2_out_1 = neck_2(torch_resnet_target_output[1])
torch_neck_3_out_1 = neck_3(torch_resnet_target_output[2])
print(torch_neck_1_out_1.shape)
print(torch_neck_2_out_1.shape)
print(torch_neck_3_out_1.shape)

torch_neck_1_out_2 = neck_1(torch_resnet_search_output[0])
torch_neck_2_out_2 = neck_2(torch_resnet_search_output[1])
torch_neck_3_out_2 = neck_3(torch_resnet_search_output[2])
print(torch_neck_1_out_2.shape)
print(torch_neck_2_out_2.shape)
print(torch_neck_3_out_2.shape)

torch_neck_out_1 = [torch_neck_1_out_1, torch_neck_2_out_1, torch_neck_3_out_1]
torch_neck_out_1 = torch.stack(torch_neck_out_1)

torch_neck_out_2 = [torch_neck_1_out_2, torch_neck_2_out_2, torch_neck_3_out_2]
torch_neck_out_2 = torch.stack(torch_neck_out_2)

print(torch_neck_out_1.shape)
print(torch_neck_out_2.shape)

# Build the torch head model
rpn_head = MultiRPN(anchor_num=5,in_channels=[256, 256, 256],weighted=True)
rpn_head.eval()
rpn_head.state_dict().keys()
rpn_head_dict = rpn_head.state_dict()

# Load the pre-trained weights; the dict is filtered for the RPN head below
import os  # harmless if already imported at the top of the script

current_path = os.getcwd()
load_path = os.path.join(current_path, "siamrpn_r50_l234_dwxcorr.pth")
pretrained_dict = torch.load(load_path, map_location=torch.device('cpu'))
#pretrained_dict = torch.load(load_path)
pretrained_dict.keys()

pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in rpn_head_dict}
rpn_head_dict.update(pretrained_dict)
rpn_head.load_state_dict(rpn_head_dict)

# Export the torch head model to ONNX model
batch_size = 1
torch.onnx.export(rpn_head, (torch_neck_out_1, torch_neck_out_2), "rpn_head.onnx", export_params=True, opset_version=11,
                  do_constant_folding=True, input_names = ['z_fs', 'x_fs'], output_names = ['output'])
                  #dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}

# Load the saved tracker model using ONNX
onnx_head_model = onnx.load("rpn_head.onnx")

# Check whether the tracker model has been successfully imported
# (check_model raises on failure; its None return value is not worth printing)
onnx.checker.check_model(onnx_head_model)
print(onnx.helper.printable_graph(onnx_head_model.graph))

# This is where the error arises: importing the exported head with OpenCV's DNN module
import cv2  # harmless if already imported at the top of the script

cv_rpn_head = cv2.dnn.readNetFromONNX('rpn_head.onnx')
```
jinyup100 commented 3 years ago

In last week's meeting, it was mentioned that the tasks for the week were:

  1. Attempt to change the MultiRPN class by removing the functional softmax and any other complexity, deleting any loops that are removable.
  2. Refer to DaSiamRPN to see how its pre-trained weights have been used, and possibly use a single RPN.
  3. Attempt a fix in the ONNX importer.cpp file by making appropriate changes to the loop that deals with the "Gather" layer.

The main problem of the task in its entirety was the appearance of the Gather layer when converting the PyTorch model to ONNX. The Gather layer is produced by the implementation of the depthwise correlation layer, which uses PyTorch's .view() function and thereby generates Gather, Concat and Unsqueeze nodes. For this network, I thought the task could be accomplished by simplifying the ONNX model, as suggested in https://github.com/onnx/onnx-tensorrt/issues/192, perhaps using the functions in the onnx-simplifier repository (https://github.com/daquexian/onnx-simplifier), which fold away this kind of static over-complication; a usage sketch follows.
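
As an illustration only, a minimal sketch of applying onnx-simplifier to the exported head, assuming the package is installed; the file names are placeholders:

```python
import onnx
from onnxsim import simplify  # pip install onnx-simplifier

model = onnx.load("rpn_head.onnx")
# constant-folds static subgraphs such as the Gather/Concat/Unsqueeze chains produced by .view()
model_simp, ok = simplify(model)
assert ok, "simplified ONNX model failed the checker"
onnx.save(model_simp, "rpn_head_simplified.onnx")
```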

Thanks to @l-bat and PR #17890, however, the model now successfully supports multiple inputs and the Gather layer is handled.

The only remaining problem is the construction of the customised convolution layer: I am working on setting the appropriate weights of the final nn.Conv2d which, as the picture below shows, is failing to take the kernel.

(screenshot: depthwiseXCorr_new)
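
For reference, the depthwise correlation as implemented in pysot looks roughly like the sketch below; the kernel is itself a feature map supplied at runtime, which is why a fixed-weight nn.Conv2d cannot express it directly:

```python
import torch.nn.functional as F

def xcorr_depthwise(x, kernel):
    # treat each (batch, channel) pair as its own convolution group,
    # so the runtime kernel tensor acts as the convolution weight
    batch = kernel.size(0)
    channel = kernel.size(1)
    x = x.view(1, batch * channel, x.size(2), x.size(3))
    kernel = kernel.view(batch * channel, 1, kernel.size(2), kernel.size(3))
    out = F.conv2d(x, kernel, groups=batch * channel)
    return out.view(batch, channel, out.size(2), out.size(3))
```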

bhack commented 3 years ago

@l-bat Can you ask internally if we could maintain the ONNX exporting code in a non-runnable form (a text file?) in the repository? I think it could enable:

  • the standard commit/revision flow also for the exporting component
  • developers, after the merge, to easily find the reference export code instead of searching for a description in the PR that introduced the new model

l-bat commented 3 years ago

@l-bat Can you ask internally if we could maintain the ONNX exporting code in a non-runnable form (a text file?) in the repository? I think it could enable:

  • the standard commit/revision flow also for the exporting component
  • developers, after the merge, to easily find the reference export code instead of searching for a description in the PR that introduced the new model

We can maintain the ONNX exporting code at https://gist.github.com. After every update, Jin should inform us about it.

bhack commented 3 years ago

We can maintain the ONNX exporting code at https://gist.github.com. After every update, Jin should inform us about it.

I have elaborated more on this solution, and this is why I am proposing to evaluate storing the ONNX export code in the OpenCV repo:

  1. Gist doesn't support inline review/comments (https://github.com/isaacs/github/issues/243), so it is hard to follow a regular review pipeline.
  2. On Gist (as the OpenCV.org team) we don't control the ownership/availability of the exporting code, because it is outside the repo perimeter. The code on a gist could be removed or become unavailable. So, on top of the original weights, the ONNX file storage etc., we are increasing the surface of external assets with the exporting code, and with it the risk of not being able to reproduce the export process.
  3. We don't expect to run the code in a pure OpenCV env. It will be more like a "documentation" asset that we own and that could be run in another "external" user env (OpenCV + PyTorch).
l-bat commented 3 years ago

We can maintain the ONNX exporting code at https://gist.github.com. After every update, Jin should inform us about it.

I have elaborated more on this solution, and this is why I am proposing to evaluate storing the ONNX export code in the OpenCV repo:

  1. Gist doesn't support inline review/comments (isaacs/github#243), so it is hard to follow a regular review pipeline.
  2. On Gist (as the OpenCV.org team) we don't control the ownership/availability of the exporting code, because it is outside the repo perimeter. The code on a gist could be removed or become unavailable. So, on top of the original weights, the ONNX file storage etc., we are increasing the surface of external assets with the exporting code, and with it the risk of not being able to reproduce the export process.
  3. We don't expect to run the code in a pure OpenCV env. It will be more like a "documentation" asset that we own and that could be run in another "external" user env (OpenCV + PyTorch).

We don't want to store scripts for generating models in the OpenCV repo. I suggest that Jin create a repository for the GSoC project, or fork the original repo (https://github.com/STVIR/pysot), and add the script for generating the ONNX models there. Jin can open a local PR and we will review it. We will add the final version of the export code to the description of this PR.

bhack commented 3 years ago

We don't want to store scripts for generating models in the OpenCV repo. I suggest that Jin create a repository for the GSoC project, or fork the original repo (https://github.com/STVIR/pysot), and add the script for generating the ONNX models there. Jin can open a local PR and we will review it. We will add the final version of the export code to the description of this PR.

Thanks for investigating this internal policy again. I will try to explore this topic more generally in the next OpenCV meeting (/cc @vpisarev)

jinyup100 commented 3 years ago

Following yesterday's meeting, I have uploaded the ONNX exporting code at (https://gist.github.com/jinyup100/bc2fb2d25ac5ac1ac635c9f2b62853d7). The code should successfully export the ONNX format of the PyTorch model. Below are the download links to all the ONNX models that have been exported using the code in the gist.

Below are the pre-trained weights and the successfully converted ONNX models:

Pre-Trained Weights in pth Format: https://drive.google.com/file/d/1A22KNXSGitHYnz6p7An5OAnR0wMSnYPi/view?usp=sharing

ResNet50_target : Import :heavy_check_mark: Export :heavy_check_mark: https://drive.google.com/file/d/16tg1w_NNJCsNMNe_OECjSFZe0-q5QTdQ/view?usp=sharing

ResNet50_search : Import :heavy_check_mark: Export :heavy_check_mark: https://drive.google.com/file/d/1iFXFdE3oj3UdsvumNpgRnXkp43yaAu35/view?usp=sharing

Adjusted_Layer_1_Output_1 : Import :heavy_check_mark: Export :heavy_check_mark: https://drive.google.com/file/d/1VdRWV3qdRZASr8ePXDFM6qmJu8amdzQv/view?usp=sharing

Adjusted_Layer_1_Output_2 : Import :heavy_check_mark: Export :heavy_check_mark: https://drive.google.com/file/d/16TsPYDwo2yBg7Hzs0uB5ZAksYFdapAfH/view?usp=sharing

Adjusted_Layer_2_Output_1 : Import :heavy_check_mark: Export :heavy_check_mark: https://drive.google.com/file/d/16TsPYDwo2yBg7Hzs0uB5ZAksYFdapAfH/view?usp=sharing

Adjusted_Layer_2_Output_2 : Import :heavy_check_mark: Export :heavy_check_mark: https://drive.google.com/file/d/1JbOxQg87n6kD5zOVqwfZj3yxfEy3mWMD/view?usp=sharing

Adjusted_Layer_3_Output_1 : Import :heavy_check_mark: Export :heavy_check_mark: https://drive.google.com/file/d/1-vVr7Jr_jlITfCKawPXGQlAS8rjMBGZ/view?usp=sharing

Adjusted_Layer_3_Output_2 : Import :heavy_check_mark: Export :heavy_check_mark: https://drive.google.com/file/d/1KhlcBeM2gotZB2-OebrUMqOlD43vmZu6/view?usp=sharing

RPN_head : Import :heavy_check_mark: Export :heavy_check_mark:
https://drive.google.com/file/d/1KVS8svBJZqu4aB-b6V3LNK0yDTd8Yo8P/view?usp=sharing

jinyup100 commented 3 years ago

To summarise last week's work:

7/22 (Wednesday): The problem with the depthwise correlation layer xcorr_depthwise seemed to be solved, but only because I had naively assumed that the size of the search region and the size of the kernel were equal. If that assumption held, the depthwise convolution could be replaced by element-wise multiplication followed by global average pooling, as sketched below.
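
A minimal sketch of that assumed (and, as it turned out, incorrect) equivalence, with placeholder shapes: when the kernel and the search features share the same spatial size, a valid depthwise correlation collapses to an elementwise product followed by a global spatial sum (global average pooling up to the constant factor H*W):

```python
import torch

z = torch.randn(1, 256, 7, 7)  # hypothetical kernel features
x = torch.randn(1, 256, 7, 7)  # search features, equal size under the assumption
out = (x * z).sum(dim=(2, 3), keepdim=True)
print(out.shape)  # torch.Size([1, 256, 1, 1])
```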

7/23 (Thursday): I started writing the SiamRPN++ class and, in the process, realised that the output size did not match what it should be: it is in fact wrong to assume that the size of the search region and the size of the kernel are always equal. This was caused by an error in my implementation of the adjusted layers; there should be two different types of adjusted layer, one that simply downsamples the input and another that additionally concatenates the outputs of the downsampling. I have made the update in the gist (https://gist.github.com/jinyup100/bc2fb2d25ac5ac1ac635c9f2b62853d7)

7/24 (Friday): Now that the different adjusted layers have been implemented, the original problem has recurred: the problem with the depthwise correlation layer.

The key problem with the xcorr_depthwise function is that the search features and the kernel change: they depend on the input. That is, the weight of the conv2d is not constant.

@l-bat suggested that I attempt to write a custom PyTorch layer for the implementation. I have accordingly made various attempts (https://gist.github.com/jinyup100/cca09da59fb90f98a757f61e59862c7c) to create such a layer; however, they failed to export as an ONNX model.

7/28 (Tuesday): @l-bat mentioned that she will be making a PR in OpenCV to support convolution with non-constant weights (kernels). It was also mentioned that writing a custom PyTorch layer is undesirable and that the SiamRPN++ model should be implemented without custom layers.

Following the meeting, it was decided that I should make the new commit on this PR as soon as possible, and I now have the latest commit, which I believe should work once the depthwise correlation layer is implemented successfully and rpn_head consequently exports. I will also be making daily reports on my progress.

For today, I have made the commit, and while @l-bat prepares the relevant PR, I will continue to look for other ways to implement the depthwise convolution.

jinyup100 commented 3 years ago

Thank you for the PR @l-bat. I have run the model using the PR and I can confirm that the output from the PyTorch implementation and the output from the imported ONNX model are the same; please refer to the notebook (https://drive.google.com/file/d/1MFY7G6BwAMb8a7g45GRxxHEzeItK4_e2/view?usp=sharing). Before tomorrow's meeting I will try to run the SiamRPN class model using the imported ONNX model.

l-bat commented 3 years ago

Please remove trailing whitespaces https://pullrequest.opencv.org/buildbot/builders/precommit_docs/builds/25970/steps/whitespace%20opencv/logs/stdio

bhack commented 3 years ago

Can we hide some long "code" comments as outdated/resolved so that we only have the final export code in the PR description?

jinyup100 commented 3 years ago

@l-bat I have removed the trailing whitespaces, and I am currently trying to locate the source of the numpy overflow; I have a good idea of how it might have occurred, so I will make progress by tomorrow. @bhack I have hidden all the comments that have been resolved, and I will hide the remaining ones as soon as they are resolved, along with any old comments that seem unnecessary.

jinyup100 commented 3 years ago

Hello - I have been trying to locate the source of the numpy overflow. While debugging and using the input_target (z_crop) and input_search (x_crop) taken directly from the images, I found that the outputs from the imported ONNX model and the outputs from the torch RPN model differ by a factor of about 1e6, producing a huge overflow. The numpy inputs I used are (z_crop : https://drive.google.com/file/d/1z1eWGF12mBcIH5G9-tPYbIbGgvN7gJtp/view?usp=sharing) and (x_crop : https://drive.google.com/file/d/1pqt3HSzmLMc_FBEMIzwD8awGw1WoPPA_/view?usp=sharing). The code I wrote to test the difference is shown in this gist (https://gist.github.com/jinyup100/53c5e542573e46d11e7ed881af66af57)

I think the problem is that the non-constant weight is not being updated as the video frame changes. I think this can be solved by further dividing MultiRPN into smaller parts and feeding the weights into the xcorr_depthwise layer directly, using @l-bat's latest PR. I will report tomorrow how this works out, but please do let me know if you have any suggestions, because in theory the outputs should be the same.

On a separate note, I have fixed _convert_score in such a way that it prevents any potential overflow, as sketched below. I will commit tomorrow, hopefully along with any further fixes to the model. Thank you.
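
A hedged sketch of what an overflow-safe score conversion can look like; the (2, N) layout and the function name are assumptions based on the two-class cls output, not the exact code committed to the PR:

```python
import numpy as np

def convert_score_stable(score):
    score = score.reshape(2, -1).T                    # assumed layout -> (N, 2)
    score = score - score.max(axis=1, keepdims=True)  # max subtraction avoids exp overflow
    e = np.exp(score)
    return (e / e.sum(axis=1, keepdims=True))[:, 1]   # foreground probabilities
```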

l-bat commented 3 years ago

@jinyup100 please verify that the link to RPN_head is up to date

jinyup100 commented 3 years ago

@jinyup100 please verify that the link to RPN_head is up to date

@l-bat I can verify that the link to RPN_head is up to date. I have re-uploaded it just in case.

l-bat commented 3 years ago

I reproduced your error and I have some ideas. If I save the ONNX model and check the difference right after, the torch output is close to the OpenCV output. But if I use a previously saved ONNX model, I get a big difference. The OpenCV outputs are equal in both cases, so I guess the problem is with saving the model to ONNX. Try to save the ONNX model twice with the same inputs, run the PyTorch model and save the outputs. Check that the PyTorch outputs are equal and compare them with the OpenCV outputs (for each model).

jinyup100 commented 3 years ago

I reproduced your error and I have some ideas. If I save the ONNX model and check the difference right after, the torch output is close to the OpenCV output. But if I use a previously saved ONNX model, I get a big difference. The OpenCV outputs are equal in both cases, so I guess the problem is with saving the model to ONNX. Try to save the ONNX model twice with the same inputs, run the PyTorch model and save the outputs. Check that the PyTorch outputs are equal and compare them with the OpenCV outputs (for each model).

@l-bat Just to check I fully understand your comments: I agree that saving the ONNX model directly (in real time) and checking the difference gives a torch output close to the OpenCV output, and that, as you said, using a previously saved ONNX model gives a big difference.

When you say "Try to save the ONNX model twice with the same inputs", are you suggesting that I run the export twice like this for each model (backbone, neck, and rpn_head)?:

torch.onnx.export(backbone_search, z_crop, "backbone_search.onnx", export_params=True, opset_version=11,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'],
                  dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}) 

torch.onnx.export(backbone_search, z_crop, "backbone_search.onnx", export_params=True, opset_version=11,
                  do_constant_folding=True, input_names = ['input'], output_names = ['output'],
                  dynamic_axes={'input' : {0 : 'batch_size'}, 'output' : {0 : 'batch_size'}}) 

Also, when you say "the same inputs", are you recommending that I use z_crop and x_crop for all the inputs?

Thank you for the clarification and I will let you know how this goes!

l-bat commented 3 years ago

I mean to do something like this: https://gist.github.com/l-bat/bb5f79542a4886529832ac653f6de412
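
A hedged sketch of the suggested double-export check (the actual gist may differ); 'model' stands for any exported torch module with a single output, and the input shape is a placeholder:

```python
import numpy as np
import cv2
import torch

inp = torch.randn(1, 3, 127, 127)  # placeholder input
ref = model(inp).detach().numpy()  # PyTorch reference output; 'model' is a placeholder
for name in ("export_a.onnx", "export_b.onnx"):
    torch.onnx.export(model, inp, name, export_params=True, opset_version=11,
                      input_names=['input'], output_names=['output'])
    net = cv2.dnn.readNetFromONNX(name)
    net.setInput(inp.numpy())
    print(name, np.max(np.abs(ref - net.forward())))
```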

jinyup100 commented 3 years ago

@l-bat Thank you for the clarification - I will update you with the results.

jinyup100 commented 3 years ago

@l-bat I do not think there is an error with exporting the model itself. Using random inputs generated as

zf_s = np.random.rand(*zf_s.shape)
xf_s = np.random.rand(*xf_s.shape)

certainly seems to give the same outputs for the torch model and the imported OpenCV model.

The problem seems to be related to how I pass the pre-trained weights "siamrpn_r50_l234_dwxcorr.pth" to the model.

# Build the torch backbone model
backbone = ResNet(Bottleneck, [3, 4, 6, 3], [2, 3, 4])
backbone.eval()
backbone.state_dict().keys()
backbone_dict = backbone.state_dict()

# Load the pre-trained weights into the torch model
# (pretrained_dict_backbone was loaded earlier from the .pth file)
pretrained_dict_backbone = {a: b for a, b in pretrained_dict_backbone.items() if a in backbone_dict}
backbone_dict.update(pretrained_dict_backbone)
backbone.load_state_dict(backbone_dict)

torch_z = backbone(torch.Tensor(z_crop))
print(torch_z[0])

When I run this code repeatedly, the value of torch_z changes on every run.

l-bat commented 3 years ago

@jinyup100 Maybe not all the ResNet weights are stored in siamrpn_r50_l234_dwxcorr.pth. Can you check that all the weights for ResNet are present in pretrained_dict_backbone?

for param_tensor in backbone.state_dict():
    print(param_tensor, "\t", backbone.state_dict()[param_tensor])

Try to find the layer where the weights are changing. You can load two models with different random seeds and compare the weights for each layer.
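
One quick way to spot uncovered parameters, reusing the variable names from the snippet above (a sketch, not the exact check used in the PR): any key printed here is absent from the .pth file and keeps its random initialisation on every run:

```python
missing = [k for k in backbone_dict if k not in pretrained_dict_backbone]
print(missing)
```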

jinyup100 commented 3 years ago

@l-bat I managed to resolve the issue by checking all the parameters in the pre-trained weights. In the meeting two weeks ago you suggested checking the outputs using

np.max(abs(torch_cls-opencv_cls))
np.max(abs(torch_loc-opencv_loc))

And I get 0.9067 and 0.3380. Are these margins of error small enough to be ignored?

l-bat commented 3 years ago

@l-bat I managed to resolve the issue by checking all the parameters in the pre-trained weights. In the meeting two weeks ago you suggested checking the outputs using

np.max(abs(torch_cls-opencv_cls))
np.max(abs(torch_loc-opencv_loc))

And I get 0.9067 and 0.3380. Are these margins of error small enough to be ignored?

@jinyup100 The acceptable margin of error depends on the output range; check it with print(np.min(torch_cls), np.max(torch_cls)). I would recommend using a uniform distribution over [0, 1) for the inputs. Then np.max(abs(torch_cls-opencv_cls)) should be less than 10e-5.

jinyup100 commented 3 years ago

@l-bat Using inputs drawn from a uniform distribution over [0, 1) via np.random.rand, I have compared the outputs of all the models (ResNet, adjusted layers and RPN_head). Whereas the differences for ResNet and the adjusted layers are below 10e-5, the differences for the RPN outputs (cls and loc) are 0.823 and 0.180, which are obviously greater than 10e-5. I believe this is potentially due to replacing the original xcorr_depthwise F.conv2d with the newly implemented xcorr_depthwise nn.Conv2d. Should I look for ways to re-implement xcorr_depthwise so that the differences in the outputs are below 10e-5?

Please see the notebook (https://drive.google.com/file/d/1LbJqpkuK_bxD_CfGqlE2MAZjRlhHjTI9/view?usp=sharing) for context

bhack commented 3 years ago

Are F.conv2d and nn.Conv2d in your layer producing the same intermediate output? You could test it as in https://discuss.pytorch.org/t/difference-results-with-torch-nn-conv2d-and-torch-nn-functional-conv2d/69231/4

ieliz commented 3 years ago

As I understand it, @l-bat's suggestion is that you are using weights that are absent from the pre-trained model, so they are randomly initialised every time. You should check with Netron whether all of the weights are initialised from the pre-trained weights.
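
For reference, the Netron check can also be started from Python, assuming the netron pip package is installed; it serves the graph in a browser where every initializer can be inspected:

```python
import netron

netron.start("rpn_head.onnx")  # opens the graph viewer in the default browser
```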

jinyup100 commented 3 years ago

@l-bat Hello, I have made the corresponding adjustments to the RPN such that all the differences in the outputs are now < 10e-5. The cause was that two of the convolutional layers were not being used when they were meant to be. These two convolutional layers use F.softmax, which is currently causing the problem when trying to import the "rpn.onnx" file. I tried to implement this F.softmax using PyTorch's exponential function, and it raises the error "Missed axes attribute in function 'cv::dnn::experimental_dnn_34_v18::SoftMaxSubgraph::match'" when importing the RPN ONNX file. Do you have any idea why this may be the case?

The code for the export of the latest RPN is shown in: https://gist.github.com/jinyup100/488ff49932672ce7291717033eb60d73

And the inputs required to produce this RPN is given here: zfs : https://drive.google.com/file/d/1qGX7Cz60D3ueeWtjdmaOipfGk9k9B7In/view?usp=sharing xfs : https://drive.google.com/file/d/1u0_XbInJ4PCASmhjF9ipG-MTd_QpOp3U/view?usp=sharing

Latest RPN ONNX File : https://drive.google.com/file/d/1zPSnVAjq6uDzlp5wgmGebUbZXSWgixIn/view?usp=sharing

jinyup100 commented 3 years ago

Hello. In yesterday's meeting with @l-bat and @ieliz, two solutions were suggested by @l-bat to address the problem above with importing the RPN ONNX file. The problem concerns a softmax function inside the file that is not currently supported by OpenCV.

The first solution by @l-bat was to hard-code the values of the weights used in the file. Originally, these values were meant to be calculated by taking a softmax over fixed pre-trained weights, so the values shown below are actually the outputs of that softmax. This solution is only possible because these values remain constant throughout propagation, independent of any input. Using it, I have successfully exported and imported all the models and implemented a SiamRPN++ tracker sample. The commit for the SiamRPN++ tracker will be made today.

self.weight_cls = nn.Parameter(torch.Tensor([0.38156851768108546, 0.4364767608115956,  0.18195472150731892]))
self.weight_loc = nn.Parameter(torch.Tensor([0.17644893463361863, 0.16564198028417967, 0.6579090850822015]))
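
With the softmax folded into those constants, the per-branch fusion reduces to a plain weighted sum, along the lines of pysot's weighted_avg helper:

```python
def weighted_avg(lst, weight):
    # lst: per-branch cls or loc outputs; weight: the fixed, pre-softmaxed weights above
    s = 0
    for i in range(len(weight)):
        s += lst[i] * weight[i]
    return s
```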

The second solution from @l-bat was to expand the scope of this GSoC project by modifying the onnx_graph_simplifier so that it supports the softmax function. To be more specific, since the ReduceMax / Sub / Exp / ReduceSum pattern is not currently supported, @l-bat thought it was a good idea to make a new PR that supports the softmax layer: one that would essentially remove the ReduceMax / Sub / Exp / ReduceSum nodes and replace them with a single Softmax layer.
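
For context, a traced softmax written via exponentials looks roughly like this in PyTorch, which is why ReduceMax / Sub / Exp / ReduceSum (and Div) nodes appear in the exported graph; the shapes are placeholders:

```python
import torch

x = torch.randn(1, 3)                  # placeholder logits
m = x.max(dim=1, keepdim=True).values  # traces to ReduceMax
e = torch.exp(x - m)                   # traces to Sub + Exp
soft = e / e.sum(dim=1, keepdim=True)  # traces to ReduceSum + Div
```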

(screenshot of the exported graph: Screenshot 2020-08-06 at 19 32 21)

(the full onnx file is given here : https://drive.google.com/file/d/1zPSnVAjq6uDzlp5wgmGebUbZXSWgixIn/view?usp=sharing)

@bhack During the meeting, the mentors and I wished to hear your thoughts on expanding the scope of this GSoC to include a support for the softmax layer.

Meanwhile, I will make the new commit with the full functionality of SiamRPN++ using the hard-coded values for the pre-trained weights. Thank you.

bhack commented 3 years ago

I am ok with covering the softmax layer, but I don't think it can be classified as an "expansion" of the scope of this GSoC: this kind of thing is part of the regular activity when you are importing a new model. Now we have a hack solution with fixed weights, and that is ok... but it is not very useful for the library as a whole.

since the ReduceMax / Sub / Exp / ReduceSum pattern is not currently supported

@l-bat Do you propose to directly implement the whole high-level softmax op, just for timing, instead of the individual low-level ops?

jinyup100 commented 3 years ago

I am ok with covering the softmax layer, but I don't think it can be classified as an "expansion" of the scope of this GSoC: this kind of thing is part of the regular activity when you are importing a new model.

I do agree with you that it is within the scope of the project and a normal activity, in the sense that as we solve old problems we face new ones.

Now we have a hack solution with fixed weights, and that is ok... but it is not very useful for the library as a whole.

Yes, we do have a shortcut solution, but adding support for the softmax layer would make the library more convenient to use.

@l-bat has given me some references to refer to and I will be looking at them meanwhile.

l-bat commented 3 years ago

@jinyup100 Are neck_1.onnx and neck_2.onnx new models? Where can I download them?

jinyup100 commented 3 years ago

The Final Version of the ONNX exporting code is available at: (https://gist.github.com/jinyup100/bc2fb2d25ac5ac1ac635c9f2b62853d7).

The Final Version of the Pre-Trained Weights and successfully converted ONNX format of the models are available at:

Pre-Trained Weights in pth Format: https://drive.google.com/file/d/1PBtRDiWAIaGthMKdzyJL9qYC6ODGnXk5/view?usp=sharing

ResNet50_target : Import :heavy_check_mark: Export :heavy_check_mark: https://drive.google.com/file/d/1jwguUwfvBa-EbDXRDfuyCrvqLPZcFZo0/view?usp=sharing

ResNet50_search : Import :heavy_check_mark: Export :heavy_check_mark: https://drive.google.com/file/d/1L6kxi_WkdH__kfrPC_R-ZsaPD4ky52tI/view?usp=sharing

Adjusted_Layer_1 : Import :heavy_check_mark: Export :heavy_check_mark: https://drive.google.com/file/d/1690vawz9b76Bwskg3CXtBb1Fd59qBsRw/view?usp=sharing

Adjusted_Layer_2 : Import :heavy_check_mark: Export :heavy_check_mark: https://drive.google.com/file/d/1E7nX5MzSzVpw6a2Mm5krxp0reLrO1-je/view?usp=sharing

RPN_head : Import :heavy_check_mark: Export :heavy_check_mark:
https://drive.google.com/file/d/15Sgh1YwdH_fCnbTzhsU-HcFKpSMjPMLY/view?usp=sharing

l-bat commented 3 years ago

@jinyup100 Please add the final code for generating the ONNX models to the PR description, add links to the ONNX models at the beginning of the sample, as in https://github.com/opencv/opencv/blob/master/samples/dnn/virtual_try_on.py#L3, and rebase your PR onto the 3.4 branch.

l-bat commented 3 years ago

Could you experiment with the OpenVINO backend and different targets (CPU, OpenCL FP32, OpenCL FP16)?
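
A minimal sketch of switching the backend and target in the sample; the constant names are the standard cv2.dnn ones, and the model file name is a placeholder:

```python
import cv2

net = cv2.dnn.readNetFromONNX("target_net.onnx")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)  # OpenVINO backend
net.setPreferableTarget(cv2.dnn.DNN_TARGET_OPENCL_FP16)
# other targets: cv2.dnn.DNN_TARGET_CPU, cv2.dnn.DNN_TARGET_OPENCL
```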

wenhaotang commented 3 years ago

@jinyup100 When I run the script 'Torch_to_ONNX_Reduced.py', the following error occurs:

Traceback (most recent call last):
  File ".\Torch_to_ONNX_Reduced.py", line 490, in <module>
    do_constant_folding=True, input_names = ['input_1', 'input_2'], output_names = ['output_1', 'output_2'])
  File "E:\Users\tang\Anaconda3\lib\site-packages\torch\onnx\__init__.py", line 148, in export
    strip_doc_string, dynamic_axes, keep_initializers_as_inputs)
  File "E:\Users\tang\Anaconda3\lib\site-packages\torch\onnx\utils.py", line 66, in export
    dynamic_axes=dynamic_axes, keep_initializers_as_inputs=keep_initializers_as_inputs)
  File "E:\Users\tang\Anaconda3\lib\site-packages\torch\onnx\utils.py", line 416, in _export
    fixed_batch_size=fixed_batch_size)
  File "E:\Users\tang\Anaconda3\lib\site-packages\torch\onnx\utils.py", line 279, in _model_to_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args, training)
  File "E:\Users\tang\Anaconda3\lib\site-packages\torch\onnx\utils.py", line 236, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(model, args, _force_outplace=True, _return_inputs_states=True)
  File "E:\Users\tang\Anaconda3\lib\site-packages\torch\jit\__init__.py", line 277, in _get_trace_graph
    outs = ONNXTracedModule(f, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "E:\Users\tang\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "E:\Users\tang\Anaconda3\lib\site-packages\torch\jit\__init__.py", line 360, in forward
    self._force_outplace,
RuntimeError: hasSpecialCase INTERNAL ASSERT FAILED at ..\torch\csrc\jit\passes\alias_analysis.cpp:300, please report a bug to PyTorch. We don't have an op for aten::uniform but it isn't a special case. (analyzeImpl at ..\torch\csrc\jit\passes\alias_analysis.cpp:300)
(no backtrace available)

torch version is 1.4.0+cpu, onnx version is 1.8.0

Can you give me some advice? Thank you in advance.

jinyup100 commented 3 years ago

@jinyup100 When I run the script 'Torch_to_ONNX_Reduced.py', the following error occurs:

Traceback (most recent call last):
  File ".\Torch_to_ONNX_Reduced.py", line 490, in <module>
    do_constant_folding=True, input_names = ['input_1', 'input_2'], output_names = ['output_1', 'output_2'])
  File "E:\Users\tang\Anaconda3\lib\site-packages\torch\onnx\__init__.py", line 148, in export
    strip_doc_string, dynamic_axes, keep_initializers_as_inputs)
  File "E:\Users\tang\Anaconda3\lib\site-packages\torch\onnx\utils.py", line 66, in export
    dynamic_axes=dynamic_axes, keep_initializers_as_inputs=keep_initializers_as_inputs)
  File "E:\Users\tang\Anaconda3\lib\site-packages\torch\onnx\utils.py", line 416, in _export
    fixed_batch_size=fixed_batch_size)
  File "E:\Users\tang\Anaconda3\lib\site-packages\torch\onnx\utils.py", line 279, in _model_to_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args, training)
  File "E:\Users\tang\Anaconda3\lib\site-packages\torch\onnx\utils.py", line 236, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(model, args, _force_outplace=True, _return_inputs_states=True)
  File "E:\Users\tang\Anaconda3\lib\site-packages\torch\jit\__init__.py", line 277, in _get_trace_graph
    outs = ONNXTracedModule(f, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "E:\Users\tang\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "E:\Users\tang\Anaconda3\lib\site-packages\torch\jit\__init__.py", line 360, in forward
    self._force_outplace,
RuntimeError: hasSpecialCase INTERNAL ASSERT FAILED at ..\torch\csrc\jit\passes\alias_analysis.cpp:300, please report a bug to PyTorch. We don't have an op for aten::uniform but it isn't a special case. (analyzeImpl at ..\torch\csrc\jit\passes\alias_analysis.cpp:300)
(no backtrace available)

torch version is 1.4.0+cpu, onnx version is 1.8.0

Can you give me some advice? Thank you in advance.

@wenhaotang It still works fine for me. Could you perhaps give it a try in a Jupyter notebook?