ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Replacing Hardswish operator before conversion to ONNX might be broken #1382

Closed trane293 closed 3 years ago

trane293 commented 3 years ago

❔Question

In the export.py file, the script performs the following operation to convert the nn.Hardswish operator to the custom class in utils.activations:

# Update model
for k, m in model.named_modules():
    m._non_persistent_buffers_set = set()  # pytorch 1.6.0 compatibility
    if isinstance(m, models.common.Conv) and isinstance(m.act, nn.Hardswish):
        m.act = Hardswish()  # assign activation

However, this doesn't seem to work as expected when the model is a fused model containing multiple Sequential() modules, or even Ensemble() objects. The snippet above does not appear to recurse into the model to find all instances of nn.Hardswish and replace them with the custom version, which leads to this issue: https://github.com/ultralytics/yolov5/issues/831

I have a fix for the problem, which seems to be working for me: a recursive function that walks through the model, finds all instances of Hardswish, and replaces them with the custom version. The implementation is below:

import torch.nn as nn

from models import yolo
from models.common import Conv
from models.experimental import Ensemble
from utils.activations import Hardswish  # imports assume the YOLOv5 repo layout at the time


def set_hardswish_recursive(model):
    for layer in model.children():
        # container module (Sequential, Ensemble, Model, or anything with children): recurse into it
        if isinstance(layer, (nn.Sequential, Ensemble, yolo.Model)) or list(layer.children()):
            set_hardswish_recursive(layer)
        # leaf module (or a Conv wrapper): patch it for export
        if not list(layer.children()) or type(layer) is Conv:
            layer._non_persistent_buffers_set = set()  # pytorch 1.6.0 compatibility
            if type(layer) in (nn.Hardswish, nn.LeakyReLU, nn.ReLU, nn.ReLU6):
                layer.inplace = True  # pytorch 1.7.0 compatibility
            elif type(layer) is Conv and isinstance(layer.act, nn.Hardswish):
                layer.act = Hardswish()  # replace with the export-friendly Hardswish

After running this, the model exports fine, and all Hardswish operators are replaced correctly.
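For completeness, here is a minimal sketch of how the helper above might be wired into an export run. The weights path, image size, and opset are illustrative, and attempt_load plus the import path are assumptions based on the YOLOv5 repo layout of that era:

import torch

from models.experimental import attempt_load  # assumed repo-layout import

# Load a checkpoint (path is illustrative), patch activations, then export to ONNX
model = attempt_load('weights/yolov5x.pt', map_location='cpu')
model.eval()
set_hardswish_recursive(model)  # recursive replacement defined above

img = torch.zeros(1, 3, 640, 640)  # dummy input matching the training image size
torch.onnx.export(model, img, 'yolov5x.onnx', opset_version=12,
                  input_names=['images'], output_names=['output'])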

github-actions[bot] commented 3 years ago

Hello @trane293, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom model or data training question, please note Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

For more information please visit https://www.ultralytics.com.

glenn-jocher commented 3 years ago

@trane293 I show 51 out of 51 nn.Hardswish() activations replaced as designed.

    # Update model
    for k, m in model.named_modules():
        m._non_persistent_buffers_set = set()  # pytorch 1.6.0 compatibility
        if isinstance(m, models.common.Conv) and isinstance(m.act, nn.Hardswish):
            m.act = Hardswish()  # assign activation

print(len([x for x in model.modules() if type(x) is Hardswish]))
print(len([x for x in model.modules() if type(x) is nn.Hardswish]))
...
> 51
> 0

If I further print(model) I see zero nn.Hardswish() present:

print(model)

Model(
  (model): Sequential(
    (0): Focus(
      (conv): Conv(
        (conv): Conv2d(12, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (act): Hardswish()
      )
    )
    (1): Conv(
      (conv): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (act): Hardswish()
    )
    (2): BottleneckCSP(
      (cv1): Conv(
        (conv): Conv2d(64, 32, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (cv2): Conv2d(64, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv3): Conv2d(32, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv4): Conv(
        (conv): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (bn): BatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (act): LeakyReLU(negative_slope=0.1, inplace=True)
      (m): Sequential(
        (0): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(32, 32, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
      )
    )
    (3): Conv(
      (conv): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (act): Hardswish()
    )
    (4): BottleneckCSP(
      (cv1): Conv(
        (conv): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (cv2): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv3): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv4): Conv(
        (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (bn): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (act): LeakyReLU(negative_slope=0.1, inplace=True)
      (m): Sequential(
        (0): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (1): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (2): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
      )
    )
    (5): Conv(
      (conv): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (act): Hardswish()
    )
    (6): BottleneckCSP(
      (cv1): Conv(
        (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (cv2): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv3): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv4): Conv(
        (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (bn): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (act): LeakyReLU(negative_slope=0.1, inplace=True)
      (m): Sequential(
        (0): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (1): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (2): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
      )
    )
    (7): Conv(
      (conv): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (act): Hardswish()
    )
    (8): SPP(
      (cv1): Conv(
        (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (cv2): Conv(
        (conv): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (m): ModuleList(
        (0): MaxPool2d(kernel_size=5, stride=1, padding=2, dilation=1, ceil_mode=False)
        (1): MaxPool2d(kernel_size=9, stride=1, padding=4, dilation=1, ceil_mode=False)
        (2): MaxPool2d(kernel_size=13, stride=1, padding=6, dilation=1, ceil_mode=False)
      )
    )
    (9): BottleneckCSP(
      (cv1): Conv(
        (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (cv2): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv3): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv4): Conv(
        (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (bn): BatchNorm2d(512, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (act): LeakyReLU(negative_slope=0.1, inplace=True)
      (m): Sequential(
        (0): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
      )
    )
    (10): Conv(
      (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
      (act): Hardswish()
    )
    (11): Upsample(scale_factor=2.0, mode=nearest)
    (12): Concat()
    (13): BottleneckCSP(
      (cv1): Conv(
        (conv): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (cv2): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv3): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv4): Conv(
        (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (bn): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (act): LeakyReLU(negative_slope=0.1, inplace=True)
      (m): Sequential(
        (0): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
      )
    )
    (14): Conv(
      (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1))
      (act): Hardswish()
    )
    (15): Upsample(scale_factor=2.0, mode=nearest)
    (16): Concat()
    (17): BottleneckCSP(
      (cv1): Conv(
        (conv): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (cv2): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv3): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv4): Conv(
        (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (bn): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (act): LeakyReLU(negative_slope=0.1, inplace=True)
      (m): Sequential(
        (0): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
      )
    )
    (18): Conv(
      (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (act): Hardswish()
    )
    (19): Concat()
    (20): BottleneckCSP(
      (cv1): Conv(
        (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (cv2): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv3): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv4): Conv(
        (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (bn): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (act): LeakyReLU(negative_slope=0.1, inplace=True)
      (m): Sequential(
        (0): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
      )
    )
    (21): Conv(
      (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (act): Hardswish()
    )
    (22): Concat()
    (23): BottleneckCSP(
      (cv1): Conv(
        (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (cv2): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv3): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv4): Conv(
        (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (bn): BatchNorm2d(512, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (act): LeakyReLU(negative_slope=0.1, inplace=True)
      (m): Sequential(
        (0): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
      )
    )
    (24): Detect(
      (m): ModuleList(
        (0): Conv2d(128, 255, kernel_size=(1, 1), stride=(1, 1))
        (1): Conv2d(256, 255, kernel_size=(1, 1), stride=(1, 1))
        (2): Conv2d(512, 255, kernel_size=(1, 1), stride=(1, 1))
      )
    )
  )
)

trane293 commented 3 years ago

Here's my model:

print(self.model)
Model(
  (model): Sequential(
    (0): Focus(
      (conv): Conv(
        (conv): Conv2d(12, 80, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (act): Hardswish()
      )
    )
    (1): Conv(
      (conv): Conv2d(80, 160, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (act): Hardswish()
    )
    (2): BottleneckCSP(
      (cv1): Conv(
        (conv): Conv2d(160, 80, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (cv2): Conv2d(160, 80, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv3): Conv2d(80, 80, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv4): Conv(
        (conv): Conv2d(160, 160, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (bn): BatchNorm2d(160, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
      (act): LeakyReLU(negative_slope=0.1, inplace=True)
      (m): Sequential(
        (0): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(80, 80, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(80, 80, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (1): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(80, 80, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(80, 80, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (2): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(80, 80, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(80, 80, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (3): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(80, 80, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(80, 80, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
      )
    )
    (3): Conv(
      (conv): Conv2d(160, 320, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (act): Hardswish()
    )
    (4): BottleneckCSP(
      (cv1): Conv(
        (conv): Conv2d(320, 160, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (cv2): Conv2d(320, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv3): Conv2d(160, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv4): Conv(
        (conv): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
      (act): LeakyReLU(negative_slope=0.1, inplace=True)
      (m): Sequential(
        (0): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(160, 160, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (1): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(160, 160, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (2): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(160, 160, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (3): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(160, 160, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (4): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(160, 160, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (5): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(160, 160, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (6): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(160, 160, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (7): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(160, 160, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (8): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(160, 160, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (9): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(160, 160, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (10): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(160, 160, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (11): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(160, 160, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
      )
    )
    (5): Conv(
      (conv): Conv2d(320, 640, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (act): Hardswish()
    )
    (6): BottleneckCSP(
      (cv1): Conv(
        (conv): Conv2d(640, 320, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (cv2): Conv2d(640, 320, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv3): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv4): Conv(
        (conv): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
      (act): LeakyReLU(negative_slope=0.1, inplace=True)
      (m): Sequential(
        (0): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (1): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (2): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (3): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (4): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (5): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (6): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (7): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (8): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (9): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (10): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (11): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
      )
    )
    (7): Conv(
      (conv): Conv2d(640, 1280, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (act): Hardswish()
    )
    (8): SPP(
      (cv1): Conv(
        (conv): Conv2d(1280, 640, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (cv2): Conv(
        (conv): Conv2d(2560, 1280, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (m): ModuleList(
        (0): MaxPool2d(kernel_size=5, stride=1, padding=2, dilation=1, ceil_mode=False)
        (1): MaxPool2d(kernel_size=9, stride=1, padding=4, dilation=1, ceil_mode=False)
        (2): MaxPool2d(kernel_size=13, stride=1, padding=6, dilation=1, ceil_mode=False)
      )
    )
    (9): BottleneckCSP(
      (cv1): Conv(
        (conv): Conv2d(1280, 640, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (cv2): Conv2d(1280, 640, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv3): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv4): Conv(
        (conv): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (bn): BatchNorm2d(1280, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
      (act): LeakyReLU(negative_slope=0.1, inplace=True)
      (m): Sequential(
        (0): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (1): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (2): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (3): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
      )
    )
    (10): Conv(
      (conv): Conv2d(1280, 640, kernel_size=(1, 1), stride=(1, 1))
      (act): Hardswish()
    )
    (11): Upsample(scale_factor=2.0, mode=nearest)
    (12): Concat()
    (13): BottleneckCSP(
      (cv1): Conv(
        (conv): Conv2d(1280, 320, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (cv2): Conv2d(1280, 320, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv3): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv4): Conv(
        (conv): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
      (act): LeakyReLU(negative_slope=0.1, inplace=True)
      (m): Sequential(
        (0): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (1): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (2): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (3): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
      )
    )
    (14): Conv(
      (conv): Conv2d(640, 320, kernel_size=(1, 1), stride=(1, 1))
      (act): Hardswish()
    )
    (15): Upsample(scale_factor=2.0, mode=nearest)
    (16): Concat()
    (17): BottleneckCSP(
      (cv1): Conv(
        (conv): Conv2d(640, 160, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (cv2): Conv2d(640, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv3): Conv2d(160, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv4): Conv(
        (conv): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
      (act): LeakyReLU(negative_slope=0.1, inplace=True)
      (m): Sequential(
        (0): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(160, 160, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (1): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(160, 160, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (2): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(160, 160, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (3): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(160, 160, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
      )
    )
    (18): Conv(
      (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (act): Hardswish()
    )
    (19): Concat()
    (20): BottleneckCSP(
      (cv1): Conv(
        (conv): Conv2d(640, 320, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (cv2): Conv2d(640, 320, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv3): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv4): Conv(
        (conv): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
      (act): LeakyReLU(negative_slope=0.1, inplace=True)
      (m): Sequential(
        (0): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (1): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (2): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (3): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
      )
    )
    (21): Conv(
      (conv): Conv2d(640, 640, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (act): Hardswish()
    )
    (22): Concat()
    (23): BottleneckCSP(
      (cv1): Conv(
        (conv): Conv2d(1280, 640, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (cv2): Conv2d(1280, 640, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv3): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (cv4): Conv(
        (conv): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))
        (act): Hardswish()
      )
      (bn): BatchNorm2d(1280, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
      (act): LeakyReLU(negative_slope=0.1, inplace=True)
      (m): Sequential(
        (0): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (1): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (2): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
        (3): Bottleneck(
          (cv1): Conv(
            (conv): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
            (act): Hardswish()
          )
          (cv2): Conv(
            (conv): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (act): Hardswish()
          )
        )
      )
    )
    (24): Detect(
      (m): ModuleList(
        (0): Conv2d(320, 42, kernel_size=(1, 1), stride=(1, 1))
        (1): Conv2d(640, 42, kernel_size=(1, 1), stride=(1, 1))
        (2): Conv2d(1280, 42, kernel_size=(1, 1), stride=(1, 1))
      )
    )
  )
)

Using the snippet:

for k, m in self.model.named_modules():
    m._non_persistent_buffers_set = set()  # pytorch 1.6.0 compatibility
    if isinstance(m, Conv) and isinstance(m.act, nn.Hardswish):
        m.act = Hardswish()  # assign activation

I put a breakpoint at the m.act = Hardswish() line, and it never got hit.
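If a breakpoint is inconvenient, the same check can also be done by printing which branch each module takes; a minimal sketch using the same loop (names assume the imports already in scope here):

for k, m in self.model.named_modules():
    m._non_persistent_buffers_set = set()  # pytorch 1.6.0 compatibility
    if isinstance(m, Conv):
        # shows whether the activation is recognized as nn.Hardswish for this Conv
        print(k, type(m.act).__name__, isinstance(m.act, nn.Hardswish))
        if isinstance(m.act, nn.Hardswish):
            m.act = Hardswish()  # assign activation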

Here's my output for the lines you mentioned to check for Hardswish operators:

print(len([x for x in self.model.modules() if type(x) is Hardswish]))
print(len([x for x in self.model.modules() if type(x) is nn.Hardswish]))
> 0
> 123

glenn-jocher commented 3 years ago

@trane293 well then the problem is with your code or your environment. You've likely developed a workaround to a bug you inadvertently introduced yourself, or perhaps your code is simply out of date. I'll paste you our default bug response for these sorts of situations.

Hello, thank you for your interest in our work! This issue seems to lack the minimum requirements for a proper response, or is insufficiently detailed for us to help you. Please note that most technical problems are due to:

- **Your custom data.** If your issue is not reproducible in one of our 3 common datasets ([COCO](https://github.com/ultralytics/yolov5/blob/master/data/coco.yaml), [COCO128](https://github.com/ultralytics/yolov5/blob/master/data/coco128.yaml), or [VOC](https://github.com/ultralytics/yolov5/blob/master/data/voc.yaml)) we can not debug it. Visit our [Custom Training Tutorial](https://docs.ultralytics.com/yolov5/tutorials/train_custom_data) for guidelines on training your custom data. Examine `train_batch0.jpg` and `test_batch0.jpg` for a sanity check of your labels and images.

- **Your environment.** If your issue is not reproducible in one of the verified environments below we can not debug it. If you are running YOLOv5 locally, verify your environment meets all of the [requirements.txt](https://github.com/ultralytics/yolov5/blob/master/requirements.txt) dependencies specified below. If in doubt, download Python 3.8.0 from https://www.python.org/, create a new [venv](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/), and then install requirements.

If none of these apply to you, we suggest you close this issue and raise a new one using the **Bug Report template**, providing screenshots and **minimum viable code to reproduce your issue**. Thank you!

## Requirements

Python 3.8 or later with all [requirements.txt](https://github.com/ultralytics/yolov5/blob/master/requirements.txt) dependencies installed, including `torch>=1.6`. To install run:
```bash
$ pip install -r requirements.txt
```

## Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

## Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py), and export (export.py) on macOS, Windows, and Ubuntu.

glenn-jocher commented 3 years ago

Here is a breakpoint during export. m.act type shows as the default nn.Hardswish() using current master. Everything operates correctly.

[Screenshot 2020-11-13 at 00:35:24]

trane293 commented 3 years ago

I see. Let me put together a minimal working example to reproduce this; it should be a good exercise to find out whether the problem is in my codebase or in the framework. Thanks for the responses!

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.