mapillary / inplace_abn

In-Place Activated BatchNorm for Memory-Optimized Training of DNNs
BSD 3-Clause "New" or "Revised" License

RuntimeError: ONNX export failed: Couldn't export Python operator InPlaceABNSync #210

jxncyym commented 3 years ago

Do you know how to solve this problem: RuntimeError: ONNX export failed: Couldn't export Python operator InPlaceABNSync

wolterlw commented 3 years ago

Because InPlaceABNSync is a custom operator implemented in CUDA, it cannot be exported to ONNX. To export the model, however, you can perform conv-bn fusion if your architecture allows it. Check this issue for more information. I currently have the same problem: fusing conv and bn with the function provided in the issue comments does not produce the same results. But I'm sure it can be done with a little more effort than copy-pasting the provided code.
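For reference, here is a minimal sketch of that kind of conv-bn fusion (my illustration of the standard folding identity, not the exact code from the linked issue; fuse_conv_bn is a hypothetical helper name). If fusing directly from InPlaceABN parameters, the readme's torch.abs(weight) + eps recovery for gamma would presumably apply before folding.

import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold eval-mode BatchNorm statistics into the preceding convolution."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels,
                      kernel_size=conv.kernel_size, stride=conv.stride,
                      padding=conv.padding, dilation=conv.dilation,
                      groups=conv.groups, bias=True)
    # BN(conv(x)) = gamma * (conv(x) - mean) / sqrt(var + eps) + beta,
    # so scale the conv weights per output channel and adjust the bias.
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    conv_bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.copy_((conv_bias - bn.running_mean) * scale + bn.bias)
    return fused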

Update: I was able to replace InPlaceABNSync with a regular BatchNorm2d + activation. It's not super elegant: I created a new ModelForExport class in which every InPlaceABNSync block is replaced with nn.Sequential(BatchNorm2d, activation), then loaded the weights into those modules as described in the readme, i.e. using torch.abs(weight) + eps (with eps = 1e-5). That required some tinkering with the original state_dict, but it all worked out. Also, don't forget to call .eval() on both of your models to get similar results.
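A minimal sketch of that replacement approach (again my illustration, not the exact code; abn_to_bn_act and load_abn_weights are hypothetical names, and the keys assume the InPlaceABNSync state_dict stores weight, bias, running_mean, and running_var):

import torch
import torch.nn as nn

EPS = 1e-5  # eps from the readme's torch.abs(weight) + eps recovery

def abn_to_bn_act(num_features, activation_param=0.01):
    # Stand-in for an InPlaceABNSync block: plain BatchNorm2d + explicit activation.
    return nn.Sequential(
        nn.BatchNorm2d(num_features),
        nn.LeakyReLU(negative_slope=activation_param, inplace=True),
    )

@torch.no_grad()
def load_abn_weights(bn, abn_state):
    # Recover gamma as abs(weight) + eps, per the readme; copy the rest verbatim.
    bn.weight.copy_(torch.abs(abn_state["weight"]) + EPS)
    bn.bias.copy_(abn_state["bias"])
    bn.running_mean.copy_(abn_state["running_mean"])
    bn.running_var.copy_(abn_state["running_var"])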

AGenchev commented 2 years ago

I wonder whether I did it right by defining conv2d_ABN as:

import torch.nn as nn

def conv2d_ABN(ni, nf, stride, activation="leaky_relu", kernel_size=3, activation_param=1e-2, groups=1):
    # Replacement for InPlaceABN(num_features=nf, activation=activation,
    # activation_param=activation_param): BatchNorm2d + explicit activation.
    if activation == "leaky_relu":
        act = nn.LeakyReLU(negative_slope=activation_param, inplace=True)
    elif activation == "identity":
        act = nn.Identity()
    else:
        raise ValueError("unknown activation: " + activation)
    return nn.Sequential(
        nn.Conv2d(ni, nf, kernel_size=kernel_size, stride=stride,
                  padding=kernel_size // 2, groups=groups, bias=False),
        nn.BatchNorm2d(nf),
        act,
    )

Training seems to be working. The reason I'm doing this: I'm using an Anaconda venv where the CUDA InPlaceABN extension fails to compile. My goal is to compare TResNet-S to EfficientNetV2-S. It seems that EfficientNetV2 uses more GPU memory, while TResNet returns the favor by using (reportedly) more MADD ops. Needs benchmarking.