microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

cast BatchNorm2d to int32 #10440


shabashaash commented 2 years ago

Describe the bug
I'm not completely sure whether I should file this issue here or on the PyTorch repo, but here we go.

I have several BatchNorm2d layers without any extra parameters. As far as I know, the WebGL backend of onnxruntime-web doesn't support int64, but the tracked statistics of BatchNorm2d are stored as int64 (presumably because of possible int32 overflow). I tried multiple ways to convert these parameters to int32, or even float32 (I know that's wrong, but I tried it anyway), and they stayed int64. Maybe the original BatchNorm implementation doesn't allow changing the type.

As mentioned in https://github.com/pytorch/pytorch/issues/14807, there are multiple ways to solve this.

What I have tried so far:
1. The setattr() approach from https://github.com/ultralytics/yolov5/issues/250.
2. Casting the layer directly:
   2.1. Adding .to(torch.int32), which raised "nn.Module.to only accepts floating point or complex dtypes, but got desired dtype=torch.int32", which is understandable.
   2.2. Adding .to(torch.float32), which changed nothing; the parameters stayed int64. Example of the second approach:

self.up3 = nn.Sequential(
            nn.Upsample(scale_factor=2, mode='bilinear'),
            nn.Conv2d(512, 256, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(256).to(torch.float32), activation
        )
3. I tried this script (https://github.com/aadhithya/onnx-typecast), with no luck. Parameters still in int64:
    ['first_layer.2.num_batches_tracked', 'down0.1.num_batches_tracked', 'down1.1.num_batches_tracked', 'down2.1.num_batches_tracked', 'down3.1.num_batches_tracked', 'up3.2.num_batches_tracked', 'up2.2.num_batches_tracked', 'up1.2.num_batches_tracked', 'up0.2.num_batches_tracked']
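For what it's worth, here is a minimal sketch of one more workaround worth trying (an assumption on my part, not verified against this model): since num_batches_tracked is a registered buffer, reassigning it with a casted tensor replaces the buffer in place, which nn.Module.to() refuses to do for integer dtypes. The helper name downcast_bn_counters is made up for illustration:

```python
import torch
import torch.nn as nn

def downcast_bn_counters(model: nn.Module) -> nn.Module:
    """Replace every int64 num_batches_tracked buffer with an int32 one.

    num_batches_tracked is only used to update running stats during
    training, so in eval mode this should not change the outputs.
    """
    for m in model.modules():
        # _BatchNorm covers BatchNorm1d/2d/3d and SyncBatchNorm
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            if m.num_batches_tracked is not None:
                # assigning a tensor to a registered buffer name
                # replaces the buffer itself
                m.num_batches_tracked = m.num_batches_tracked.to(torch.int32)
    return model

# toy example, not the model from this issue
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
downcast_bn_counters(model)
print(model[1].num_batches_tracked.dtype)  # torch.int32
```

Whether the subsequent ONNX export then actually emits int32 initializers is a separate question; in eval mode the exporter may drop num_batches_tracked entirely.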

Urgency: None

System information
- OS/GPU: Windows 10, GTX 1060 3 GB
- Browser: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0
- Context name: webgl; GL version: WebGL 1.0; shading language version: WebGL GLSL ES 1.0
- JS: "onnxruntime-web": "^1.11.0-dev.20220124-3dfadf903" (got the same error on the stable 1.10 release too)
- Python: torch.__version__ 1.10.0+cu111, onnx.__version__ 1.10.2

Screenshots
Example of how the ONNX export stores the model parameters as int64 (screenshot attached).

Code to load the model in JS:

const session = await ort.InferenceSession.create('./true_visual_512_opset13_int32.onnx',
  {
    executionProviders: ['webgl'],
    // note: 'providers' is not an onnxruntime-web session option;
    // CUDAExecutionProvider belongs to the Python API, so this key is ignored
    providers: ['CUDAExecutionProvider'],
    graphOptimizationLevel: 'all'
  }
);

Code to export the model in Python:

with torch.no_grad():
    torch.onnx.export(model,                                            
                      args=tuple([img_att, latend_id_c]),                     
                      f=export_model_path,                              
                      opset_version=13,                                
                      do_constant_folding=False,                         
                      input_names=['target_image', 
                                   'source_latent_id'],
                      output_names=['fake_img']                    
        )

UPD: Also, when I try to run the model converted with the onnx-typecast script in onnxruntime (Python), I get this error: "[ONNXRuntimeError] : 10 : INVALID_GRAPH : Load model from /content/onnx_models/true_visual_512_opset13_nc_float32cast_int32.onnx failed:This is an invalid model. Type Error: Type 'tensor(int32)' of input parameter (219) of operator (Unsqueeze) in node (Unsqueeze_76) is invalid."

And with the same model on onnxruntime-web (Node.js) with the webgl backend (it works fine on wasm, but is too slow): "failed to inference ONNX model: Error: unrecognized input '' for node: Resize_1242.". I suppose it's because some of the parameters were not converted to int32 properly.

Link (https://drive.google.com/drive/folders/1exI5G4KLJhcAUI2WVSfPj_9-gsdvyj17?usp=sharing) to a Google Drive folder with 2 models:
- true_visual_512_opset13_nc_float32cast.onnx: just added .to('float32')
- true_visual_512_opset13_nc_float32cast_int32.onnx: added .to('float32') and also ran the onnx-typecast script

MrKhozyin commented 2 years ago

OMG SO BASED ISSUE