microsoft / nnfusion

A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.
MIT License

[BUG] import Pad op #386

Open jlxue opened 2 years ago

jlxue commented 2 years ago

🐛 Bug

Shape check fails when importing the Pad op.

To Reproduce

Steps to reproduce the behavior:

  1. ./src/tools/nnfusion/nnfusion ~/.cache/torch/hub/checkpoints/densenet169.float32.onnx -f onnx
  2. Observe the crash:

    [INFO] 2022-02-24T08:13:17z src/nnfusion/frontend/onnx_import/util/graph_convert.cpp 519 convert node: Pad_43
    [WARNING] 2022-02-24T08:13:17z src/nnfusion/frontend/onnx_import/util/graph_convert.cpp 529 No translator for Pad, try to convert to generic op
    [ERROR] 2022-02-24T08:13:17z src/nnfusion/util/errors.hpp 169 Check failed: '2 == curr->get_input_size()' at /home/jxue/repo/nnfusion-jlxue/src/nnfusion/core/operators/generic_op/generic_op_define/Pad.cpp:62: (no explanation given)
    terminate called after throwing an instance of 'nnfusion::errors::CheckError'
      what(): Check failed: '2 == curr->get_input_size()' at /home/jxue/repo/nnfusion-jlxue/src/nnfusion/core/operators/generic_op/generic_op_define/Pad.cpp:62: (no explanation given)
    Aborted (core dumped)

Expected behavior

Additional context

LeiWang1999 commented 2 years ago

We now have TranslatePadOp, but it doesn't seem to work on densenet161:

    [INFO] 2022-05-14T05:29:52z src/nnfusion/frontend/onnx_import/util/graph_convert.cpp 519    convert node: Pad_43
    [libprotobuf FATAL /usr/include/google/protobuf/repeated_field.h:1506] CHECK failed: (index) < (current_size_): 
    terminate called after throwing an instance of 'google::protobuf::FatalException'
      what():  CHECK failed: (index) < (current_size_): 

The Pad op is expected to have at least two inputs here, but the input_size of this Pad node is just 1.

https://github.com/microsoft/nnfusion/blob/c3cd7155303e3af2f56c9f9bf0778656c372bc14/src/nnfusion/frontend/onnx_import/op/pad.hpp#L22-L23

LeiWang1999 commented 2 years ago

From my perspective, this problem is caused by the opset version: since opset 11, 'pads' and 'value' have been moved from attributes to inputs. The TranslatePadOp in the nnfusion ONNX frontend currently parses the node as opset 11, but superbench generated the model with opset 9. I'm trying to fix it now.

But I have a question: what is the meaning of

    nnfusion::Shape padding_below(paddings.size() / 2);
    nnfusion::Shape padding_above(paddings.size() / 2);
    nnfusion::Shape padding_interior(paddings.size() / 2);

    {"pad_lower_h", _op->get_padding_below()[0]},
    {"pad_lower_w", _op->get_padding_below()[1]},
    {"pad_upper_h", _op->get_padding_above()[0]},
    {"pad_upper_w", _op->get_padding_above()[1]},

because in the ONNX format we have eight values stored, nh, nw, ch, cw, hh, hw, wh, ww: a pair for each of the four dimensions.

CC @mzmssg @xysmlx

LeiWang1999 commented 2 years ago

densenet161-fp32.onnx can work now, but I need to know the right correspondence between padding and pads; currently they are mapped as:

  /*
   * fix: the correspondence of padding to pads is wrong
   * */
  for (int i = 0; i < 4; ++i)
  {
      padding_below[i] = padAttr.ints(i);
      padding_above[i] = padAttr.ints(i+4);
  }

what padAttr.ints means:

    int pad_0_h; // n
    int pad_0_w;
    int pad_1_h; // c
    int pad_1_w;
    int pad_2_h; // h
    int pad_2_w;
    int pad_3_h; // w
    int pad_3_w;
    float value; // pad value

        pad_param->pad_0_h = data_pad[0];
        pad_param->pad_0_w = data_pad[4];
        pad_param->pad_1_h = data_pad[1];
        pad_param->pad_1_w = data_pad[5];
        pad_param->pad_2_h = data_pad[2];
        pad_param->pad_2_w = data_pad[6];
        pad_param->pad_3_h = data_pad[3];
        pad_param->pad_3_w = data_pad[7];