Closed — arseniymerkulov closed this issue 7 months ago
What size is the input image? If it's not square, you'll also need a centered crop. Otherwise the Resize is going to result in uneven sides (one side will be 224 but the other won't) and break.
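For reference, a centered crop just takes the middle window of the image. A minimal sketch of the coordinate math in plain Python (the function name is mine, not part of the onnxruntime-extensions API):

```python
def center_crop_box(height, width, crop_h, crop_w):
    """Return (top, left, bottom, right) of a centered crop window."""
    top = (height - crop_h) // 2
    left = (width - crop_w) // 2
    return top, left, top + crop_h, left + crop_w

# e.g. cropping 224x224 out of a 256x341 resize result:
# center_crop_box(256, 341, 224, 224) -> (16, 58, 240, 282)
```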
pipeline.add_pre_processing(
[
Resize(256, layout='CHW'),
Transpose([1, 2, 0]), # CHW to HWC. We can look at adding ChannelsFirstToChannelsLast to simplify
CenterCrop(224, 224), # must be HWC currently. We can look at adding CHW support.
ChannelsLastToChannelsFirst(),
ImageBytesToFloat(),
Normalize([(0.5, 0.5), (0.5, 0.5), (0.5, 0.5)]),
Unsqueeze([0]),
]
)
Would also be good to update the original model to opset 17 for consistency. Do that prior to adding the pre-processing.
python -m onnxruntime.tools.update_onnx_opset --opset 17 model.onnx model.opset17.onnx
Yes, input images can be rectangles, thank you. This set of preprocessing steps results in a drop in accuracy from 0.97 to 0.85 compared to the torch transforms:
transforms.Compose([
transforms.Resize(size=(224, 224),
interpolation=transforms.InterpolationMode.BILINEAR,
max_size=None,
antialias=None),
transforms.ConvertImageDtype(torch.float),
transforms.Normalize(mean=torch.Tensor([0.5000, 0.5000, 0.5000]),
std=torch.Tensor([0.5000, 0.5000, 0.5000]))
])
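For comparison, the ConvertImageDtype + Normalize pair above maps uint8 pixels into roughly [-1, 1] (divide by 255, then subtract mean 0.5 and divide by std 0.5). A minimal sketch of that per-pixel math in plain Python (the function name is mine):

```python
def normalize_pixel(p, mean=0.5, std=0.5):
    """uint8 pixel -> float in [0, 1] -> (x - mean) / std."""
    return (p / 255.0 - mean) / std

# 0 maps to -1.0 and 255 maps to 1.0
```

ImageBytesToFloat + Normalize in the onnxruntime-extensions pipeline should produce the same mapping, so the accuracy gap is unlikely to come from this step.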
After I replaced the crop step with a letterbox, accuracy increased to 0.92:
pipeline.add_pre_processing(
[
Resize((224, 224), policy='not_larger', layout='CHW'),
LetterBox(target_shape=(224, 224), layout='CHW'),
ImageBytesToFloat(),
Normalize([(0.5, 0.5), (0.5, 0.5), (0.5, 0.5)]),
Unsqueeze([0]),
]
)
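To reason about what this pipeline does geometrically: Resize with policy='not_larger' scales so that neither side exceeds the target while keeping aspect ratio, and LetterBox then pads out to the target shape. A minimal sketch of the size math in plain Python (names are mine, not the extensions API itself):

```python
def resize_not_larger(h, w, target):
    """Scale so neither side exceeds target, keeping aspect ratio."""
    scale = min(target / h, target / w)
    return round(h * scale), round(w * scale)

def letterbox_padding(h, w, target):
    """Total padding needed on each axis to reach target x target."""
    return target - h, target - w

# a 480x640 image resizes to 168x224, then letterbox adds 56 rows of padding
```

The torch Resize((224, 224)) above instead stretches both sides to exactly 224, distorting the aspect ratio, which is why the two pipelines still disagree.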
Accuracy is still below the original; I guess this is because torch Resize does not keep the same aspect ratio for an image. I didn't find a way to achieve that with the onnxruntime-extensions preprocessing steps. What is your advice on that? As an extreme case, I can change the preprocessing steps during finetuning to match the preprocessing steps at inference.
You need to match the preprocessing that was done when the model was trained. Whether to crop or letterbox depends on the model type.
e.g. for image classification it will tend to crop like here
For something like object detection or OCR you'd letterbox so you're not excluding any of the original image.
Typically in the pre-processing you'd maintain the aspect ratio, which is what the onnxruntime-extensions preprocessing steps support currently. If you randomly stretch it's going to be harder to train/match.
The difference could be antialiasing. Based on this, it sounds like the parameter is ignored for PIL images.
The ONNX Resize supports antialiasing from opset 18 on, so you may need to a) update your model to opset 18, and b) use opset 18 when adding the pre-processing.
Thank you for your answer
I am trying to use PrePostProcessor on a vit-tiny model in .onnx format with this code:
After saving, it looks OK to me in Netron, and I checked it with onnx.checker. The original model has a fixed input/output shape: [1, 3, 224, 224] -> [1, 28]
When I run the model with preprocessing using the code below, I get an error:
Inference code:
Attaching the model in .onnx format with and without preprocessing: model.with.preprocessing.onnx.zip model.zip