microsoft / onnxruntime-inference-examples

Examples for using ONNX Runtime for machine learning inferencing.

MNIST C++ Example with Input from File #70

dilne opened 2 years ago

dilne commented 2 years ago

Is your feature request related to a problem? Please describe. Many data scientists and AI engineers are experienced in using TensorFlow or PyTorch in Python and want to port their models to C++ for inference, but many are inexperienced in C++. The MNIST example is good, but it is frustrating that there is no example that takes an image from a file as input. Someone is far more likely to want to feed in an image file than to draw one, and such code would be much shorter and more understandable to newcomers.

Describe the solution you'd like Please could someone provide a new example, based on the MNIST C++ example, that demonstrates a prediction being made on an image loaded from disk rather than drawn, potentially using OpenCV.

Many thanks

snnn commented 2 years ago

I know the Windows APIs better than OpenCV, so I wrote one based on the Windows APIs: https://github.com/microsoft/onnxruntime-inference-examples/blob/main/c_cxx/imagenet/image_loader_wic.cc#L37. The file shows how to read an image file into an RGB buffer.

Then you will need to convert the RGB values to gray values, as shown in https://github.com/onnx/models/tree/master/vision/classification/mnist:

import numpy as np
import cv2

image = cv2.imread('input.png')                              # BGR, uint8
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)               # single channel
gray = cv2.resize(gray, (28, 28)).astype(np.float32) / 255   # 28x28, scaled to [0, 1]
input = np.reshape(gray, (1, 1, 28, 28))                     # NCHW: batch, channel, height, width
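
If you would rather do that conversion in C++, here is a minimal sketch (assumptions: the buffer is interleaved RGB bytes, as produced by the loader further down; rgb_to_gray_float is a hypothetical helper, and the luminance weights are the same ones OpenCV's COLOR_BGR2GRAY uses):

#include <cstddef>
#include <cstdint>
#include <vector>

// Convert an interleaved RGB byte buffer to normalized grayscale floats,
// matching cv2.cvtColor(..., COLOR_BGR2GRAY) followed by division by 255.
std::vector<float> rgb_to_gray_float(const uint8_t* rgb, size_t width, size_t height) {
  std::vector<float> gray(width * height);
  for (size_t i = 0; i != width * height; ++i) {
    const float r = rgb[i * 3];
    const float g = rgb[i * 3 + 1];
    const float b = rgb[i * 3 + 2];
    gray[i] = (0.299f * r + 0.587f * g + 0.114f * b) / 255.0f;
  }
  return gray;
}
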
dilne commented 2 years ago

Thanks for your response. How does the C++ code you've linked to demonstrate the C++ equivalent of input = np.reshape(gray, (1, 1, 28, 28))? It would be really useful to know how to alter the dimensions of the input image to match the dimensions of the model's input layer.

snnn commented 2 years ago

I think you mean image resizing, which doesn't apply to the MNIST dataset: MNIST images all have a fixed size, so the MNIST models do not define how to do image resizing (while ImageNet models must define it). You can resize the image yourself, using whatever algorithm you like.
Like: https://github.com/microsoft/onnxruntime-inference-examples/blob/main/c_cxx/imagenet/image_loader.cc#L62

Or

#include <shcore.h>
#include <wincodec.h>
#include <wincodecsdk.h>

#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <vector>

#include "onnxruntime_c_api.h"
#include "wil/com.h"

/**
 * Read the image from `input_file` and auto-scale it to 720x720.
 * \param out Interleaved RGB pixel values as floats; should be freed by the caller after use
 * \param output_count Array length of the `out` param
 */
int read_image_file(_In_z_ const ORTCHAR_T* input_file, _Out_ size_t* height, _Out_ size_t* width, _Outptr_ float** out,
                  _Out_ size_t* output_count) {
  wil::com_ptr_failfast<IWICImagingFactory> piFactory =
      wil::CoCreateInstanceFailFast<IWICImagingFactory>(CLSID_WICImagingFactory);
  wil::com_ptr_failfast<IWICBitmapDecoder> decoder;
  FAIL_FAST_IF_FAILED(
      piFactory->CreateDecoderFromFilename(input_file, NULL, GENERIC_READ,
                                           WICDecodeMetadataCacheOnDemand,  // defer parsing non-critical metadata
                                           &decoder));
  UINT count = 0;
  FAIL_FAST_IF_FAILED(decoder->GetFrameCount(&count));
  if (count != 1) {
    printf("The image has multiple frames, I don't know which to choose.\n");
    abort();
  }
  wil::com_ptr_failfast<IWICBitmapFrameDecode> piFrameDecode;
  FAIL_FAST_IF_FAILED(decoder->GetFrame(0, &piFrameDecode));
  UINT image_width, image_height;
  FAIL_FAST_IF_FAILED(piFrameDecode->GetSize(&image_width, &image_height));
  wil::com_ptr_failfast<IWICBitmapScaler> scaler;
  IWICBitmapSource* source_to_copy = piFrameDecode.get();
  if (image_width != 720 || image_height != 720) {
    FAIL_FAST_IF_FAILED(piFactory->CreateBitmapScaler(&scaler));  // this is the resizing step you asked about
    FAIL_FAST_IF_FAILED(scaler->Initialize(source_to_copy, 720, 720, WICBitmapInterpolationModeFant));
    source_to_copy = scaler.get();
    image_width = 720;
    image_height = 720;
  }
  wil::com_ptr_failfast<IWICFormatConverter> ppIFormatConverter;
  FAIL_FAST_IF_FAILED(piFactory->CreateFormatConverter(&ppIFormatConverter));
  FAIL_FAST_IF_FAILED(ppIFormatConverter->Initialize(source_to_copy, GUID_WICPixelFormat24bppRGB,
                                                     WICBitmapDitherTypeNone, NULL, 0.f, WICBitmapPaletteTypeCustom));
  // output format is 24bpp, which means 24 bits per pixel
  constexpr UINT bytes_per_pixel = 24 / 8;
  UINT stride = image_width * bytes_per_pixel;
  std::vector<uint8_t> data(image_width * image_height * bytes_per_pixel);
  FAIL_FAST_IF_FAILED(ppIFormatConverter->CopyPixels(nullptr, stride, static_cast<UINT>(data.size()), data.data()));
  // Hand the pixels back to the caller as floats; the caller frees the buffer.
  float* float_buf = static_cast<float*>(malloc(data.size() * sizeof(float)));
  if (float_buf == nullptr) abort();
  for (size_t i = 0; i != data.size(); ++i) {
    float_buf[i] = static_cast<float>(data[i]);
  }
  *out = float_buf;
  *output_count = data.size();
  *height = 720;
  *width = 720;
  return 0;
}
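
Regarding the reshape question: in C++ there is no separate reshape step. The pixel data is just a flat, contiguous float buffer, and (1, 1, 28, 28) is shape metadata you pass when creating the input tensor. A minimal sketch with the ONNX Runtime C++ API (assumptions: gray already holds 28*28 normalized floats, e.g. from the grayscale snippet above, and make_mnist_input is a hypothetical helper):

#include <array>
#include <cstdint>
#include <vector>

#include "onnxruntime_cxx_api.h"

// The C++ equivalent of np.reshape(gray, (1, 1, 28, 28)): the data stays flat,
// only the shape passed to CreateTensor changes. CreateTensor does not copy the
// buffer, so `gray` must outlive the returned tensor.
Ort::Value make_mnist_input(std::vector<float>& gray) {
  const std::array<int64_t, 4> shape{1, 1, 28, 28};  // NCHW
  auto memory_info = Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU);
  return Ort::Value::CreateTensor<float>(memory_info, gray.data(), gray.size(),
                                         shape.data(), shape.size());
}
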
coderjoonwoo commented 2 years ago

In MNIST.cpp, there are some guidelines without code examples:

// After instantiation, set the input_image_ data to be the 28x28 pixel image of the number to recognize
...
std::array<float, width_ * height_> input_image_{};

I am sorry, I don't understand. What should I do to build the input image for inference?

P.S. Do the model.onnx files under the ./js/* folders work for MNIST?

venki-thiyag commented 2 years ago

@coderjoonwoo Looks like the model needs to be downloaded from https://github.com/onnx/models/tree/main/vision/classification/mnist. Not sure why this was not mentioned anywhere in the documentation.
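
Once the model is downloaded, here is a minimal end-to-end sketch of loading it and classifying one image (assumptions: the pixel preprocessing from earlier in the thread fills gray, and the input/output names Input3 and Plus214_Output_0 are the ones used by that MNIST model):

#include <array>
#include <cstdint>
#include <cstdio>
#include <vector>

#include "onnxruntime_cxx_api.h"

int main() {
  Ort::Env env;
  Ort::Session session(env, ORT_TSTR("model.onnx"), Ort::SessionOptions{});

  std::vector<float> gray(28 * 28, 0.0f);  // fill with your 28x28 normalized pixels
  const std::array<int64_t, 4> shape{1, 1, 28, 28};
  auto memory_info = Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU);
  Ort::Value input = Ort::Value::CreateTensor<float>(
      memory_info, gray.data(), gray.size(), shape.data(), shape.size());

  const char* input_names[] = {"Input3"};
  const char* output_names[] = {"Plus214_Output_0"};
  auto outputs = session.Run(Ort::RunOptions{nullptr}, input_names, &input, 1, output_names, 1);

  // The model outputs 10 raw scores; the index of the largest is the predicted digit.
  const float* scores = outputs[0].GetTensorData<float>();
  int best = 0;
  for (int i = 1; i < 10; ++i) {
    if (scores[i] > scores[best]) best = i;
  }
  printf("Predicted digit: %d\n", best);
  return 0;
}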