thawro / yolov8-digits-detection

Digit detection with a YOLOv8 detection model and ONNX pre/post processing
https://thawro.github.io/web-object-detector/
computer-vision deep-learning digits-detection digits-recognition object-detection onnx

About

Handwritten digit detection using a YOLOv8 detection model and ONNX pre/post processing. An example of how the model works in a real-world scenario can be viewed at https://thawro.github.io/web-object-detector/.

Data

The dataset consists of images created from the HWD+ dataset.

HWD+

The HWD+ dataset consists of high-resolution (500x500 pixels) grayscale images of single handwritten digits.

yolo_HWD+

The yolo_HWD+ dataset is composed of images produced from the HWD+ dataset. Each yolo_HWD+ image contains many digits, and each digit is annotated with its class and bounding box (class x_center y_center width height). HWD+ is processed into yolo_HWD+ as follows:

  1. Cut out the digit from each image (HWD+ images have a lot of white background around the digit).
  2. Create a background image of size imgsz and apply a transform to it (pre_transform attribute), e.g. RGB shift/shuffle.
  3. Take nrows * ncols digit images and arrange them in an nrows x ncols grid.
  4. For each digit:
    1. Apply a transform (obj_transform attribute), e.g. color inversion, RGB shift/shuffle.
    2. Randomly place the digit in its (i, j) grid cell and save its label and location as an annotation.
  5. Apply a transform to the fully formed grid (post_transform attribute), e.g. rotation.
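Steps 2 to 4 can be sketched with NumPy alone. This is a minimal illustration, not the repository's code: the helper names `make_grid_image` and `_resize_nearest`, the single random background color standing in for `pre_transform`, the random inversion plus per-channel RGB shift standing in for `obj_transform`, and the nearest-neighbour resize are all assumptions. Step 5 (`post_transform`, e.g. rotation) is left out, because rotating the grid would also require transforming every saved box.

```python
import numpy as np


def _resize_nearest(a, h, w):
    """Nearest-neighbour resize of a 2-D or 3-D array to (h, w)."""
    rows = np.arange(h) * a.shape[0] // h
    cols = np.arange(w) * a.shape[1] // w
    return a[rows][:, cols]


def make_grid_image(digits, labels, imgsz=640, nrows=2, ncols=3, rng=None):
    """Place nrows * ncols cropped digits in a grid (hypothetical helper).

    `digits` are grayscale uint8 crops, `labels` their class ids.
    Returns an RGB image and a list of (class_id, x1, y1, x2, y2) pixel boxes.
    """
    rng = np.random.default_rng() if rng is None else rng
    assert len(digits) == len(labels) == nrows * ncols

    # Step 2: background of size imgsz filled with one random RGB color
    # (assumed stand-in for pre_transform)
    img = np.empty((imgsz, imgsz, 3), dtype=np.uint8)
    img[:] = rng.integers(0, 256, size=3).astype(np.uint8)

    cell_h, cell_w = imgsz // nrows, imgsz // ncols
    boxes = []
    # Step 3: digit k goes to grid cell (i, j) in row-major order
    for k, (digit, cls) in enumerate(zip(digits, labels)):
        i, j = divmod(k, ncols)

        # Step 4.1: assumed stand-in for obj_transform (random inversion + RGB shift)
        if rng.random() < 0.5:
            digit = 255 - digit
        rgb = np.stack([digit] * 3, axis=-1).astype(np.int16)
        rgb = np.clip(rgb + rng.integers(-40, 41, size=3), 0, 255).astype(np.uint8)

        # Shrink the digit to a random fraction of the largest size that fits the cell
        h, w = digit.shape
        scale = min(cell_h / h, cell_w / w) * rng.uniform(0.5, 0.9)
        new_h, new_w = max(1, int(h * scale)), max(1, int(w * scale))
        patch = _resize_nearest(rgb, new_h, new_w)

        # Step 4.2: random position inside cell (i, j); record the box as the annotation
        y1 = i * cell_h + int(rng.integers(0, cell_h - new_h + 1))
        x1 = j * cell_w + int(rng.integers(0, cell_w - new_w + 1))
        img[y1:y1 + new_h, x1:x1 + new_w] = patch
        boxes.append((int(cls), x1, y1, x1 + new_w, y1 + new_h))
    return img, boxes
```

Keeping each digit inside its own cell guarantees that boxes never overlap, which keeps the annotations unambiguous.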

Example below:

Raw digits (before any processing)


Cut digits (after step 1)

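Step 1 can be approximated by thresholding the grayscale image and taking the bounding box of the ink pixels. A minimal sketch, assuming dark ink on a light background; `crop_digit` and its `thresh` and `pad` defaults are hypothetical, not taken from the repository:

```python
import numpy as np


def crop_digit(img, thresh=200, pad=5):
    """Cut a single digit out of a grayscale HWD+-style image.

    Pixels darker than `thresh` count as ink; `pad` pixels of margin are kept
    around the ink bounding box (clipped to the image borders).
    """
    ink = img < thresh
    rows = np.flatnonzero(ink.any(axis=1))  # row indices containing ink
    cols = np.flatnonzero(ink.any(axis=0))  # column indices containing ink
    if rows.size == 0:  # blank image: nothing to cut
        return img
    y1 = max(rows[0] - pad, 0)
    y2 = min(rows[-1] + pad + 1, img.shape[0])
    x1 = max(cols[0] - pad, 0)
    x2 = min(cols[-1] + pad + 1, img.shape[1])
    return img[y1:y2, x1:x2]
```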

Formed grid (left) and the same grid with annotations (right)

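Each annotation has the form `class x_center y_center width height`. Assuming the usual Ultralytics convention, in which coordinates are normalized by the image width and height and each image has a `.txt` file with one line per object, converting a pixel box `(x1, y1, x2, y2)` to and from that format could look like this (both helper names are hypothetical):

```python
def to_yolo_line(cls, x1, y1, x2, y2, img_w, img_h):
    """Pixel box -> 'class x_center y_center width height', normalized to [0, 1]."""
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


def from_yolo_line(line, img_w, img_h):
    """Inverse of to_yolo_line: returns (class, x1, y1, x2, y2) in pixels."""
    cls, xc, yc, w, h = line.split()
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    return int(cls), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


line = to_yolo_line(3, 100, 200, 160, 300, img_w=640, img_h=640)
print(line)  # 3 0.203125 0.390625 0.093750 0.156250
```

Normalized coordinates keep the labels valid when the image is later resized to the model input size.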

Tech stack

Pipeline

Each pipeline step is implemented as an ONNX model. The complete inference pipeline is:

  1. Image preprocessing - resize and pad the image to match the model input size (preprocessing)
  2. Object detection - detect objects with the YOLOv8 model (yolo)
  3. Non-Maximum Suppression - apply NMS to the YOLO output (nms)
  4. Postprocessing - apply postprocessing to the filtered boxes (postprocessing)
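The repository runs each of these steps as an ONNX model. The NumPy sketch below only illustrates what such steps typically compute and is not a transcription of those models. Assumptions: the YOLO output follows the standard YOLOv8 detection layout `(1, 4 + num_classes, N)`, with `cx, cy, w, h` followed by per-class scores; preprocessing is a letterbox resize with gray (114) padding; postprocessing maps boxes back to original image pixels. All function names are hypothetical, and the nearest-neighbour resize stands in for the interpolation a real implementation would use.

```python
import numpy as np


def letterbox(img, size=640, pad_value=114):
    """Resize keeping the aspect ratio, then pad to size x size (preprocessing)."""
    h, w = img.shape[:2]
    scale = min(size / h, size / w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    rows = np.arange(new_h) * h // new_h  # nearest-neighbour resize for brevity
    cols = np.arange(new_w) * w // new_w
    top, left = (size - new_h) // 2, (size - new_w) // 2
    out = np.full((size, size) + img.shape[2:], pad_value, dtype=img.dtype)
    out[top:top + new_h, left:left + new_w] = img[rows][:, cols]
    return out, scale, (left, top)


def to_input(img):
    """HWC uint8 RGB image -> NCHW float32 tensor in [0, 1]."""
    return (img.astype(np.float32) / 255.0).transpose(2, 0, 1)[None]


def decode(pred, conf_thres=0.25):
    """Assumed (1, 4 + nc, N) YOLOv8 output -> xyxy boxes, scores, class ids."""
    p = pred[0].T  # (N, 4 + nc)
    scores, classes = p[:, 4:].max(axis=1), p[:, 4:].argmax(axis=1)
    keep = scores > conf_thres
    cx, cy, w, h = p[keep, :4].T
    boxes = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    return boxes, scores[keep], classes[keep]


def nms(boxes, scores, iou_thres=0.45):
    """Greedy NMS: keep the best box, drop boxes overlapping it above iou_thres."""
    order = scores.argsort()[::-1]
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (area[i] + area[rest] - inter + 1e-9)
        order = rest[iou <= iou_thres]
    return np.array(keep, dtype=int)


def unletterbox(boxes, scale, pad, orig_hw):
    """Map boxes from the padded model input back to original image pixels."""
    left, top = pad
    out = (boxes - np.array([left, top, left, top])) / scale
    out[:, [0, 2]] = out[:, [0, 2]].clip(0, orig_hw[1])
    out[:, [1, 3]] = out[:, [1, 3]].clip(0, orig_hw[0])
    return out


def postprocess(pred, scale, pad, orig_hw, conf_thres=0.25, iou_thres=0.45):
    boxes, scores, classes = decode(pred, conf_thres)
    # Class-aware NMS: shift each class far apart so boxes of different digits never overlap
    offset = classes[:, None] * 4096.0
    keep = nms(boxes + offset, scores, iou_thres)
    return unletterbox(boxes[keep], scale, pad, orig_hw), scores[keep], classes[keep]
```

With `onnxruntime`, the YOLO step would sit between `to_input` and `postprocess`, e.g. `pred = session.run(None, {session.get_inputs()[0].name: x})[0]`, where `session` is an `onnxruntime.InferenceSession` built from the exported model.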

Model results

Image

[Figure: model predictions on an example image]

Video

[Figure: model predictions on an example video]