About

Handwritten digit detection using a YOLOv8 detection model with ONNX pre- and post-processing. A live example of the model running in a real-world scenario is available at https://thawro.github.io/web-object-detector/.

Data

The dataset consists of images generated from the HWD+ dataset.

HWD+

The HWD+ dataset consists of high-resolution (500x500 pixels) grayscale images of single handwritten digits.

yolo_HWD+

The yolo_HWD+ dataset is composed of images generated from the HWD+ dataset. Each yolo_HWD+ image contains many single digits, and each digit is annotated as (class x_center y_center width height). HWD+ is processed into yolo_HWD+ as follows:

1. Cut the digit out of each image (HWD+ images have a lot of white background around the digit).
2. Create a background image of size imgsz and apply a transform to it (pre_transform attribute), e.g. RGB shift/shuffle.
3. Take nrows * ncols digit images and form an nrows x ncols grid.
4. For each digit:
   1. Apply a transform (obj_transform attribute), e.g. invert color, RGB shift/shuffle.
   2. Randomly place the digit in grid cell (i, j) and save its label and location as an annotation.
5. Apply a transform to the fully formed grid (post_transform attribute), e.g. rotation.
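
The geometry of the steps above can be sketched in plain Python. This is an illustration only, not the repository's implementation: the function names (crop_to_content, place_in_cell, yolo_label, make_grid_labels) are hypothetical, the background is assumed to be pure white (255), every cut digit is assumed to fit inside its cell, the annotation values are assumed to be normalized to [0, 1] as is conventional for YOLO label files, and the pre_transform, obj_transform and post_transform steps are left out.

```python
import random


def crop_to_content(img, background=255):
    """Crop a grayscale image (list of pixel rows) to its non-background bounding box."""
    rows = [r for r, row in enumerate(img) if any(p != background for p in row)]
    cols = [c for c in range(len(img[0])) if any(row[c] != background for row in img)]
    if not rows:  # blank image: nothing to cut
        return img
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]


def place_in_cell(i, j, digit_w, digit_h, cell_w, cell_h, rng):
    """Pick a random top-left corner so the digit stays fully inside cell (i, j)."""
    x0 = j * cell_w + rng.randint(0, cell_w - digit_w)
    y0 = i * cell_h + rng.randint(0, cell_h - digit_h)
    return x0, y0


def yolo_label(cls, x0, y0, w, h, img_w, img_h):
    """Return (class, x_center, y_center, width, height), normalized by image size."""
    return (cls, (x0 + w / 2) / img_w, (y0 + h / 2) / img_h, w / img_w, h / img_h)


def make_grid_labels(digits, nrows, ncols, imgsz, seed=0):
    """Build annotations for one grid image.

    digits: list of (class_id, width, height) of already cut digits, row-major.
    imgsz:  (width, height) of the output image in pixels.
    """
    rng = random.Random(seed)
    img_w, img_h = imgsz
    cell_w, cell_h = img_w // ncols, img_h // nrows
    labels = []
    for idx, (cls, w, h) in enumerate(digits[:nrows * ncols]):
        i, j = divmod(idx, ncols)  # row and column of the grid cell
        x0, y0 = place_in_cell(i, j, w, h, cell_w, cell_h, rng)
        labels.append(yolo_label(cls, x0, y0, w, h, img_w, img_h))
    return labels
```

For example, make_grid_labels([(7, 40, 60)] * 4, nrows=2, ncols=2, imgsz=(640, 640)) returns four annotations, one per cell, each with the digit box lying fully inside its own quarter of the image.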

Example below:

Raw digits (before any processing)

raw

Cut digits (after step 1)

cut

Formed grid (left) and with annotations (right)

yolo_example

Tech stack

PyTorch - neural network architectures and dataset classes
ONNX - All processing steps used in pipeline
ONNX Runtime - Pipeline inference
OpenCV - Image processing for the server-side model inference (optional)
React - Web application used to test object detection models on real-world examples

Pipeline

Each pipeline step is implemented as an ONNX model. The complete inference pipeline is:

Image preprocessing - Resize and pad the image to match the model input size (preprocessing)
Object detection - Detect objects with the YOLOv8 model (yolo)
Non-Maximum Suppression - Apply NMS to the YOLO output (nms)
Postprocessing - Apply postprocessing to the filtered boxes (postprocessing)
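
The box geometry behind these steps can be sketched in plain Python. This is not the ONNX implementation used in the project: the function names, the centered padding, the class-agnostic greedy NMS, the default thresholds, and the choice of mapping boxes back to original image coordinates as the postprocessing step are all assumptions made for illustration.

```python
def letterbox_params(src_w, src_h, dst_w, dst_h):
    """Scale and padding that fit a source image into the model input size
    while keeping the aspect ratio (padding assumed to be centered)."""
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x, pad_y = (dst_w - new_w) // 2, (dst_h - new_h) // 2
    return scale, pad_x, pad_y


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thr=0.5, score_thr=0.25):
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap an already kept box by more than iou_thr."""
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_thr),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thr for k in keep):
            keep.append(i)
    return keep


def to_original(box, scale, pad_x, pad_y):
    """Map a box from padded model-input coordinates back to the source image."""
    x1, y1, x2, y2 = box
    return ((x1 - pad_x) / scale, (y1 - pad_y) / scale,
            (x2 - pad_x) / scale, (y2 - pad_y) / scale)
```

In this sketch a 1280x720 frame fed to a 640x640 model input is scaled by 0.5 and padded by 140 pixels above and below, and to_original undoes exactly that transform for every box that survives NMS.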

Model results

Image

image_prediction

Video

video_prediction
