tucan9389 / TFLiteSwift-Vision


Why I am making TFLiteSwift-Vision #3

Open tucan9389 opened 3 years ago

tucan9389 commented 3 years ago

Goal

Make a vision-specific layer that you can use in a TensorFlowLiteSwift application, so that you can call pre-implemented vision-specific functions like the following:

picture 1. Data flow when using TFLiteSwift-Vision
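
To make the data flow in picture 1 concrete, here is a minimal sketch of how the call site could look. This is an illustration only: `TFLiteVisionInterpreter`, its `Options`, the `normalization` parameter, and `inference(with:)` are assumed names for this sketch, not a confirmed API, and "mobilenet_v2" is a placeholder model.

```swift
import UIKit
import TFLiteSwift_Vision  // assumed module name

// Hypothetical call site: hand the framework a plain UIImage and let it
// handle resizing, cropping, normalization, and tensor conversion internally.
// All names below are illustrative assumptions, not a fixed API.
let options = TFLiteVisionInterpreter.Options(
    modelName: "mobilenet_v2",                  // placeholder bundled .tflite model
    normalization: .scaled(from: 0.0, to: 1.0)  // preprocessing owned by the framework
)
let interpreter = try TFLiteVisionInterpreter(options: options)

let image = UIImage(named: "sample")!               // placeholder asset
let output = try interpreter.inference(with: image) // returns a flat tensor
```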

I don't know whether this implementation can be merged into tensorflow/tflite-support or tensorflow/examples. I'll maintain this framework for my personal needs first, and then check whether this repo can be used by or merged into the TensorFlow repos.

Motivation

There are many general image pre-processing methods used in vision problems, and I would like other iOS developers to be able to use them without implementing these general methods themselves. In TFLiteSwift-Vision, as a first step, I abstracted and generalized the image pre-processing; after that, I'm going to build image post-processing and post-processing examples for task-specific cases. I expect other researchers and developers will then be able to use TFLite without re-implementing these functions and achieve their goals faster.
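
For context, this is the kind of boilerplate every project currently repeats before it can feed an image to the TensorFlow Lite Swift `Interpreter`. The sketch below uses only UIKit/CoreGraphics; the RGBA byte layout, 4 bytes per pixel, and the 0.0 to 1.0 scaling are assumptions that vary by model and image source.

```swift
import UIKit

// General image pre-processing that TFLiteSwift-Vision aims to absorb:
// resize a UIImage to the model's input size, then convert it to normalized
// Float32 RGB data ready for `Interpreter.copy(_:toInputAt:)`.
func preprocess(_ image: UIImage, width: Int, height: Int) -> Data? {
    // 1. Resize with a bitmap context.
    let size = CGSize(width: width, height: height)
    UIGraphicsBeginImageContextWithOptions(size, true, 1.0)
    image.draw(in: CGRect(origin: .zero, size: size))
    let resized = UIGraphicsGetImageFromCurrentImageContext()
    UIGraphicsEndImageContext()

    guard let cgImage = resized?.cgImage,
          let pixelData = cgImage.dataProvider?.data as Data? else { return nil }

    // 2. Drop the alpha channel and scale 0...255 bytes to 0.0...1.0 floats.
    let bytesPerRow = cgImage.bytesPerRow
    var floats = [Float32]()
    floats.reserveCapacity(width * height * 3)
    for y in 0..<height {
        for x in 0..<width {
            let offset = y * bytesPerRow + x * 4   // assumes 4-byte RGBA pixels
            floats.append(Float32(pixelData[offset])     / 255.0)  // R
            floats.append(Float32(pixelData[offset + 1]) / 255.0)  // G
            floats.append(Float32(pixelData[offset + 2]) / 255.0)  // B
        }
    }
    return floats.withUnsafeBufferPointer { Data(buffer: $0) }
}
```

Feeding the result to the interpreter is then `try interpreter.copy(data, toInputAt: 0)` followed by `try interpreter.invoke()`; it is exactly this per-project code that a shared layer can own.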

Why TFLiteSwift-Vision instead of MLKit?

picture 2. supporting tasks of MLKit's custom model
(captured at 21.08.23 from here)

MLKit offers domain-specific features, but those currently support only image classification and object detection, so you cannot use them when you want to implement other tasks like segmentation, pose estimation, or style transfer. (If there are other methods that I don't know about, please comment!)

picture 3. The architecture of MLKit and CoreML

As you can see on the right side of picture 3, Apple provides the image pre/post-processing layer through the Vision framework. I expect TFLiteSwift-Vision to play a similar role; I want to support pre/post-processing not only for TensorFlow models, but also for tflite models converted from PyTorch.

picture 4. TFLiteSwift-Vision's position in iOS TFLite architecture

What about TFLite's task-library?

As you can see in picture 3, TFLite officially provides the task-library, a bundle of pre/post-processing implementations for various domains. But it is implemented in C++ (ref), which can be a hurdle for most iOS developers, who are familiar with Swift, when it comes to customization. To let more iOS developers leverage vision tflite models, we split the work into pre-processing and post-processing parts and implement them in Swift (see the sketch below).
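
One way to read that split is as a pair of small Swift protocols; the types below are a hypothetical sketch of the idea, not the framework's actual interfaces.

```swift
import UIKit

// Hypothetical shape of the pre/post split in Swift. These protocols are
// illustrative only; the framework's real types may differ.
protocol VisionPreprocessor {
    /// Turns a UIImage into model-ready input bytes.
    func makeInputData(from image: UIImage) -> Data?
}

protocol VisionPostprocessor {
    associatedtype Output
    /// Decodes the model's raw output bytes into a task-specific result.
    func decode(outputData: Data) -> Output
}
```

Because both sides are plain Swift, an iOS developer can replace either half for a new task without touching C++.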

In TFLiteSwift-Vision, we mainly support the pre-processing part, because a great deal of vision research and applications take an image as input. Cases where the model output is itself an image are mostly limited to GAN-like tasks, so the goal is to release the image-output post-processing feature after version 1.0.0. For now, I have implemented the basic feature in which the framework returns a Tensor, as sketched below.
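
Reading that Tensor back today looks roughly like this with the plain TensorFlow Lite Swift API (`Interpreter`, `output(at:)`); the argmax at the end is a stand-in for whatever task-specific post-processing the caller still owns.

```swift
import TensorFlowLite

// Sketch of the current output contract: run inference, get a flat tensor,
// and decode it on the caller's side (argmax for classification, heatmap
// parsing for pose estimation, and so on).
// Assumes interpreter.allocateTensors() was already called.
func runAndDecode(interpreter: Interpreter, inputData: Data) throws -> Int? {
    try interpreter.copy(inputData, toInputAt: 0)
    try interpreter.invoke()
    let outputTensor = try interpreter.output(at: 0)

    // Reinterpret the raw output bytes as Float32 scores.
    let scores = outputTensor.data.withUnsafeBytes {
        Array($0.bindMemory(to: Float32.self))
    }

    // Example decoding for classification: index of the top score.
    return scores.indices.max { scores[$0] < scores[$1] }
}
```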

picture 5. supporting tasks of official TFLite task library
(captured at 21.08.23 from here)

Future Works